For Aaron:

import urllib2
from bs4 import BeautifulSoup

url = 'http://www.poemhunter.com/best-poems/william-stafford/thinking-for-berky/'
soup = BeautifulSoup(urllib2.urlopen(url))

print soup.find(attrs="title").text
for s in soup.find(attrs="poem").strings:
    print s.strip()

429 Too Many Requests: I don't like repeating what everyone else is saying on this weblog, and I don't have much to add to the general outpouring following the death of my friend Aaron, but I have to say something, because you can't say goodbye if you don't say anything. His death was awful, our loss great, his crimes (assuming any crime was committed at all) minor, and their prosecution farcical. I feel like a lot of what we're going through is our frustrated desire to see Aaron's case properly litigated, to see our friend vindicated, and I have no experience with that stuff, but I do have two personal stories to share. Two points where my life intersected with Aaron's in ways I haven't talked about publicly.

  1. Beautiful Soup was partly inspired by xmltramp, an XML parser Aaron wrote because he was frustrated with other XML parsers. I've been thinking a lot about this, and this is why my initial mourning of Aaron took the form it did, because screen-scraping—the use of an automated agent to replace a human-driven web browser—seems to have been at the core of the prosecutor's belief that this was a blockbuster case, more akin to a bank heist than a defaced storefront.
  2. In 2005 Aaron wanted me to join his startup, Infogami. He showed me a prototype, a NewsBruiser-like blogging site. I was looking to quit my job at CollabNet, but I didn't take Aaron's offer because I was comfortable in San Francisco and really didn't want to move across the country. (In a Twilight Zone-level twist, in early 2006 I'd end up moving to New York, which I now like a lot better than San Francisco.) Aaron tells the next part of the story here. He couldn't find a partner and eventually ended up merging Infogami with Reddit[0], which was then sold to Conde Nast in 2006.

    Back in 2005 there was enough of the college-era me left that I would have seen this outcome as a big missed opportunity. I still had some desire, left over from the dot-com era, to win the startup lottery. But of course the Reddit merger happened because Aaron couldn't get a partner for Infogami. And my life over the next couple years, including my secondhand reading of Aaron's experience at Reddit (he was fired soon after the Conde Nast acquisition) made it clear to me that I would not enjoy winning the startup lottery any more than Aaron did. I count this among the most important things Aaron taught me.

I've cut a lot of what I wrote here because I don't want this entry to be a bunch of stuff about me and my opinions and what I think. But I'm the one who's still here. Aaron is gone, and all that's left of him is the parts we can share.

[0] I can't let go of these little technical inconsistencies between what I'm seeing now and what I remember. It looks like during the merger, Infogami stopped being a blogging site, and its framework (which became the first Python version of Reddit) was renamed "Infogami". Or maybe "Infogami" was the framework all along, and the blogging site was only one application of the product; I don't know.

Crazy the Scorpion Semi-Online: Kirk and I collaborated on an in-browser version of Crazy the Scorpion for Klik of the Month Klub. It's "online" in the sense that you download an HTML file containing the game and play the game in your browser. But everyone who plays must be gathered around the same computer.

I scraped a bunch of Wikipedia page titles to make fake Trivial Pursuit cards. It's not great, but the whole thing's not bad for two hours of work. I mainly hope this version inspires you to play Crazy the Scorpion using physical components.

The Crummy.com Review Of Things 2012: I've been battered from all sides, and working all the time on RESTful Web APIs but I really feel like I need to get this out before the end of January, so I took some weekend time and finished it. First let's briefly review The Year in NYCB!

And now, our feature presentation. Of all the artifacts I experienced last year, these were my favorites.

Looking forward to 2013: man, we're already 1/12 of the way through 2013! This should have gone up a month ago! If I can finish RESTful Web APIs and Situation Normal I'll call it a good year.

Video Roundup: January 2013: Gonna put one of these up every month, so as to avoid the big bolus of reviews that happened last time. There are only three films here, all from a Paul Williams retrospective at the Museum of the Moving Image.

Spacewar! The Interview: Went to the museum last night not for a movie, but to meet Peter Samson and (via poor-quality videoconferencing) Steve Russell, for a conversation about the second video game ever made, Spacewar!.

I asked Russell the question that's been burning in my mind for years: why does Spacewar! have an exclamation mark in its name? His answer: "Once I got it working, I thought it deserved an exclamation point!" I also asked Russell if he considered any other names for the game. "Nope."

No one asked the obvious final question, so I got that one in too: what games are they playing now? Both Russell and Samson are fans of solitaire card games. Russell also said he likes the Android game Tiny Village.

Some other tidbits from the conversation, which I found especially interesting and/or which I don't think are on the net already:

Constellation Games Interview in Bookslut: Hey folks, CG fan Jeanne Thornton interviewed me a couple months back, creating a text that has now been published on Bookslut. (There's also an interview with Saladin Ahmed in the same issue.) The interview ranges over the CG publication process, games as an art form, space exploration, and so on.

One thing the published interview doesn't include is a question about Tetsuo Milk, which Jeanne cut before submitting the interview because it was kind of inside-baseball. But hey, inside baseball is the whole point of News You Can Bruise, so with Jeanne's permission I've reproduced the original question and my answer here:

I would feel remiss in not asking you about Tetsuo Milk, a character whom you’ve said (in your really, really mind-blowingly extensive commentary on the novel) essentially ran away with the book. Tetsuo is a brilliant character, but also feels at times like a heterogeneous element. I like this effect a lot, but I’m curious as to where this guy came from, what you’re saying through him, and how you see him fitting into the overall mix.

Maybe this will help: Tetsuo Milk is the ET version of Ariel. His silly mistakes and misunderstandings are mirror images of the mistakes Ariel makes trying to understand the Constellation. We don't laugh because we're not the ones being misunderstood. When Tetsuo does it to us, it's funny.

Here's a spoiler-free example. One of Ariel's post-contact hobbies is posting reviews of alien computer games to his blog. There's one really important scene that reverses the roles: Tetsuo writes a review of a game Ariel worked on as a developer, Brilhantes Poneis 5. Brilhantes is a stupid Farmville-type mobile game where you have a pet pony and do pointless tasks to earn coins to buy accessories for it. Tetsuo tackles the game from a post-scarcity Marxist perspective, putting a lot of work into understanding how a game's economy can work when the player is the employee of an animal. He gets a lot of it right (i.e. he recognizes that the game demeans both its players and its developers), but he's operating from completely the wrong framework.

That's the kind of mistake Ariel makes. He brings his human assumptions to everything, whether he realizes it or not, whether or not Tetsuo or someone else calls him on it.

(This is why there's a reference to "Tetsuo-like ideas" later in the interview; we shoulda cut that reference.)

Hire Aaron DeVore: I don't often use the NYCB bully pulpit to tell you to hire someone (apart from myself), but folks, you should hire Aaron DeVore. He was effectively the maintainer of Beautiful Soup during the period when I wasn't working on it. He answered tons of questions on the mailing list and sent me bugfix patches. When I started work on Beautiful Soup 4, he gave me a lot of feedback that helped stabilize the API.

Aaron did all this while a college student in Portland, Oregon. Now he's about to graduate, and he's looking for a job. Send him an email and let him know what you've got going on.

What's New in "RESTful Web APIs": We're ahead of schedule, which is good because we have a lot of work to do that isn't part of the book manuscript. Yesterday I sent out over forty copies of the manuscript to beta readers. That is too many beta readers, so at this point I must refuse anyone else who wants to be part of the beta, unless they have/had a hand in one of the standards we discuss, and they want to specifically critique our coverage of that standard.

With the beta closed I think it's a good time to go into a little detail about the structure of the book. My guiding principle was to write a book that will be as useful now as RESTful Web Services was in 2007. Like RWS, RESTful Web APIs has a main storyline that takes up most of the book. My inspirations for the main storyline were a few books that followed RWS, notably REST in Practice and Mike's Building Hypermedia APIs with HTML5 and Node.

RWS focused on the HTTP notion of a "resource", and despite the copious client-side code, this put the conceptual focus clearly on the server side, where the resource implementations live. RWA focuses on representations, and thus on hypermedia, on the interaction between client and server, which is where REST lives. The stuff you remember from RWS is still here, albeit rewritten in a pedagogically superior way. Web APIs work on the same principles as the Web, here's how HTTP works, here's what the Fielding constraints do, and so on. But the focus is always on the interaction, on the client and server manipulating each other's state by sending messages back and forth.

We've also benefited from a lot of tech work done by others. The IANA registry of link relations showed that state transitions don't have to be tied to a media type. The RFC that established that registry also showed how to define custom state transitions (extension relation types) without defining yet another media type to hold them.
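To make that concrete: under the Web Linking RFC (RFC 5988), a registered relation type is a short name such as "next", while an extension relation type is a URI. Here is a minimal sketch in Python of what the resulting Link header values look like. The example.com URLs and the "flip" relation are made up for illustration, and the helper function is hypothetical, not part of any library.

```python
def link_header(target, rel):
    """Format one Link header value: <target>; rel="relation"."""
    return '<{}>; rel="{}"'.format(target, rel)

# A relation type from the IANA registry: a short, well-known name.
registered = link_header("http://example.com/maze/cell/2", "next")

# An extension relation type: a URI, here a hypothetical one.
extension = link_header("http://example.com/maze/cell/2",
                        "http://example.com/rels/flip")

print("Link: " + registered)
print("Link: " + extension)
```

Either header can accompany a representation of any media type, which is the point: the state transition is described by the link relation, not by the format of the document.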

Insights like these inform the new parts of RWA's main storyline. What makes your API different from every other RESTful API in existence? That's the only part you really need to buckle down and design. Everything else you can reuse, or at least copy.

In particular, you shouldn't have to design a custom media type. Your API probably isn't that different from other APIs, and a ton of hypermedia formats and protocols have been invented since 2007. We cover a few of the most promising ones in the book's main storyline. We cover even more of them afterwards, mostly in the big "Hypermedia Zoo" chapter. Here's the book-wide list:

After the main storyline and the hypermedia zoo, RWA continues the RWS tradition of giving an API-centric view of the HTTP standard. We have a "crash course in advanced HTTP" chapter, some of which is an update of Chapter 8 from RWS. (Look-before-you-leap requests never caught on, but I still feel like I have to describe them in RWA because I have no other source to refer you to!) Appendix A is an updated version of Appendix B from RWS, with the addition of these exciting new status codes:

Appendix B is an update of Appendix C from RWS, with these API-licious new HTTP headers:

The amount of reused material in RWA is really small, because the main storyline is completely rewritten for 2013. And I haven't even mentioned our coverage of profiles, partly because I can't yet think of a way to talk about profiles at less length than what we say in the book.

Fundamental Indeed: I could spend all day just posting games that Board Game Dadaist comes up with. I forbear, for the sake of you, my readers, but Adam Parrish and I will email each other when we find an especially good one. And I think you should know about the best game BGD ever came up with (found by Adam back in December):

Fantasy Fundamental Rails (2005)

Players divide themselves into two teams.

Welcome BoingBoing Readers: If you're coming here from Cory Doctorow's review of Constellation Games, you might like to know about my web page for the book. The book was originally a serial, and I wrote chapter-by-chapter commentary as it was serialized. I also wrote four bonus stories set before, during, and after the novel, which I've released under CC-BY-SA. All that stuff is here.

You might also be interested to know that you get a DRM-free PDF version of the novel by buying direct from the publisher.

Now would also be a great time to mention that Constellation Games is eligible for this year's Hugo.

100 Years of Markov Chains: Back in January I took a little trip to Boston and hung out with Kirk. Among other things, we attended an event at Harvard celebrating the 100th anniversary of the paper that kicked off the Markov chain craze. I only wish Adam had been there. I've held off on talking about the event because I've been waiting for Harvard to put the video of the talks online. But that's a sucker's game, and now I have something better!

See, the first talk, by Brian Hayes, covered the amazing history leading up to the publication of Markov's seminal paper. He's now turned his talk into an article in American Scientist. The first few pages of that article are a basic introduction to Markov chains; the history starts on page four. Basically, Markov was a cranky old man who liked picking fights.

Markov’s pugnacity extended beyond mathematics to politics and public life. When the Russian church excommunicated Leo Tolstoy, Markov asked that he be expelled also. (The request was granted.) In 1902, the leftist writer Maxim Gorky was elected to the Academy, but the election was vetoed by Tsar Nicholas II. In protest, Markov announced that he would refuse all future honors from the tsar... In 1913, when the tsar called for celebrations of 300 years of Romanov rule, Markov responded by organizing a symposium commemorating a different anniversary: the publication of Ars Conjectandi 200 years before.

As acts of political protest go, the well-timed symposium is pretty great. At that symposium Markov revealed the Markov chain, which he'd invented as a way to smack down the dumb theological arguments of rival mathematician Pavel Nekrasov. His paper wasn't called "Markov Chains: Future Basis for Art and Scientific Discovery, Named After Me, A. A. Markov." It was called "An Example of Statistical Investigation of the Text 'Eugene Onegin' Concerning the Connection of Samples in Chains".

Markov had manually gone through the first 20,000 characters of Pushkin's "Eugene Onegin", looking at every pair of letters, writing down whether the letters were both vowels, both consonants, vowel-consonant, or consonant-vowel. Then he'd modelled the transitions between those four states with a Markov chain. The result disproved an assumption about the law of large numbers, an assumption crucial to Nekrasov's mathematical argument for free will. There's something about this mindset that always gets me--inventing the sledgehammer so you can use it to kill a fly.
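For a feel of what that hand tally involves, here is a rough sketch in Python. It assumes English text and treats a, e, i, o, u as the vowels (Markov worked on Russian text, by hand), skips anything that isn't a letter, and uses hypothetical function names. It counts the four kinds of adjacent letter pairs, then estimates how likely a vowel is after a vowel versus after a consonant.

```python
from collections import Counter

VOWELS = set("aeiou")  # assumption: English vowels, not Markov's Russian ones

def pair_counts(text):
    """Count adjacent letter pairs by kind: 'VV', 'VC', 'CV', 'CC'."""
    letters = [c for c in text.lower() if c.isalpha()]
    kinds = ["V" if c in VOWELS else "C" for c in letters]
    return Counter(a + b for a, b in zip(kinds, kinds[1:]))

def p_vowel_after(counts, prev):
    """Estimate P(next letter is a vowel | previous letter kind is prev)."""
    total = counts[prev + "V"] + counts[prev + "C"]
    return counts[prev + "V"] / total if total else 0.0

counts = pair_counts("banana")
print(dict(counts))
print(p_vowel_after(counts, "V"), p_vowel_after(counts, "C"))
```

If the two estimates come out different, as they do here, then whether a letter is a vowel depends on the letter before it, so successive letters are not independent samples.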

The other two talks were a lot more technical. I was mostly able to follow them, but I don't think I got much out of them. Here's a summary of all three talks from someone else who was there. But I strongly recommend Hayes's article to anyone who reads this weblog.

Ragtime Synchronicity:

"Bugs," said Krakowski. "In-tell-i-gence gathering devices. The Constellation loves recording things. Now they're going to record every conversation anyone ever has."

"I think you might be projecting a little."

February Film Roundup: The second in the 2013 series, as promised. Note: I draw no distinction between information about a movie that's a "spoiler" and information that's not.

From an interview with Ken Liu, recent Hugo/Nebula/WFA winner:

I went to law school, started a new job, and kind of gave up on writing for a while due to a supreme act of stupidity. I wrote this one story that I really loved, but no one would buy it. Instead of writing more stories and subbing them, as those wiser than I was would have told me, I obsessively revised it and sent it back out, over and over, until I eventually gave up, concluding that I was never going to be published again.

And then, in 2009, Sumana Harihareswara and Leonard Richardson bought that story, "Single-Bit Error," for their anthology, Thoughtcrime Experiments. The premise of the anthology was, in the editors' words, "to find mind-breakingly good science fiction/fantasy stories that other editors had rejected, and release them into the commons for readers to enjoy."

I can't tell you how much that sale meant to me. The fact that someone liked that story after years of rejections made me realize that I just had to find the one editor, the one reader who got my story, and it was enough. Instead of trying to divine what some mythical ur-editor or "the market" wanted, I felt free, after that experience, to just try to tell stories that I wanted to see told and not worry so much about selling or not selling. I got back into writing—and amazingly, my stories began to sell.

Case closed, I'd say.

March Film Roundup: Okay, look. I don't see movies just for their entertainment value. I dig film as an art form. But my permit to dig is premised on an amateur understanding of film as a narrative art form. If you want to present an endless stream of disconnected images, let's do an installation piece, because I want to decide for myself when I've had enough. I'm not going to be your captive for fifty minutes. (I'm looking at you, Andy Warhol.) And all that aside, I'm not gonna see a movie called Trash Humpers (2009), when the nicest thing the folks doing the screening can say is that it "rewards the open-minded viewer with moments of astonishing and unexpected poignancy."

Which is to say that I skipped most of the museum's highly avant-garde March offerings. I also got this book I have to work on. So not many movies in this roundup. Let's-a go:

In Search of the Beautiful Soup Double-Dippers: Recently I noticed that certain IPs were using distribute or setuptools to download the Beautiful Soup tarball multiple times in a row. For one thing, I'm not sure why distribute and setuptools are downloading Beautiful Soup from crummy.com instead of using PyPI, especially since PyPI registers almost 150k downloads of the latest BS4--why are some people using PyPI and not others?

If anyone knows how to convince everyone to use PyPI, I'd appreciate the knowledge. But it's not a big deal right now, and it gives me some visibility into how people are using Beautiful Soup. Visibility which I will share with you.

Yesterday, the 17th, the Beautiful Soup 4.1.3 tarball was downloaded 2223 times. It is by far the most popular thing on crummy.com. The second most popular thing is the Beautiful Soup 3.2.1 tarball, which was downloaded 381 times. The vast majority of the downloads were from installation scripts: distribute or setuptools.

1516 distinct IP addresses were responsible for the 2223 downloads of 4.1.3. I wrote a script to find out how many IP addresses downloaded Beautiful Soup more than once. The results:

Downloads from a single IP    Number of times this happened
55                            1
35                            1
15                            1
13                            1
11                            1
5                             2
4                             12
3                             43
2                             453
1                             1001
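The script itself isn't shown, so here is a minimal sketch of how a tally like this could be produced in Python. It assumes an access log where the client IP is the first whitespace-separated field, and the tarball filename used as the search string is an assumption too. The sample log lines are made up. The last few lines check that the table above adds up to the 1516 addresses and 2223 downloads quoted earlier.

```python
from collections import Counter

def downloads_per_ip(log_lines, needle="beautifulsoup4-4.1.3.tar.gz"):
    """Count requests mentioning `needle`, keyed by the first field (the IP)."""
    per_ip = Counter()
    for line in log_lines:
        if needle in line:
            per_ip[line.split(None, 1)[0]] += 1
    return per_ip

def histogram(per_ip):
    """Map 'downloads from a single IP' to 'number of IPs that did that'."""
    return Counter(per_ip.values())

# Made-up log lines, using documentation-only IP addresses.
sample = [
    "203.0.113.5 GET /beautifulsoup4-4.1.3.tar.gz",
    "203.0.113.5 GET /beautifulsoup4-4.1.3.tar.gz",
    "198.51.100.7 GET /beautifulsoup4-4.1.3.tar.gz",
    "198.51.100.7 GET /index.html",
]
print(dict(histogram(downloads_per_ip(sample))))

# Sanity check of the published table: {downloads per IP: number of IPs}.
table = {55: 1, 35: 1, 15: 1, 13: 1, 11: 1, 5: 2, 4: 12, 3: 43, 2: 453, 1: 1001}
distinct_ips = sum(table.values())
total_downloads = sum(n * ips for n, ips in table.items())
print(distinct_ips, total_downloads)
```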

Naturally my attention was drawn to the outliers at the top of the table. I investigated them individually. The IP address responsible for 55 downloads is a software company of the sort that might be deploying to a bunch of computers behind a proxy. The 35 is an individual on a cable modem who, judging from their other traces on the Internet, is deploying to a bunch of computers using Puppet. The 15, the 13, and the 11 are all from Travis CI, a continuous integration service.

One of the two 5s was an Amazon EC2 instance. Five of the twelve 4s were Amazon EC2 instances. Thirty-seven of the forty-three 3s were Amazon EC2 instances. And 395 of the 453 double-dippers were Amazon EC2 instances. Something's clearly going on with EC2. (There was also one download from within Amazon corporate, among other BigCo downloaders.)

I hypothesized that the overall majority of duplicate requests are from Amazon EC2 instances being wiped and redeployed. To test this hypothesis I went through all the double-dippers and calculated the time between the first request and the second. My results are in this scatter plot. Each point on the plot represents an IP address that downloaded Beautiful Soup twice yesterday.

For EC2 instances, the median time between requests is 11 hours and 45 minutes. So EC2 instances are being automatically redeployed twice a day. For non-EC2 instances, the median time between requests is 51 minutes, and the modal time is about zero. Those people set up a dev environment, discover that something doesn't work, and try it again from scratch.
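A rough sketch of that calculation in Python, assuming the log has already been parsed into (IP, timestamp) pairs. The addresses and times below are made up, the function names are hypothetical, and the real analysis may well have been done differently.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

def gaps_for_double_dippers(events):
    """Return {ip: timedelta between requests} for IPs seen exactly twice."""
    by_ip = defaultdict(list)
    for ip, when in events:
        by_ip[ip].append(when)
    gaps = {}
    for ip, times in by_ip.items():
        if len(times) == 2:
            first, second = sorted(times)
            gaps[ip] = second - first
    return gaps

def median_gap_minutes(gaps):
    """Median time between first and second request, in minutes."""
    return median(g.total_seconds() / 60 for g in gaps.values())

# Made-up events on an arbitrary day; only the gaps matter.
events = [
    ("203.0.113.5", datetime(2013, 1, 1, 0, 0)),
    ("203.0.113.5", datetime(2013, 1, 1, 12, 0)),
    ("198.51.100.7", datetime(2013, 1, 1, 1, 0)),
    ("198.51.100.7", datetime(2013, 1, 1, 1, 30)),
    ("192.0.2.9", datetime(2013, 1, 1, 2, 0)),
    ("192.0.2.9", datetime(2013, 1, 1, 2, 50)),
    ("192.0.2.44", datetime(2013, 1, 1, 3, 0)),  # single download, ignored
]
gaps = gaps_for_double_dippers(events)
print(median_gap_minutes(gaps))
```

Splitting the events into EC2 and non-EC2 addresses before calling these functions would give one median per group, as in the comparison above.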

Board Game Dadaist Improvements: I've finally relented to Adam's demands and made some improvements to the Board Game Dadaist RSS feed. He broke his kneecap recently and I figured this would be a good way to cheer him up. Every game that shows up in the feed now has a permalink (here's "Plue"), and that page has a very basic link for posting your find to Twitter.

April Film Roundup: Another month, another few movies. RESTful Web APIs is almost done, but not quite, so once again there's not a whole lot here. The theme of this month is "really loving a movie, seeing a different movie on that basis, and being very disappointed."

Story Bundle: Constellation Games is featured in the current video game-themed StoryBundle. It's a pay-what-you-want, like the Humble Indie Bundle. This means that if you're the ultimate cheapskate, you can get my book and six others for the Steam-sale-level price of three bucks. Pay ten bucks, and you also get three bonus books, including Jordan Mechner's "The Making of Prince of Persia" and a Ralph Baer memoir which--just guessing here--is probably enjoyably cranky.

And for people who discover Constellation Games based on this bundle, this is my occasional notification that there are tons of free extras: four bonus stories, in-character Twitter feeds, and an episode guide with commentary.

Side note: the bundle was assembled by Simon Carless, who is the reason I wrote Constellation Games in the first place.

Beautiful Soup 4.2.0: My work on RESTful Web APIs is pretty much done, so I went through the Beautiful Soup bug tracker and fixed everything I could. The result is a new, stoner-iffic release of Beautiful Soup.

Here are the release notes. The main new features are a much more capable CSS selector engine, and a diagnostics module that should help with tech support.

Unless otherwise noted, all content licensed by Leonard Richardson
under a Creative Commons License.