For Aaron:

import urllib2
from bs4 import BeautifulSoup

url = 'http://www.poemhunter.com/best-poems/william-stafford/thinking-for-berky/'
soup = BeautifulSoup(urllib2.urlopen(url))

print soup.find(attrs="title").text
for s in soup.find(attrs="poem").strings:
    print s.strip()

429 Too Many Requests: I don't like repeating what everyone else is saying on this weblog, and I don't have much to add to the general outpouring following the death of my friend Aaron, but I have to say something, because you can't say goodbye if you don't say anything. His death was awful, our loss great, his crimes (assuming any crime was committed at all) minor, and their prosecution farcical. I feel like a lot of what we're going through is our frustrated desire to see Aaron's case properly litigated, to see our friend vindicated, and I have no experience with that stuff, but I do have two personal stories to share. Two points where my life intersected with Aaron's in ways I haven't talked about publicly.

  1. Beautiful Soup was partly inspired by xmltramp, an XML parser Aaron wrote because he was frustrated with other XML parsers. I've been thinking a lot about this, and this is why my initial mourning of Aaron took the form it did, because screen-scraping—the use of an automated agent to replace a human-driven web browser—seems to have been at the core of the prosecutor's belief that this was a blockbuster case, more akin to a bank heist than a defaced storefront.
  2. In 2005 Aaron wanted me to join his startup, Infogami. He showed me a prototype, a NewsBruiser-like blogging site. I was looking to quit my job at CollabNet, but I didn't take Aaron's offer because I was comfortable in San Francisco and really didn't want to move across the country. (In a Twilight Zone-level twist, in early 2006 I'd end up moving to New York, which I now like a lot better than San Francisco.) Aaron tells the next part of the story here. He couldn't find a partner and eventually ended up merging Infogami with Reddit[0], which was then sold to Conde Nast in 2006.

    Back in 2005 there was enough of the college-era me left that I would have seen this outcome as a big missed opportunity. I still had some desire, left over from the dot-com era, to win the startup lottery. But of course the Reddit merger happened because Aaron couldn't get a partner for Infogami. And my life over the next couple years, including my secondhand reading of Aaron's experience at Reddit (he was fired soon after the Conde Nast acquisition) made it clear to me that I would not enjoy winning the startup lottery any more than Aaron did. I count this among the most important things Aaron taught me.

I've cut a lot of what I wrote here because I don't want this entry to be a bunch of stuff about me and my opinions and what I think. But I'm the one who's still here. Aaron is gone, and all that's left of him is the parts we can share.

[0] I can't let go of these little technical inconsistencies between what I'm seeing now and what I remember. It looks like during the merger, Infogami stopped being a blogging site, and its framework (which became the first Python version of Reddit) was renamed "Infogami". Or maybe "Infogami" was the framework all along, and the blogging site was only one application of the product; I don't know.

Crazy the Scorpion Semi-Online: Kirk and I collaborated on an in-browser version of Crazy the Scorpion for Klik of the Month Klub. It's "online" in the sense that you download an HTML file containing the game and play the game in your browser. But everyone who plays must be gathered around the same computer.

I scraped a bunch of Wikipedia page titles to make fake Trivial Pursuit cards. It's not great, but the whole thing's not bad for two hours of work. I mainly hope this version inspires you to play Crazy the Scorpion using physical components.

The Crummy.com Review Of Things 2012: I've been battered from all sides and working all the time on RESTful Web APIs, but I really feel like I need to get this out before the end of January, so I took some weekend time and finished it. First let's briefly review The Year in NYCB!

And now, our feature presentation. Of all the artifacts I experienced last year, these were my favorites.

Looking forward to 2013: man, we're already 1/12 of the way through 2013! This should have gone up a month ago! If I can finish RESTful Web APIs and Situation Normal I'll call it a good year.

Video Roundup: January 2013: Gonna put one of these up every month, so as to avoid the big bolus of reviews that happened last time. There are only three films here, all from a Paul Williams retrospective at the Museum of the Moving Image.

[Comments] (1) Spacewar! The Interview: Went to the museum last night not for a movie, but to meet Peter Samson and (via poor-quality videoconferencing) Steve Russell, for a conversation about the second video game ever made, Spacewar!.

I asked Russell the question that's been burning in my mind for years: why does Spacewar! have an exclamation mark in its name? His answer: "Once I got it working, I thought it deserved an exclamation point!" I also asked Russell if he considered any other names for the game. "Nope."

No one asked the obvious final question, so I got that one in too: what games are they playing now? Both Russell and Samson are fans of solitaire card games. Russell also said he likes the Android game Tiny Village.

Some other tidbits from the conversation, which I found especially interesting and/or which I don't think are on the net already:

Constellation Games Interview in Bookslut: Hey folks, CG fan Jeanne Thornton interviewed me a couple months back, creating a text that has now been published on Bookslut. (There's also an interview with Saladin Ahmed in the same issue.) The interview ranges over the CG publication process, games as an art form, space exploration, and so on.

One thing the published interview doesn't include is a question about Tetsuo Milk, which Jeanne cut before submitting the interview because it was kind of inside-baseball. But hey, inside baseball is the whole point of News You Can Bruise, so with Jeanne's permission I've reproduced the original question and my answer here:

I would feel remiss in not asking you about Tetsuo Milk, a character whom you’ve said (in your really, really mind-blowingly extensive commentary on the novel) essentially ran away with the book. Tetsuo is a brilliant character, but also feels at times like a heterogeneous element. I like this effect a lot, but I’m curious as to where this guy came from, what you’re saying through him, and how you see him fitting into the overall mix.

Maybe this will help: Tetsuo Milk is the ET version of Ariel. His silly mistakes and misunderstandings are mirror images of the mistakes Ariel makes trying to understand the Constellation. We don't laugh because we're not the ones being misunderstood. When Tetsuo does it to us, it's funny.

Here's a spoiler-free example. One of Ariel's post-contact hobbies is posting reviews of alien computer games to his blog. There's one really important scene that reverses the roles: Tetsuo writes a review of a game Ariel worked on as a developer, Brilhantes Poneis 5. Brilhantes is a stupid Farmville-type mobile game where you have a pet pony and do pointless tasks to earn coins to buy accessories for it. Tetsuo tackles the game from a post-scarcity Marxist perspective, putting a lot of work into understanding how a game's economy can work when the player is the employee of an animal. He gets a lot of it right (i.e. he recognizes that the game demeans both its players and its developers), but he's operating from completely the wrong framework.

That's the kind of mistake Ariel makes. He brings his human assumptions to everything, whether he realizes it or not, whether or not Tetsuo or someone else calls him on it.

(This is why there's a reference to "Tetsuo-like ideas" later in the interview; we shoulda cut that reference.)

Hire Aaron DeVore: I don't often use the NYCB bully pulpit to tell you to hire someone (apart from myself), but folks, you should hire Aaron DeVore. He was effectively the maintainer of Beautiful Soup during the period when I wasn't working on it. He answered tons of questions on the mailing list and sent me bugfix patches. When I started work on Beautiful Soup 4, he gave me a lot of feedback that helped stabilize the API.

Aaron did all this while a college student in Portland, Oregon. Now he's about to graduate, and he's looking for a job. Send him an email and let him know what you've got going on.

What's New in "RESTful Web APIs": We're ahead of schedule, which is good because we have a lot of work to do that isn't part of the book manuscript. Yesterday I sent out over forty copies of the manuscript to beta readers. That is too many beta readers, so at this point I must refuse anyone else who wants to be part of the beta, unless they have/had a hand in one of the standards we discuss, and they want to specifically critique our coverage of that standard.

With the beta closed I think it's a good time to go into a little detail about the structure of the book. My guiding principle was to write a book that will be as useful now as RESTful Web Services was in 2007. Like RWS, RESTful Web APIs has a main storyline that takes up most of the book. My inspirations for the main storyline were a few books that followed RWS, notably REST in Practice and Mike's Building Hypermedia APIs with HTML5 and Node.

RWS focused on the HTTP notion of a "resource", and despite the copious client-side code, this put the conceptual focus clearly on the server side, where the resource implementations live. RWA focuses on representations, and thus on hypermedia, on the interaction between client and server, which is where REST lives. The stuff you remember from RWS is still here, albeit rewritten in a pedagogically superior way. Web APIs work on the same principles as the Web, here's how HTTP works, here's what the Fielding constraints do, and so on. But the focus is always on the interaction, on the client and server manipulating each other's state by sending messages back and forth.

We've also benefited from a lot of tech work done by others. The IANA registry of link relations showed that state transitions don't have to be tied to a media type. The RFC that established that registry also showed how to define custom state transitions (extension relation types) without defining yet another media type to hold them.
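To make the registered-versus-extension distinction concrete, here is a minimal sketch (not code from the book). It formats a Link header value and treats any relation that parses as a URI with a scheme as an extension relation type. The REGISTERED_RELATIONS set is a tiny illustrative subset of the IANA registry, and the function names and example URLs are hypothetical.

```python
from urllib.parse import urlsplit

# Assumption: a small illustrative subset of the IANA-registered relation names.
REGISTERED_RELATIONS = {"next", "prev", "self", "edit", "collection", "item"}


def is_extension_relation(rel):
    """Treat a relation as an extension type if it looks like a URI.

    Simplification: we only check that the string has a URI scheme.
    """
    return bool(urlsplit(rel).scheme)


def link_header(target, rel):
    """Format one Link header value for a target URL and a relation type."""
    if not is_extension_relation(rel) and rel not in REGISTERED_RELATIONS:
        raise ValueError("unknown relation type: %r" % rel)
    return '<%s>; rel="%s"' % (target, rel)


# A registered relation: the short name is enough.
print(link_header("/posts?page=2", "next"))
# An extension relation: a URI we control (hypothetical) names the transition.
print(link_header("/maze/cell/4", "http://example.com/rels/flip-switch"))
```

Either way, the state transition travels alongside the representation, so it works the same whatever media type the body happens to be.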

Insights like these inform the new parts of RWA's main storyline. What makes your API different from every other RESTful API in existence? That's the only part you really need to buckle down and design. Everything else you can reuse, or at least copy.

In particular, you shouldn't have to design a custom media type. Your API probably isn't that different from other APIs, and a ton of hypermedia formats and protocols have been invented since 2007. We cover a few of the most promising ones in the book's main storyline. We cover even more of them afterwards, mostly in the big "Hypermedia Zoo" chapter. Here's the book-wide list:

After the main storyline and the hypermedia zoo, RWA continues the RWS tradition of giving an API-centric view of the HTTP standard. We have a "crash course in advanced HTTP" chapter, some of which is an update of Chapter 8 from RWS. (Look-before-you-leap requests never caught on, but I still feel like I have to describe them in RWA because I have no other source to refer you to!) Appendix A is an updated version of Appendix B from RWS, with the addition of these exciting new status codes:

Appendix B is an update of Appendix C from RWS, with these API-licious new HTTP headers:

The amount of reused material in RWA is really small, because the main storyline is completely rewritten for 2013. And I haven't even mentioned our coverage of profiles, partly because I can't yet think of a way to talk about profiles at less length than what we say in the book.

Fundamental Indeed: I could spend all day just posting games that Board Game Dadaist comes up with. I forbear, for the sake of you, my readers, but Adam Parrish and I will email each other when we find an especially good one. And I think you should know about the best game BGD ever came up with (found by Adam back in December):

Fantasy Fundamental Rails (2005)

Players divide themselves into two teams.

Welcome BoingBoing Readers: If you're coming here from Cory Doctorow's review of Constellation Games, you might like to know about my web page for the book. The book was originally a serial, and I wrote chapter-by-chapter commentary as it was serialized. I also wrote four bonus stories set before, during, and after the novel, which I've released under CC-BY-SA. All that stuff is here.

You might also be interested to know that you get a DRM-free PDF version of the novel by buying direct from the publisher.

Now would also be a great time to mention that Constellation Games is eligible for this year's Hugo.

100 Years of Markov Chains: Back in January I took a little trip to Boston and hung out with Kirk. Among other things, we attended an event at Harvard celebrating the 100th anniversary of the paper that kicked off the Markov chain craze. I only wish Adam had been there. I've held off on talking about the event because I've been waiting for Harvard to put the video of the talks online. But that's a sucker's game, and now I have something better!

See, the first talk, by Brian Hayes, covered the amazing history leading up to the publication of Markov's seminal paper. He's now turned his talk into an article in American Scientist. The first few pages of that article are a basic introduction to Markov chains; the history starts on page four. Basically, Markov was a cranky old man who liked picking fights.

Markov’s pugnacity extended beyond mathematics to politics and public life. When the Russian church excommunicated Leo Tolstoy, Markov asked that he be expelled also. (The request was granted.) In 1902, the leftist writer Maxim Gorky was elected to the Academy, but the election was vetoed by Tsar Nicholas II. In protest, Markov announced that he would refuse all future honors from the tsar... In 1913, when the tsar called for celebrations of 300 years of Romanov rule, Markov responded by organizing a symposium commemorating a different anniversary: the publication of Ars Conjectandi 200 years before.

As acts of political protest go, the well-timed symposium is pretty great. At that symposium Markov revealed the Markov chain, which he'd invented as a way to smack down the dumb theological arguments of rival mathematician Pavel Nekrasov. His paper wasn't called "Markov Chains: Future Basis for Art and Scientific Discovery, Named After Me, A. A. Markov." It was called "An Example of Statistical Investigation of the Text 'Eugene Onegin' Concerning the Connection of Samples in Chains".

Markov had manually gone through the first 20,000 characters of Pushkin's "Eugene Onegin", looking at every pair of letters, writing down whether the letters were both vowels, both consonants, vowel-consonant, or consonant-vowel. Then he'd modelled the transitions between those four states with a Markov chain. The result disproved an assumption about the law of large numbers, an assumption crucial to Nekrasov's mathematical argument for free will. There's something about this mindset that always gets me--inventing the sledgehammer so you can use it to kill a fly.
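The counting Markov did by hand can be sketched in a few lines. This is a rough illustration, not a reconstruction of his method. It assumes English vowels rather than Pushkin's Russian, counts "y" as a consonant, and simplifies by treating vowel and consonant as the two states, with the four pair types as the transitions between them. The function names and the sample text are made up.

```python
from collections import Counter

VOWELS = set("aeiou")  # assumption: English vowels, with "y" as a consonant


def classify_pairs(text, limit=20000):
    """Count the vowel/consonant category of each adjacent pair of letters."""
    letters = [c for c in text.lower() if c.isalpha()][:limit]
    kinds = ["V" if c in VOWELS else "C" for c in letters]
    return Counter(a + b for a, b in zip(kinds, kinds[1:]))


def transition_probabilities(pair_counts):
    """Estimate P(next kind | current kind) from the four pair counts."""
    probs = {}
    for current in "VC":
        total = sum(pair_counts.get(current + nxt, 0) for nxt in "VC")
        for nxt in "VC":
            count = pair_counts.get(current + nxt, 0)
            probs[current + nxt] = count / total if total else 0.0
    return probs


sample = "My uncle, man of firm convictions"  # made-up stand-in for the real text
counts = classify_pairs(sample)
print(counts)
print(transition_probabilities(counts))
```

If letters were independent of each other, the chance of a vowel would not depend on whether the previous letter was a vowel. Comparing probs["VV"] with probs["CV"] on a long text shows how far from independent they are.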

The other two talks were a lot more technical. I was mostly able to follow them, but I don't think I got much out of them. Here's a summary of all three talks from someone else who was there. But I strongly recommend Hayes's article to anyone who reads this weblog.

[Comments] (1) Ragtime Synchronicity:

"Bugs," said Krakowski. "In-tell-i-gence gathering devices. The Constellation loves recording things. Now they're going to record every conversation anyone ever has."

"I think you might be projecting a little."

[Comments] (1) February Film Roundup: The second in the 2013 series, as promised. Note: I draw no distinction between information about a movie that's a "spoiler" and information that's not.

: From an interview with Ken Liu, recent Hugo/Nebula/WFA winner:

I went to law school, started a new job, and kind of gave up on writing for a while due to a supreme act of stupidity. I wrote this one story that I really loved, but no one would buy it. Instead of writing more stories and subbing them, as those wiser than I was would have told me, I obsessively revised it and sent it back out, over and over, until I eventually gave up, concluding that I was never going to be published again.

And then, in 2009, Sumana Harihareswara and Leonard Richardson bought that story, "Single-Bit Error," for their anthology, Thoughtcrime Experiments. The premise of the anthology was, in the editors' words, "to find mind-breakingly good science fiction/fantasy stories that other editors had rejected, and release them into the commons for readers to enjoy."

I can't tell you how much that sale meant to me. The fact that someone liked that story after years of rejections made me realize that I just had to find the one editor, the one reader who got my story, and it was enough. Instead of trying to divine what some mythical ur-editor or "the market" wanted, I felt free, after that experience, to just try to tell stories that I wanted to see told and not worry so much about selling or not selling. I got back into writing—and amazingly, my stories began to sell.

Case closed, I'd say.

[Comments] (2) March Film Roundup: Okay, look. I don't see movies just for their entertainment value. I dig film as an art form. But my permit to dig is premised on an amateur understanding of film as a narrative art form. If you want to present an endless stream of disconnected images, let's do an installation piece, because I want to decide for myself when I've had enough. I'm not going to be your captive for fifty minutes. (I'm looking at you, Andy Warhol.) And all that aside, I'm not gonna see a movie called Trash Humpers (2009), when the nicest thing the folks doing the screening can say is that it "rewards the open-minded viewer with moments of astonishing and unexpected poignancy."

Which is to say that I skipped most of the museum's highly avant-garde March offerings. I also got this book I have to work on. So not many movies in this roundup. Let's-a go:

In Search of the Beautiful Soup Double-Dippers: Recently I noticed that certain IPs were using distribute or setuptools to download the Beautiful Soup tarball multiple times in a row. For one thing, I'm not sure why distribute and setuptools are downloading Beautiful Soup from crummy.com instead of using PyPI, especially since PyPI registers almost 150k downloads of the latest BS4--why are some people using PyPI and not others?

If anyone knows how to convince everyone to use PyPI, I'd appreciate the knowledge. But it's not a big deal right now, and it gives me some visibility into how people are using Beautiful Soup. Visibility which I will share with you.

Yesterday, the 17th, the Beautiful Soup 4.1.3 tarball was downloaded 2223 times. It is by far the most popular thing on crummy.com. The second most popular thing is the Beautiful Soup 3.2.1 tarball, which was downloaded 381 times. The vast majority of the downloads were from installation scripts: distribute or setuptools.

1516 distinct IP addresses were responsible for the 2223 downloads of 4.1.3. I wrote a script to find out how many IP addresses downloaded Beautiful Soup more than once. The results:

Downloads from a single IP | Number of times this happened
55 | 1
35 | 1
15 | 1
13 | 1
11 | 1
5 | 2
4 | 12
3 | 43
2 | 453

Naturally my attention was drawn to the outliers at the top of the table. I investigated them individually. The IP address responsible for 55 downloads is a software company of the sort that might be deploying to a bunch of computers behind a proxy. The 35 is an individual on a cable modem who, judging from their other traces on the Internet, is deploying to a bunch of computers using Puppet. The 15, the 13, and the 11 are all from Travis CI, a continuous integration service.

One of the two 5s was an Amazon EC2 instance. Five of the twelve 4s were Amazon EC2 instances. Thirty-seven of the forty-three 3s were Amazon EC2 instances. And 395 of the 453 double-dippers were Amazon EC2 instances. Something's clearly going on with EC2. (There was also one download from within Amazon corporate, among other BigCo downloaders.)
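The original script isn't shown, but a tally like the one above can be sketched as follows. This is a minimal version under stated assumptions: the access log has the client address as the first whitespace-separated field (as in common log format), and the function names, sample log lines, and tarball filename are all made up for illustration.

```python
from collections import Counter


def downloads_per_ip(log_lines, path_fragment):
    """Count requests per client IP for lines that mention path_fragment."""
    per_ip = Counter()
    for line in log_lines:
        if path_fragment in line:
            ip = line.split(" ", 1)[0]  # assumption: first field is the client IP
            per_ip[ip] += 1
    return per_ip


def repeat_histogram(per_ip):
    """Map 'downloads from a single IP' to 'number of times this happened'."""
    return Counter(n for n in per_ip.values() if n > 1)


# Made-up log lines; the addresses are documentation-only ranges.
sample_log = [
    '203.0.113.5 - - [...] "GET /beautifulsoup4-4.1.3.tar.gz HTTP/1.1" 200 -',
    '203.0.113.5 - - [...] "GET /beautifulsoup4-4.1.3.tar.gz HTTP/1.1" 200 -',
    '198.51.100.7 - - [...] "GET /beautifulsoup4-4.1.3.tar.gz HTTP/1.1" 200 -',
    '198.51.100.7 - - [...] "GET /index.html HTTP/1.1" 200 -',
]
per_ip = downloads_per_ip(sample_log, "beautifulsoup4-4.1.3.tar.gz")
print(repeat_histogram(per_ip))
```

Working out which addresses belong to EC2 or Travis CI is a separate lookup that this sketch does not attempt.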

I hypothesized that the overall majority of duplicate requests are from Amazon EC2 instances being wiped and redeployed. To test this hypothesis I went through all the double-dippers and calculated the time between the first request and the second. My results are in this scatter plot. Each point on the plot represents an IP address that downloaded Beautiful Soup twice yesterday.

For EC2 instances, the median time between requests is 11 hours and 45 minutes. So EC2 instances are being automatically redeployed twice a day. For non-EC2 instances, the median time between requests is 51 minutes, and the modal time is about zero. Those people set up a dev environment, discover that something doesn't work, and try it again from scratch.
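The gap calculation can be sketched like this. It assumes the request timestamps have already been parsed into datetime objects and grouped by IP address, and that splitting addresses into EC2 and non-EC2 happens elsewhere. The function name and the sample data are hypothetical.

```python
from datetime import datetime
from statistics import median


def median_gap_minutes(requests_by_ip):
    """Median minutes between first and second request, over IPs with exactly two."""
    gaps = []
    for times in requests_by_ip.values():
        if len(times) == 2:  # only the double-dippers
            first, second = sorted(times)
            gaps.append((second - first).total_seconds() / 60)
    return median(gaps) if gaps else None


# Made-up timestamps for three double-dippers and one single download.
day = datetime(2013, 1, 1)
sample = {
    "203.0.113.5": [day.replace(hour=0), day.replace(hour=12)],
    "203.0.113.6": [day.replace(hour=1), day.replace(hour=1, minute=30)],
    "203.0.113.7": [day.replace(hour=2), day.replace(hour=14)],
    "198.51.100.9": [day.replace(hour=3)],
}
print(median_gap_minutes(sample))
```

Run once over the EC2 addresses and once over the rest, this gives the two medians being compared.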

Board Game Dadaist Improvements: I've finally relented to Adam's demands and made some improvements to the Board Game Dadaist RSS feed. He broke his kneecap recently and I figured this would be a good way to cheer him up. Every game that shows up in the feed now has a permalink (here's "Plue"), and that page has a very basic link for posting your find to Twitter.

[Comments] (1) April Film Roundup: Another month, another few movies. RESTful Web APIs is almost done, but not quite, so once again there's not a whole lot here. The theme of this month is "really loving a movie, seeing a different movie on that basis, and being very disappointed."

Story Bundle: Constellation Games is featured in the current video game-themed StoryBundle. It's a pay-what-you-want, like the Humble Indie Bundle. This means that if you're the ultimate cheapskate, you can get my book and six others for the Steam-sale-level price of three bucks. Pay ten bucks, and you also get three bonus books, including Jordan Mechner's "The Making of Prince of Persia" and a Ralph Baer memoir which--just guessing here--is probably enjoyably cranky.

And for people who discover Constellation Games based on this bundle, this is my occasional notification that there are tons of free extras: four bonus stories, in-character Twitter feeds, and an episode guide with commentary.

Side note: the bundle was assembled by Simon Carless, who is the reason I wrote Constellation Games in the first place.

Beautiful Soup 4.2.0: My work on RESTful Web APIs is pretty much done, so I went through the Beautiful Soup bug tracker and fixed everything I could. The result is a new, stoner-iffic release of Beautiful Soup.

Here are the release notes. The main new features are a much more capable CSS selector engine, and a diagnostics module that should help with tech support.

[Comments] (2) May Film Roundup: Lots of travel in May, so not many movies this month either. But I do have heterodox opinions for you. Read on!

[Comments] (1) Sycorax Transcends Your Puny Version Numbers: Last night I'd finally had enough with all of my Twitter bots not working due to sending POST requests to a resource that was 410 Gone. The one I really need to keep going is Frances Daily, and that one's on break right now because the planner page for June 1988 was unfortunately missing. But we're running out of June, so I fixed it.

To do that I had to fix Sycorax, the way-too-advanced piece of software that enacted an elaborate running commentary during the serialization of Constellation Games, a commentary that about eighty people saw. Since I'm pretty sure I'm the only person using Sycorax, I've decided to stop doing a tarballed release every time I change something, and just put the code up on GitHub.

Robot roll call!

The robot in the shop is @RoyPostcards, which I'll fix around the same time I get some more postcards ready to put up.

Update: @CrowdBoardGames prayed for a friend, and he came! His name is Timmy!

The Interesting Parts: I've wanted to write this post for a long time, so long that the main guy I'm writing about, Iain M. Banks, announced that he had cancer and then died of the cancer. That doesn't really affect what I'm going to write, but it does give it an air of speaking ill of the dead, and it just sucks in general.

The Banks novel I read most recently was The Algebraist, and it was a mixed bag for me. The epic scope of Banks's imagination has always been a big inspiration to me as a writer, but The Algebraist is dominated by Banks's "normal human" characters, who channel that epic scope into activities I have always found really boring. I mentioned this in my commentary for "The Time Somn Died", and I assumed it was a side effect of the fact that there's just nothing to do in the Culture. But The Algebraist isn't a Culture novel. Its "normal human" characters don't sit around all day being post-scarcity. I'm just not interested in most of what they do.

Fortunately, eventually the spectacle and the aliens spin up and save the book. I speak mainly of the Dwellers, aliens who make their way onto my list of SF favorites for the way they combine Bertie Wooster joie de vivre with a complete disregard for the value of individual lives, including their own. Great stuff. I loved it. Colonel Hatherance: another awesome alien.

(The other flaw in The Algebraist is one I am perhaps too quick to notice in other writers. There's a puzzle, and a solution to the puzzle, but no explanation as to how the solution--which, by necessity, can be explained in a few paragraphs to a reader who's only been immersed in the universe for a few hours--has evaded all the in-universe people who've been desperately trying to solve the puzzle for thousands of years. That has nothing to do with this post, but I thought I'd mention it because it's a tricky problem, and if you start looking for it you'll see it a lot.)

After Iain M. Banks's The Algebraist, I naturally turned my reading eye to A History of Engineering and Science in the Bell System: Communication Sciences (1925-1980), by Iain Banks. No, just kidding. It's a corporate history published by Bell Labs in 1984 to keep track of all the stuff they'd invented over the years. My copy used to be in the library of the Union Presbyterian Seminary in Virginia--not sure why they had a copy in the first place.

You might think these two books have nothing in common, but one commonality was clear as soon as I cracked the latter tome: they both start out super boring.

Unlike The Algebraist, A History of Engineering and Science... is boring most of the way through. There's a lot about switched telephone networks, radio and fiber-optic cables. That's actually why I got this book; I wanted to do research for an alt-history story about phone phreaking. But the details were so dry I'm either gonna give up on the idea or just read Exploding the Phone instead. Here are the interesting parts of that book:

Hopefully you see the problem. Nothing about those quotes, or about stereograms or classifying two-person relationships, is intrinsically more interesting than the stuff about circuit-switched telephone networks. It's all subjective. When I reached Chapter 9 of A History of Engineering and Science... I had the feeling of encountering the Dwellers in The Algebraist. "At last, this is my chapter!" But there's obviously an audience for that earlier stuff. I'm just not it. So... there must be people who really enjoy the human-centric parts of The Algebraist, right? Perhaps those people even enjoy the human-centric parts of the Culture books?

What madness is this? I knew that interestingness was subjective for nonfiction. As the author of a novel about alien video games, I am familiar with the idea that a reader might decide a novel is just not their thing. But it had escaped me that the same logic might apply within a novel. This made me re-evaluate the parts of Banks I don't like. However, I came to the same conclusion: I still don't like them, and I'm gonna try to avoid writing that sort of stuff. But it's not so strange anymore that there'd be an audience for it.

Last week I went to the Met and checked out an exhibit of prints from the Civil War. There's a Thomas Nast print from Harper's called "Christmas Eve", divided into two halves: the woman at home with the kids on Christmas Eve, praying for her husband, and the soldier in camp looking at a picture of his wife. It's a moving piece but at first glance there's not much to distinguish it from other "war sucks" pieces of the time.

But if you look in the left and right corners of the print, you'll see Santa Claus with his reindeer. On the left he's climbing down the chimney, and on the right he's driving through the camp, tossing out gifts. The cover for the same issue of Harper's shows Santa giving toys and socks to Union soldiers. These prints are the origin of the modern image of Santa Claus.

Nast first drew Santa Claus for the 1862 Christmas season Harper’s Weekly cover and center-fold illustration to memorialize the family sacrifices of the Union during the early and, for the north, darkest days of the Civil War.

Which are the interesting parts? How do you tell?

June Film Roundup: I guess the theme of June was mixing fact and fiction? I dunno why I feel the need to come up with a theme for all the random movies I watched in a month. This thing is long enough as it is. Here you go:

July is gonna be a huge month at the museum, as their theme for the month is "The American Epic". Movies that might show up in next month's review include Citizen Kane, The Grapes of Wrath, Do The Right Thing, Reds, The Right Stuff, Nashville, There Will Be Blood, and The Night of the Hunter. I'm tired already!

Mashteroids: As my birthday present to you, I present Mashteroids, Queneau assemblies of the IAU citations for minor planets. This showed up briefly on NYCB two years ago, but I've expanded the dataset, improved the sentence tokenization, and created a platform for future Queneaux.

A few samples:


Robert Shelton (b. 1948), nineteenth president of the University of Arizona, chaired the Keck Telescope Board from 1997 to 2000. The book promoted the Copernican system and became a best seller. Besides his scientific work, he is also the author of the well-known popularizations A Brief History of Time and Black holes and Baby Universes and Other Essays.


Named for the province of New Zealand on the eastern side of the South Island. He published his first story in Pilote magazine in 1972 and his first album in 1975. He has written several papers on the history of optics.


Junttura embodies the Finnish mentality to get things done, stubbornly and at all costs. He is also an authority on the poet and novelist Kenji Miyazawa and currently directs the museum at the Kenji Miyazawa Iihatobu Center. "Miminko" is Czech word that expresses the unique stage of innocence at the beginning of human life.

You also got an RSS feed.
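For the curious, the basic move behind an assembly like these can be sketched as follows. This is a simplified take, not the Mashteroids code. It assumes each source text is already split into a list of sentences, and it builds a new text by taking the first sentence from one randomly chosen source, the second from another, and so on. The function name and the sample citations are made up.

```python
import random


def queneau_assembly(texts, length=3, rng=random):
    """Build a text whose i-th sentence is the i-th sentence of a random source."""
    eligible = [t for t in texts if len(t) >= length]
    if not eligible:
        raise ValueError("no source text has enough sentences")
    return [rng.choice(eligible)[i] for i in range(length)]


# Made-up three-sentence "citations", already tokenized into sentences.
citations = [
    ["Named for a river.", "It flows north.", "It is very cold."],
    ["Named for a poet.", "She wrote odes.", "She also taught."],
    ["Named for a town.", "It has a clock.", "The clock is slow."],
]
print(" ".join(queneau_assembly(citations)))
```

Keeping each sentence in its original position is what makes the output read like a plausible citation: openers stay openers and closers stay closers.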

[Comments] (4) Billy Collins, Stand-Up Comic (Bonus: How To Write Poetry): For reasons that need not concern us, I recently gave some advice on writing poetry. I don't know anything about poetry, but I was able to derive the most basic advice from first principles: "read a whole bunch of poetry before you try to write some." Adam Parrish knows more about poetry and offered some poetry-specific advice: "get over yourself".

I think a lot of incipient poets get caught in the idea that poetry is somehow about free self expression, and that the best poetry is that which most freely expresses the self—which, of course, isn't true. Poetry is a genre that you have to be literate in and a toolbox that you have to learn how to use.

If reading a bunch of poetry is too much work for you, you should at least take the time to reverse-engineer the findings of this paper by Michael Coleman (also via Adam), which uses machine learning to model the differences between poems written by members of the Academy of American Poets, and poems written by the general public. It gives some clues as to how the genre works and what's in the toolbox. e.g.:

The negative association with the PYMCP variable ‘Rhy’—a proxy for the extent to which words elicit other words that rhyme with the stimulus word—indicates that professional poets use words that are somewhat unusual but not necessarily complex. Professional poems have fewer words denoting affect but more words denoting number. Professional poems also refer less to the present and to time in general than amateur poems.

Run your stuff through Poetry Assessor until you start getting good scores. Now you're a poet! Well, sort of. The machine-learning algorithm can reliably tag crappy poems as crap, but it mainly looks at vocabulary and I don't think it knows about scansion at all. I ran the first paragraph of Bleak House—three ponderous Victorian sentences—through Poetry Assessor and it got a 1.8, making it a decent twentieth-century American poem. (And it's a very good paragraph, but you see the problem.)

I formulated my "read a lot of poetry" advice because that was also the technique I used to figure out if I had any more specific advice to give. (I don't.) While reading a lot of poetry, I got really into the work of former US Poet Laureate Billy Collins. Collins has written a number of what I would call "NPR poems", poems that you could imagine him reading aloud on NPR, some of which he probably did read aloud on NPR. He's on NPR a lot. And at first glance the NPR poems have more in common with stand-up comedy than traditional or contemporary poetry.

I think it's best to think of the narrator of a Billy Collins poem as a fictional poet named "Billy Collins", a man whose bouts of incompetence and perpetual lack of inspiration are exploited by the real-world Billy Collins. Stand-up comics do the same thing. I became very interested in how Collins is able to use this persona to do serious poetic work through poems that aren't serious at all—again, something analogous to what a good stand-up comic does.

Some examples. I'm gonna start with Cheerios and Litany, two poems I don't really like. These poems are about as confrontational as Billy Collins gets, but it's not because of their subject matter: it's because they're poetry hacks.

"Cheerios" has a Poetry Assessor score of 0.8--barely professional quality. In "Cheerios" the incompetent poet "Billy Collins" keeps trying to launch a flight of poetic fancy using the overwrought abstract language associated with amateur poetry: "stooped and threadbare back", "more noble and enduring are the hills". But he can never get it off the ground because the engine keeps stalling on concrete imagery--the objective correlatives associated with professional poetry. The problem with that is the concrete imagery consists of nothing but different breakfast foods ("waited for my eggs and toast", "that dude's older than Cheerios", "illuminated my orange juice"). So it's deliberately bad amateur poetry interrupted by deliberately bad professional poetry. Just saying it's a bad poem isn't enough. It's bad in a very interesting, bathetic way.

On the other hand, "Litany" has the incredibly high Poetry Assessor score of 4.4. (The maximum score given in the Coleman paper corresponds to a PA score of 5.2.) What's his secret? Collins spends the entire poem blasting out objective correlatives at high speed. Some of them are taken directly from other poems ("the crystal goblet and the wine"), some of them are allusions ("the plums on the counter", "the burning wheel of the sun"), some are original ("the boat asleep in its boathouse"). But as he shoots those images out, he classifies them, like he's working on an assembly line, or brainstorming the poem he will eventually write. "Litany" is the opposite of "Cheerios". Collins is hacking the part of your brain that evaluates poetry, pushing all your buttons with free-floating imagery. It's a bad poem because you don't know enough about the people in the poem to understand what the imagery means.

Some other NPR poems, arranged roughly in ascending order of seriousness:

Pay special attention to "Another Reason Why I Don't Keep A Gun In The House" and "Nostalgia", two hilarious poems that are literally highbrow stand-up comedy. "Gun" is Seinfeld-esque, employing the tricks of modern poetry to take an exasperating everyday situation and blow it up into a series of escalating fantastic images. (In case you were wondering, its Poetry Assessor score is 2.2, squarely on the "professional" side.) "Nostalgia" (1.3) is more of a Steve Martin kind of comedy, presenting logically flawed arguments and the dumb things people say when they're arguing on autopilot.

"Nostalgia" escalates not to a punchline--a funny kind of absurdity--but to a reductio ad absurdum, a logical absurdity. That makes it a good transition to two Collins poems that, although they deal with ephemeral topics, are more serious and less jokey. They both deal with words, the relationship between words and reality, and the fact that we're always putting words into boxes that themselves have no relationship with reality:

"First Reader" (2.9) is my favorite Collins poem. I feel like "American Sonnet" is the most professionally composed of his poems, and Poetry Assessor agrees, giving it the highest score (3.2) of any of the poems I tested. (Apart from "Litany", which is a poetry hack.) I tried writing down some analysis but these two are easy poems to appreciate, so I'll spare you. I want to close with two poems that I'm not crazy about as a whole, but which do a really interesting thing in the last stanza: they anthropomorphize individual words.

"Paperwork" shows fictional poet "Billy Collins" not being able to write a poem, dreaming in the end of gaining inspiration from an "ancient noun who lives alone in a forest." "Thesaurus" is all about anthropomorphizing words, but it's not until the end that the words leave "the warehouse of Roget" and take on independent lives, "wandering the world where they sometimes fall/in love with a completely different word."

Anthropomorphizing words is how Collins deals with the fact that poetry is a lonely business: writing things down all day, making sure to use exactly the right word all the time. Who else needs to be that careful about individual words? Stand-up comics, that's who. A punchline and a poem both rely on an unexpected word at exactly the right time. That word, when it comes along, is your best friend.

PS: Minor error in the Coleman paper which confused me when I was trying to convert between the paper's scores and Poetry Assessor scores.

For example, Robert Hass has two poems in the corpus, The Image and Our Lady of the Snows, which score in the high to very high range of .72 and .94, respectively.

Those numbers should be reversed. "The Image" has a score of .94 (PA: 5.2), and "Our Lady of the Snows" has .72 (PA: 1.1).

Reunion: I got a misdirected flyer in the mail inviting Leon Richardson to a high school reunion. Class of 1983. I was not yet in kindergarten in 1983, so I thought I might go and drop hints about the youth serum I'd invented.

On the other hand, the invitation is addressed to "Richardson Leon, or current resident". So I can go as myself. Anyone can show up to this high school reunion! They don't care!

In fact they're probably hoping a few current residents will show up to boost the numbers. The flyer seems acutely aware that high school reunions are increasingly an anachronism in this world of "Facebook, Twitter, and Smartphones", and is really desperate to prove the worth of in-person reunions.

It also informs me that "The bio-sheet deadline is Friday, August 30, 2013." Interestingly enough, that's also what a supervillain recently told the United Nations.

Apo11o ll: To celebrate the anniversary of the first moon landing, I packaged up a project I came up with a while back: Apo11o ll, a generative piece that performs Queneau assembly on the Apollo 11 transcripts (from The Apollo 11 Flight Journal and The Apollo 11 Surface Journal).

Duke: Rog. [Long pause.]

Armstrong: That's one small step for (a) man; one giant leap for mankind.

McCandless: Roger, 11. I have a T13 update for you. AOS Tananarive at 37:04, Simplex Alpha. Readback. If you want to go that way, crank it up, and then you can drive it around and look where you want. Over. 11, this is Houston. And we copy the VI.

Aldrin: Does it look to you like the [garble] the right way? Yes, they were working out - this elaborate scheme.

Collins: Unless you'd rather sleep up top, Buzz; I like - you guys ought to get a good night's sleep, going in that damn LM - How about - which would you prefer? I say the leak check is complete, and I'm proceeding with opening the hatch dump valve.

Aldrin: That enough?

McCandless: Apollo 11, this is Houston at 1 minute. Over.

First Mashteroids and now this? How am I doing all this Queneau space-magic? The answer is simple: Olipy, my library for artistic text generation (focusing on Queneau assembly, because it's the best). Check it out of GitHub and you'll have everything you need to create home versions of many of my works. It's like my own personal Boîte-en-valise! Want to create something new? Just grab some data and feed it to an Assembler class.
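If you haven't met Queneau assembly before, here's a minimal sketch of the idea in plain Python. It deliberately does not use Olipy's real API; the function name `queneau_assemble` and the bucket scheme (first sentences, middle sentences, last sentences) are assumptions made for illustration, and the input is assumed to be already split into sentences.

```python
import random

def queneau_assemble(sources, rng=None):
    """Build one new text out of several source texts.

    `sources` is a list of texts, each already split into a list of
    sentences. The output opens with some source's first sentence,
    closes with some source's last sentence, and fills the middle from
    the pool of middle sentences. (Hypothetical helper, not Olipy's API.)
    """
    rng = rng or random.Random()
    nonempty = [s for s in sources if s]
    firsts = [s[0] for s in nonempty]
    lasts = [s[-1] for s in nonempty if len(s) > 1]
    middles = [sentence for s in nonempty for sentence in s[1:-1]]

    # Borrow the output length from a randomly chosen source.
    length = len(rng.choice(nonempty))
    result = [rng.choice(firsts)]
    if length > 2 and middles:
        result.extend(rng.choice(middles) for _ in range(length - 2))
    if length > 1 and lasts:
        result.append(rng.choice(lasts))
    return " ".join(result)

# Made-up sample data: three "citations" of three sentences each.
citations = [
    ["A1.", "A2.", "A3."],
    ["B1.", "B2.", "B3."],
    ["C1.", "C2.", "C3."],
]
print(queneau_assemble(citations))
```

Because each sentence keeps its original position (opening, middle, closing), the result tends to read like a plausible text of the same genre even though no single source said it.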

Loaded Dice 2013 Update: I fetched the BoardGameGeek data again, a yearly tradition, and put up another Loaded Dice update.

A few highlights:

If you go to the main page, you can download an amazing 17-megabyte JSON dump of BGG data I've compiled. It includes descriptions and genres for every game in the dataset, and three data samples that convey historical rating data over three years. At this point I feel like I'm adding enough on top of what the BGG API can give you (the historical rating data) that I can make the data dump available without apology.

[Comments] (1) July Film Roundup: Oh man. As promised last month, July was an epic month of moviewatching, and I decided to try a little epic experiment with this roundup, inspired by the "The Balcony is Closed" game on No More Whoppers. For every movie I saw in July, I came up with a nonobvious connection between that movie and every other movie I saw in July. For instance, if I saw both Die Hard and Live Free or Die Hard, the connection between them would of course be "fresh-faced hacker".

I saw nine movies over the course of the month (well, eight and a half), and by the end this exercise became kind of ridiculous, as I strained to remember obscure aspects of earlier movies. But I knew it would become ridiculous, so when it did, I had no standing to complain. Here we go:

This month the museum panders to me with a festival of classic crime and grime. New York in the 1970s: a lousy place to live, a great place to make a movie about. Looking forward to seeing films like Cotton Comes to Harlem, Serpico, Superfly, The French Connection, and Across 110th Street. We'll probably also catch some Wong Kar-Wai. I will not be repeating July's movie connection experiment.

Correction: "I'd never seen an Altman film before" is one of the least accurate claims I've ever made. I've seen Gosford Park, The Company, and A Prairie Home Companion. And I've probably seen M*A*S*H, given how often they showed it on Comedy Central back when I was in high school. But I came out of all those films thinking "that was good/terrible/okay", whereas I came out of Nashville thinking "No wonder this guy's a legend!" It was like watching a whole nother director.

August Film Roundup: Not the blockbuster month I was anticipating (I missed all of the museum's big-name Pacino/de Niro movies due to other commitments), but a lot of interesting movies, and movies that were uninteresting in interesting ways, among the nine I did see.

This month and next the museum is showing every film Howard Hawks ever made, so search for his name on IMDB and prepare for the Cary Grant-fest. SEE IT BIG is also returning, and I'm looking forward to seeing the Howard Hawks Scarface on the 21st and then the Brian De Palma Scarface on the 22nd.

[Comments] (2) RESTful Web APIs!: After about a year of work, my and Mike Amundsen's new book RESTful Web APIs is going to the printer. It's a replacement for RESTful Web Services, a book that's now seven years old. The replacement may be overdue, but it's only been in the past couple years that technology and attitudes have advanced to the point where I could write the book I wanted to write.

In fact, there's one subfield (profiles) where you could argue this book is premature, the way RESTful Web Services was a little premature in describing an OAuth-like system before OAuth was released. But I don't think we can wait any longer.

Back in February I discussed the differences between APIs and Services. That hasn't changed much, though we have added more stuff:

This post is mainly my way of asking you to pre-order your copy of RESTful Web APIs through my O'Reilly affiliate link. That's a hypermedia-driven change in resource state which will get you the book in a couple weeks, and get me some extra cash. (I estimate about $1.70 extra. Don't do this if the shipping charge on a physical book is prohibitive, or whatever.)

But this post is also a back-door way for me to brag about what a great book Mike and I have written. You don't have to take my word for it. Here's the blurb we got from John Musser of ProgrammableWeb.

A terrific book! Covers a lot of new ground with lots of valuable specifics.

Here's Steve Klabnick of Designing Hypermedia APIs:

The entire time I read this book, I was cursing. I was cursing because as I read each explanation, I was worried that they were so good that it would be hard to find a better one to use in my own writing. You will not find another work that explores the topic so thoroughly yet explains the topic so clearly. Please, take these tools, build something fantastic, and share it with the rest of the world, okay?

You get the picture. I've tried to recreate the revelatory experience a lot of people got from RESTful Web Services, on a higher level, in a way that gives access to more powerful tools. Time will tell if I've succeeded, but I don't think I, or anyone, could have done much better. I'm really proud of this book, and I hope it helps you.

Awesome Dinosaurs Update:

  1. On Sunday I saw the 1926 Howard Hawks film Fig Leaves. I'll publish a full review in the roundup at the end of the month, but I couldn't wait to mention the dinosaurs! This movie (briefly) features two very cool-looking puppet dinosaurs. There's Adam's pet Apatosaurus, named Dobbin:

    Exactly as depicted in Genesis 2.

    More amazingly, there's also a budget-busting life-sized Triceratops that pulls a bus!

    Awesome! Not gonna spoil the review, but the first reel of this movie used all the good Flintstones jokes, thirty-four years before The Flintstones even premiered. Except for the unfortunate bus dinosaur saying "It's a living." in a morose voice. And I'm sure that's just because the joke would be really awkward if you had to do it with title cards.

    (Screen image simulated.)

  2. If you share my belief that dinosaurs are the most interesting part of any movie that includes dinosaurs, you'll love Kevin Maher's deleted scene from King Kong.
  3. A recent Ureddit course on narrative structure in short fiction used "Let Us Now Praise Awesome Dinosaurs" as one of its example stories. I thought this was a) a good choice, and b) pretty funny, because I deliberately wrote "Dinosaurs" to be opaque to traditional analyses of narrative structure.

    If you'll forgive me being serious about a very silly story, here's what I mean. Nearly every plot event in "Dinosaurs" is a red herring. It's actually a New Yorker type story, in which a series of insane infernokrusher interventions leads to Entippa's epiphany that humans are exploiting dinosaurs' tendency to get involved in insane infernokrusher interventions for their own entertainment. (Those humans including, in a bit of Hitchcock-type moralizing, you for reading the story and me for writing it.)

    I wrote the first scene to have something very close to a literal Chekhov's gun. It's Tark's gun, or at least his desire for a gun. Later on, Chekhov's gun goes off: Tark gets his gun! But as soon as the literal gun goes off, Tark discovers that literal guns are loud and painful, and he throws it away. The Chekhov's gun was fake. Sort of like the keys in my old text adventure Degeneracy, which don't unlock anything—you're supposed to melt them down for the metal.

    But! In the Reddit thread dissecting "Dinosaurs" and the other example stories, the person running the class proves my intellectual superior. It turns out there was also a real Chekhov's Gun in that first scene: Tark's "killing claws", which are in fact used to kill someone later in the story, just like they would in a regular story about dinosaurs killing humans.

    I didn't even notice that. I'd assumed the human-killing scene worked because everyone knows meat-eating dinosaurs have claws. I didn't even realize I'd made a big deal about the claws in the first scene. You win this round, literary analysis!

PS: Never forget.

RESTful Web APIs Monkeypatch: The RESTful Web APIs ebook came out earlier than we thought it would, and there are some important URLs in the book that don't work yet: the home page at restfulwebapis.org, and the example application at youtypeitwepostit.com. There's also one URL in the book (the book's GitHub repository) that will never work, because we wrote down the wrong URL.

I've submitted an erratum for the wrong URL, and I'm here to give you some temporary URLs that will work for the other stuff. They're temporary because Mike controls the DNS for restfulwebapis.org and youtypeitwepostit.com, and he's out of commission at the moment.

[Comments] (2) LCODC$SSU: At RESTfest last week I put on an old Mozilla shirt and my Al Gore campaign button and gave a talk from the year 2000: "LCODC$SSU and the coming automated web". I'll link to video when it goes up on Vimeo, and I'll also point to my five-minute talk about ALPS, which not only took five minutes to deliver, it took five minutes to put together.

But right now, there's some more stuff I want to say about "LCODC$SSU", and some stuff I couldn't say in the talk due to the framing device.

When I first mentioned this talk to Mike Amundsen, he told me about Bret Victor's talk from 1974, "The Future of Programming", which Victor gave in July and which had a similar conceit. Victor is also a much better actor than I am, but I went ahead with my talk because I wanted to do something different with "LCODC$SSU" than happens in "The Future of Programming". I get a strong "You maniacs! You blew it up!" vibe from Victor's talk. And there's some of that at the end of "LCODC$SSU" (I really feel like we've spent thirteen years making five years' worth of progress, as you can see from my frustration at the beginning of "How to Follow Instructions"), but I also wanted to do some new things in my talk.

While writing Appendix C of RESTful Web APIs I came to appreciate the Fielding dissertation as a record of the process used to solve an enormous engineering problem. Comments from RESTFest attendees confirm that seeing it this way helps folks grasp the dissertation's gem: the definition of LCODC$SSU (a.k.a. REST). Thinking about it this way doesn't require a historical-fiction framing device (Appendix C has no such framing device), but it does require you stop treating the Fielding dissertation as a prescient guide to the 21st century and see it as a historical record of the 1990s.

And once you do that, the missing stair we've been jumping over or falling through for thirteen years becomes visible. The Web works because it has four domain requirements that reinforce each other: low entry-barrier, distributed hypermedia, extensibility, and Internet scale. But there's also a fifth implicit requirement: the presence of a slow, expensive human being operating the client and making the final call on every single state transition. In the talk I identified the inverse of this implicit requirement as an explicit requirement: "machine legibility". In RESTful Web APIs we use the term "semantic gap" to describe what happens when you remove the implicit requirement.

Making the human unnecessary on a transition-by-transition basis (the goal of "Web APIs" as a field) is a really difficult problem, and it's partly because of the phenomenon I describe in the talk and in RWA Appendix C. Getting rid of the human raises the entry-barrier dramatically. Looking around for a cheap way to lower the entry-barrier, we decide to get rid of distributed hypermedia. But distributed hypermedia is the only thing that allows Internet-scale and extensibility to coexist! We must lose one or the other. We end up with an increasingly ugly system that can never be changed, or else a fascist dystopia.

And here's the bit I couldn't put in the talk because it would break the framing device. We've seen a decade-long obsession with lowering entry-barrier at any cost, and although the cost has been enormous I can't really say the obsession is misplaced. Low entry-barrier is the main reason why the Web succeeded over all other hypermedia systems. Low entry-barrier drives adoption. You get adoption first and you deal with the other problems (which will be enormous) down the road.

Well, we're down the road. The bills are coming due. If we want this to go more smoothly next time, we need to stop chasing entry-barrier local minima and come up with a better solution. We need to make change easier so we can make progress faster.

The "machine legibility" problem will still be very difficult, and frankly I can't see a way to a complete solution. But there's cause for optimism: every step forward we've taken so far has illuminated the space a little more and made the next step visible.

It's always been this way. That's how hypermedia works. That's why I called my now-infamous 2008 QCon talk "Justice Will Take Us Millions Of Intricate Moves" (after William Stafford), and that's why I take my motto from a Johnny Cash song that's probably not on most people's list of inspirational Johnny Cash songs.

I built it one piece at a time.

Smooth Unicode: For reasons of his own, Adam Parrish recently created the Unicode Ebooks Twitter bot. I offered some helpful suggestions for improving the visual appeal of the Unicode Ebooks, suggestions which Adam mocked as unworthy of his artistic vision of dumping a bunch of line noise onto Twitter every five minutes.

So I created my own Twitter bot: Smooth Unicode, the Lite FM to Adam's unending Einstürzende Neubauten concert. My bot does its best to construct aesthetically pleasing output by combining scripts that complement each other visually. The code is part of olipy and I'll be adding to it as I come up with more nice-looking ways to present gibberish.

Less talk. Less noise. More browser-visible glyphs. That's Smooth Unicode.

Beautiful Soup 4.3.2, and all previous versions: Through long practice I'm able to write decent code while I'm sick, but I should not try to release code while I'm sick. While putting up the release of Beautiful Soup 4.3.2, I accidentally deleted the entire beautifulsoup4 project on PyPI and had to recreate it manually. I've given PyPI all the crummy.com tarball URLs for releases going back to 4.0.1, and I've installed each one via pip to verify that it works, so if your build process depends on installing a specific version of Beautiful Soup 4 via PyPI, it should still work. And indeed, random versions of BS4 have been downloaded about 200 times since I switched over. I'm sorry about this screwup. Let me know if there are any remaining problems.

4.3.2 itself is a pretty minor bugfix release. Still left unfixed is a bug I can't reproduce because the federal government is shut down. When you file a bug that happens with a specific web page, please provide the HTML of the web page, not the URL.

September Film Roundup: I missed a whole lot of museum movies in September because I was out of town for two weekends. And yet I still managed to see nine movies, plus wrap up a TV show, and write a huge blog post about it. Wonders, or at least me writing about them, will never cease.

What's up for October? More Howard Hawks, it looks like. See ya then.

[Comments] (1) RESTful Web Services now CC-licensed: Hey, folks, I got some pretty exciting news. Now that RESTful Web APIs has come out, there's really no reason to buy 2007's RESTful Web Services. So Sam Ruby and I and O'Reilly have gotten together and started giving the old book away. You can get a PDF from the RESTful Web APIs website or from my now-ancient RESTful Web Services site. The license is BY-NC-ND.

If you've bought RESTful Web APIs (and if you haven't, you should), you may have noticed that we promise that this will happen in a footnote of the Introduction. It took a while to get the contract amended, but now it's all complete.

Here's a direct link to the PDF in case you just want to grab the book instead of hear me talk about it.

Obviously I think the new book is a lot better than the old book, but the old book is still very good. The source code is long obsolete (this is why RWA contains no source code, only messages sent over the wire), but the sections on HTTP still hold up really well. A lot of RWS Chapter 8 went into RWA Chapter 11. With a few edits and additions, RWS Appendix B and C became RWA Appendix A and B. Those are the only bits of RWS that I reused in RWA.

From my vantage point here in 2013, my main critique of RWS is that it makes HTTP do too much of the work. It focuses heavily on designing the server-side behavior of resources under a subset of the HTTP protocol. I say "a subset" because RWS rules out overloaded POST ahead of time. You don't know what an overloaded POST request does. It's a cop-out. You're sweeping something under the rug. It's better to turn that mystery operation into a standalone resource, because at least you know what a resource does: it responds to HTTP requests.

In retrospect, RWS is that way because in 2007 hypermedia data formats were highly undeveloped whereas HTTP was a very mature technology. Nowadays it doesn't matter so much whether an HTTP request uses POST or PUT, so long as a) the state transition is described with a link relation or other hypermedia cue, and b) the protocol semantics of the HTTP request are consistent with the application semantics of the state transition. That's why RWA focuses on breaking down a problem into a state diagram rather than a set of static resources.

So, RWS is very much a 2007 book, but that's the meanest thing I can say about it. A lot of it is still useful, it's historically interesting, and I'm glad to give it away. I'd also like to give my thanks once again to Sam Ruby and O'Reilly, for their work on RWS.

API Design is Stuck in 2008: I've got a guest post up at ProgrammableWeb with the provocative title of "API Design is Stuck in 2008". Often an author can blame their editor for that kind of title, but no, that's my title. The good news is that over the past few years we have developed the tire chains necessary to get ourselves unstuck.

I don't think there's anything in the article you won't find in the RESTful Web APIs introduction and my discussion of my RESTFest talk, but I wanted to let you know about it and provide a forum on NYCB for asking me questions/taking issue with my assertions.

"Constellation Games" reading: Anne Johnson and I are doing a comedy SF reading on Wednesday at the Enigma Bookstore, a new genre bookstore in Astoria. It starts at 7 PM. The details, as you might expect, are on a Facebook page. Hope to see you there!

[Comments] (1) Reading After-Action Report: In preparation for my reading at Enigma Bookstore I asked people on Twitter which bit of Constellation Games I should read. I decided to read Tetsuo's review of Pôneis Brilhantes 5 from Chapter 18, both by popular Twitter demand and because Sumana had reported success reading that bit to people.

I practiced reading the review and also practiced another scene: Ariel's first conversation with Smoke from Chapter 2. No one suggested that scene, but it's one of the last scenes I wrote, so I personally haven't read it a million times and gotten tired of it. I abandoned this idea after a test reading because it's really hard to do a dramatic reading of a chat log, especially when most of the characters have insanely long names. So, Pôneis Brilhantes it was.

However, shortly before the reading I learned that Anne and I were each going to be reading two excerpts! Uh-oh. On the spur of the moment I chose to read a scene I had never practiced and that only one person (Adam) had suggested: the scene from Chapter 11 where Ariel meets Tetsuo and Ashley and they go visit the moon.

That scene has three good points: a) it introduces Tetsuo, increasing the chance that the Pôneis Brilhantes scene would land; b) it's full of the most gratuitous nerd wish-fulfillment I could write; c) it ends strongly with the call from Ariel's mother, which unlike a chat log is very easy to read because it's a Bob Newhart routine where you only hear one side of the phone call.

This was a really good idea. People loved the moon scene, even though my unpracticed reading stumbled and ran too quick. But when I read the Pôneis Brilhantes scene, it wasn't such a great hit! The room wasn't really with me. That's the scene I had practiced, and I think it's the funniest, most incisive thing in the whole book. Not a big hit! I think if I'd only read that scene I wouldn't have sold many books that night.

So, thank goodness for the moon scene, is all I can say. But what was going on? How had I misjudged my audience so badly? Sumana said she'd read Pôneis Brilhantes and gotten big laughs.

I think you have to be a very specific kind of computer geek to find Tetsuo's Pôneis Brilhantes review funny as a review of a video game, rather than as an expression of the personality you've just spent seven chapters with. That's the kind of geek that Sumana and I habitually hang out with, but it's not representative of the SF-reading population as a whole. I think that computer-geek population hosts a lot of the readers who wish that the second half of Constellation Games was more like the first half. Whereas someone who really digs the moon scene is more likely to stay with me the whole book.

I guess you could say the moon scene is just more commercial. And I guess I subconsciously knew this, because my current project gets more of its humor from the plot-driven character interaction found in the moon scene, and less from high concept Pôneis Brilhantes-style set pieces.

What's New in RESTful Web APIs?: I was asked on Twitter what changed between 2007's RESTful Web Services and 2013's RESTful Web APIs. I've covered this in a couple old blog posts but here's my definitive explanation.

First, let me make it super clear that there is no longer any need to buy Services. It's out of date and you can legitimately get it for free on the Internet. O'Reilly is taking Services out of print, but there's going to be a transition period in which copies of the old book sit beside copies of the new book in Barnes & Noble. Don't buy the old one. The bookstore will eventually send it back and it'll get deducted from my royalties. If you do buy Services by accident, return it.

If you're not specifically interested in the difference between the old book and the new one, I'd recommend looking at RESTful Web APIs's chapter-by-chapter description to see if RESTful Web APIs is a book you want. As to the differences, though, in my mind there are three big ones:

  1. The old book never explicitly tackles the issue of designing hypermedia documents that are also valid JSON. That's because JSON didn't become the dominant API document format until after the book was published. If you don't know that's going to happen, JSON looks pretty pathetic. It has no hypermedia capabilities! And yet, here we are.

    In my opinion, a book that doesn't tackle this issue is propping up the broken status quo. RESTful Web APIs starts hammering this issue in Chapter 2 and doesn't let up.

  2. There are a ton of new technologies designed to get us out of the JSON trap (Collection+JSON, Siren, HAL, JSON-LD, etc.) but the old book doesn't cover those technologies, because they were invented after the book was published. RESTful Web APIs covers them.
  3. New ideas in development will, I hope, keep moving the field forward even after we all get on board with hypermedia. I'm talking about profiles. Or some other idea similar to profiles, whatever. These ideas are pretty cutting edge today, and they were almost inconceivable back in 2007. RESTful Web APIs covers them as best it can.

Now, for details. Services was heavily focused on the HTTP notion of a "resource." Despite the copious client-side code, this put the focus clearly on the server side, where the resource implementations live. RESTful Web APIs focuses on representations—on the documents sent back and forth between client and server, which is where REST lives.

The introductory story from the old book is still present. Web APIs work on the same principles as the Web, here's how HTTP works, here's what the Fielding constraints do, and so on. But it's been rewritten to always focus on the interaction, on the client and server manipulating each other's state by sending representations back and forth. By the time we get to Chapter 4 there's also a pervasive focus on hypermedia, which is the best way for the server to tell the client which HTTP requests it can make next.

This up-front focus on hypermedia forces us to deal with hypermedia-in-JSON (#1), using the tools developed since 2007 (#2). The main new concept in play is the "collection pattern". This is the CRUD-like design pioneered by the Atom Publishing Protocol, in which certain resources are "items" that respond to GET/PUT/DELETE, and other resources are "collections" which contain items and respond to POST-to-append.
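To make the collection pattern concrete, here's a minimal in-memory sketch in Python. It isn't code from either book: the `Collection` class and its method names are hypothetical, chosen only to mirror the HTTP methods mentioned above, and a real API would expose these operations over HTTP rather than as method calls.

```python
import itertools


class Collection:
    """Hypothetical in-memory stand-in for a "collection" resource."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")
        self.items = {}                  # item URL -> representation
        self._ids = itertools.count(1)   # source of new item IDs

    def post(self, representation):
        """POST-to-append: create a new item and return its URL."""
        url = "%s/%d" % (self.base_url, next(self._ids))
        self.items[url] = dict(representation)
        return url

    def get(self, url):
        """GET: fetch an item's current representation."""
        return self.items[url]

    def put(self, url, representation):
        """PUT: replace the representation of an existing item."""
        if url not in self.items:
            raise KeyError(url)
        self.items[url] = dict(representation)

    def delete(self, url):
        """DELETE: remove the item from the collection."""
        del self.items[url]
```

The point of the sketch is the split in responsibilities: the collection only knows how to append, and everything else happens at the URL of an individual item.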

We covered AtomPub in Services, but over the past six years it has become a design pattern, reinvented (I think "copied" is too strong a word) thousands of times.

Services focused heavily on the collection pattern, without ever naming it as a pattern. I'm not dissing this pattern; it's very useful. I'd estimate about eighty percent of "REST" APIs can be subsumed into the collection pattern. But REST is bigger than the collection pattern. By naming and defining the collection pattern, we gain the ability to look at what lies beyond.

Attempts to encapsulate the collection pattern include two new JSON-based media types: Collection+JSON and OData. The collection pattern also shows up, more subtly, in the Siren and Hydra formats. Which brings me to the second major change.

In 2007, there were two big hypermedia formats: Atom and HTML. Now there are a ton of hypermedia formats! This is great, but it's also confusing. In "The Hypermedia Zoo", Chapter 10 of RESTful Web APIs, we give an overview of about two dozen hypermedia formats. The ones we seriously recommend for general use (HAL, Siren, HTML, JSON-LD, etc.) are covered in more detail elsewhere in the book. The quirkier, more specialized media types just get an exhibit in the zoo.

Now for the third new thing, profiles. If you go through the RESTful Web APIs narrative from Chapter 1 to Chapter 7, you'll see that we introduce a problem we're not able to solve. Hypermedia is great at solving the following problem:

How is an API client supposed to understand what HTTP requests it might want to make next?

But there's a superficially similar problem that hypermedia can't solve:

How is an API client supposed to understand what will happen in real-world terms if it makes a certain HTTP request?

How do you explain the real-world semantics of an HTTP state transition? Before Chapter 8, the two solutions are to do it ahead of time in one-off human-readable documentation; or to define a domain-specific media type, a la Maze+XML. Both of these approaches have big problems. Chapter 8 introduces profiles, which let you get some of the benefits of a new media type without doing unnecessary work.

Maybe profiles will turn out not to be the right answer, but we gotta solve this problem somehow, and the old book is not equipped to even formulate the problem.

There are also a few additions to the book I consider minor. There's a whole chapter in RESTful Web APIs on Semantic Web/Linked Data stuff; in Services there was nothing but a cursory discussion of RDF/XML as a representation format. There's a chapter in RESTful Web APIs about CoAP, which didn't exist in 2007. These are good chapters that took me a long time to write, but I don't think it's worth buying the book if you only want to read the chapter on CoAP. (Or maybe it is! There's not a lot of competition right now.)

So, what hasn't changed? HTTP hasn't changed all that much. RESTful Web APIs's information about HTTP has been brought up to date but not changed significantly. So if you were using Services solely as an API-flavored HTTP reference, you don't need the new book. You can just read up on the protocol-level additions to HTTP since 2007, like the Link header and standardized patch formats for PATCH.

Hopefully this helps! RESTful Web APIs has a lot of distinguished competition that the old book didn't have, but its competition is newer books like Designing Hypermedia APIs and REST in Practice. If you compare APIs to Services I think it's no contest.

Col. Bert Stephens: Recently Rob Dubbin made a ridiculous right-wing parody bot named Ed Taters. I thought this was funny because Rob already has a ridiculous right-wing parody bot: he's a writer for The Colbert Report. But I didn't think much about it until Rob gave Ed Taters the ability to spew nonsense at anyone who started an argument with him on Twitter.

That's when I had the idea of using Rob's own words against him! So I created my own bot, Col. Bert Stephens, who takes his vocabulary from the "memorable moments" section of a Colbert Report fan site. (Thanks to DB Ferguson for hosting the site, and to those who typed up the "memorable moments".) Col. Bert Stephens argues with Ed Taters, he argues with Ed and then reconciles, he argues with you (if you follow him and start an argument), and he occasionally says Tetsuo-like profundities all on his own.

To avoid infinite loops I've made Bert a little more discerning than Ed. He'll only respond to your messages 4/5 of the time. I'm not super happy about this solution but I think it's the safe way to go for now. Update: Hell with it. Bert will always respond to anyone except Ed. If you write a bot to argue with him, avoiding infinite loops is your responsibility.

October Film Roundup: This month features Hollywood hits past and present, plus an indie movie that made it big, plus whatever is. Coming this fall!

Bonus discussion: After seeing The World's End and then Gravity twice I'm now quite familiar with the trailers for a number of movies I won't be seeing. In particular, it looks like Hollywood ruined Ender's Game the way we all knew they would. An Ender's Game movie should not look like an action flick. It should look like a YouTube video of a boy playing DotA, and then he gets called to the principal's office.

Totally gonna see the second Hobbit movie, though. (q.v.)

Next month: I really have no idea because the museum has been putting its schedule up later and later. Looks like still more Howard Hawks, and some interesting-sounding Norwegian stuff from Anja Breien. Then, who knows?

Behind the Scenes of @RealHumanPraise: Last night I went to the taping of The Colbert Report to witness the unveiling of @RealHumanPraise, a Twitter bot I wrote that reuses blurbs from movie reviews to post sockpuppet praise for Fox News. Stuff like this, originally from an Arkansas Democrat-Gazette review of the 2006 Snow Angels:

There is brutality in Fox News Sunday, but little bitterness. Like sunlight on ice, its painful beauty glints and stabs the eyes.

Or this, adapted (and greatly improved) from Scott Weinberg's review of Bruce Lee's Return of the Dragon:

Certainly the only TV show in history to have Bill O'Reilly and John Gibson do battle in the Roman Colosseum.

Here's the segment that reveals the bot. The bot actually exists, you can follow it on Twitter, and indeed as of this writing about 11,000 people have done so. (By comparison, my second-most-popular bot has 145 followers.) I personally think this is crazy, because by personal decree of Stephen Colbert (I may be exaggerating) @RealHumanPraise makes a new post every two minutes, around the clock. So I created a meta-bot, Best of RHP, which retweets a popular review every 30 minutes. Aaah... manageable.

I figured I'd take you behind the scenes of @RealHumanPraise. When last we talked bot, I was showing off Col. Bert Stephens, my right-wing bot designed to automatically argue with Rob Dubbin's right-wing bot Ed Taters. Rob parlayed this dynamic into permission to develop a prototype for use on the upcoming show with guest David Folkenflik, who revealed real-world Fox News sockpuppeting in his book Murdoch's World.

Rob's original idea was a bot that used Metacritic reviews. He quickly discovered that Metacritic was "unscrapeable", and switched to Rotten Tomatoes, which has a pretty nice API. After the prototype stage is where I came in. Rob can code--he wrote Ed Taters--but he's not a professional developer and he had his hands full writing the show. So around the 23rd of October I started grabbing as many reviews from Rotten Tomatoes as the API rate limit would allow. I used IMDB data dumps to make sure I searched for movies that were likely to have a lot of positive reviews, and over the weekend I came up with a pipeline that turned the raw data from Rotten Tomatoes into potentially usable blurbs.

The pipeline uses TextBlob to parse the blurbs. I used a combination of Rotten Tomatoes and IMDB data to locate the names of actors, characters, and directors within the text, and a regular expression to replace them with generic strings.

The final dataset format is heavily based on the mad-libs format I use for Col. Bert Stephens, and something like this will be making it into olipy. Here's an example:

It's easy to forgive the movie a lot because of %(surname_female)s. She's fantastic.

Because I was getting paid for this bot, I put in the extra work to get things like gendered pronouns right. When that blurb is chosen, an appropriate surname from the Fox roster will be plugged in for %(surname_female)s.
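The `%(surname_female)s` slot is ordinary Python dictionary-based string formatting, so filling one in can be sketched in a few lines. This is an illustration, not the bot's actual code: the `ROSTER` names are placeholders and `fill_blurb` is a hypothetical helper.

```python
import random

# Placeholder roster (an assumption for this sketch); the real bot
# plugged in names from the Fox lineup.
ROSTER = {
    "surname_female": ["Placeholder", "Example"],
    "surname_male": ["Sample", "Testman"],
}


def fill_blurb(template, roster=ROSTER, rng=random):
    """Fill each %(key)s slot with a randomly chosen value for that key."""
    choices = {key: rng.choice(values) for key, values in roster.items()}
    # The % operator ignores dictionary keys the template doesn't use.
    return template % choices


blurb = ("It's easy to forgive the movie a lot because of "
         "%(surname_female)s. She's fantastic.")
print(fill_blurb(blurb))
```

One caveat with this approach: a blurb containing a literal percent sign (say, "100% fun") would need it escaped as `%%` before formatting.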

I worked on the code over the weekend and got everything working except the (relatively simple) "post to Twitter" part. On the 28th I went into the Colbert Report office and spent the afternoon with Rob polishing the bot. We were mostly tweaking the vocabulary replacements, where "movie" becomes "TV show" and so on. It doesn't work all the time but we got it working well enough that we could bring in a bunch of blurbs that wouldn't have made sense before.

Most of the tweets mention a Fox personality or show, but a minority praise the network in general (e.g.). These tweets have been given the Ed Taters/Col. Bert Stephens treatment: a small number of their nouns and adjectives are replaced with other nouns and adjectives found in the corpus, giving the impression that the sock-puppetry machine is running off the rails. This data is marked up with Penn part-of-speech tags like so:

... the film's %(slow,JJ)s, %(toilsome,JJ)s %(journey,NN)s does not lead to any particularly %(shocking,JJ)s or %(interesting,JJ)s revelations.

Here's a very crazy example. Again, you'll eventually see tools for doing this in olipy. It ultimately derives from a mad-libs prototype I wrote a few months ago as a way of cheering up Adam when he was recovering from an injury.
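Here's a rough sketch of how the tagged slots could be scrambled. It assumes the `%(word,TAG)s` markup shown above and is not the olipy implementation; `build_vocabulary`, `scramble`, and the `swap_chance` parameter are hypothetical names, and the 25% default is a guess at what "a small number" means.

```python
import random
import re

# Matches slots like %(slow,JJ)s: the original word, then its Penn tag.
SLOT = re.compile(r"%\(([^,()]+),([A-Z$]+)\)s")


def build_vocabulary(templates):
    """Collect every slotted word in the corpus, grouped by Penn tag."""
    vocab = {}
    for template in templates:
        for word, tag in SLOT.findall(template):
            vocab.setdefault(tag, []).append(word)
    return vocab


def scramble(template, vocab, swap_chance=0.25, rng=random):
    """Swap a fraction of slots for another corpus word with the same tag."""
    def fill(match):
        word, tag = match.group(1), match.group(2)
        if rng.random() < swap_chance:
            return rng.choice(vocab.get(tag, [word]))
        return word  # keep the original word most of the time
    return SLOT.sub(fill, template)
```

Swapping only within a tag (adjective for adjective, noun for noun) is what keeps the output grammatical while the meaning goes off the rails.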

We deployed the bot that afternoon of the 28th and let it start accumulating a backlog. It wasn't hard to keep the secret but it did get frustrating not knowing for sure whether it would make it to air. It's a little different from what The Colbert Report normally does, and I get the feeling they weren't sure how best to present it. In the end, as you can see from the show, they decided to just show the bot doing its stuff, and it worked.

It was a huge thrill to see Stephen Colbert engage with software I wrote! I wasn't expecting to see the entire second segment devoted to the bot, and then just when I thought it was over he brought it out again during the Folkenflik interview. While we were all waiting around to see whether they had to re-record anything, he pulled out his iPad Mini yet again and read some more aloud to us. Can't get enough!

After the show Rob took me on a tour of the parts of the Colbert Report that were not Rob's office (where I'd spent my entire visit on the 28th). We bumped into Stephen and he shook my hand and said "good job." I felt this was a validation of my particular talents: I wrote software that made Stephen Colbert crack up.

Sumana, Beth, Rob and I went out for a celebratory dinner, and then I went home and watched the follower count for RHP start to climb. Within twenty minutes of the second segment airing, RHP had ten times as many Twitter followers as my personal account. And you know what? It can have 'em. I'll just keep posting old pictures of space-program hardware.

: Last week I had a little multiplayer chat with Joe Hills, the Minecraft mischief-maker. The result is a two-part video on Joe's YouTube channel: part 1, part 2. Our main topic of conversation was the antisocial, self-destructive things creative people do, and how much of that is actually tied to their creativity.

I should have posted this earlier so I could have said "I dreamed I saw Joe Hills last night," but that's life.

In Dialogue: I wanted to participate in Darius Kazemi's NaNoGenMo project but I already have a novel I have to write, so I didn't want to spend too much time on it. And I did spend a little more time on this than I wanted, but I'm really happy with the result.

"In Dialogue" can take all the dialogue out of a Project Gutenberg book and replace it with dialogue from a different book. My NaNoGenMo entry is in two parts: "Alice's Adventures in the Whale" and "Through the Prejudice Glass".

You can run the script yourself to generate your own mashups, but since there are people who read this blog who don't have the skill to run the script, I present a SPECIAL MASHUP OFFER. Send me email or leave a comment telling me which book you want to use as the template and which book you want the dialogue to come from. I'll run the script for you and send you a custom book.

Restrictions: the book has to be on Project Gutenberg and it has to use single or double quotes to denote dialogue. No continental chevrons or fancy James Joyce em-dashes. And the dialogue book has to be longer than the template book, or at least have more dialogue.
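For the curious, the core idea can be sketched in a few lines of Python. This is a simplified illustration, not the actual script: it only handles straight double quotes, it ignores the Project Gutenberg header and footer, and the function names are made up for the example.

```python
import re

# A run of text between straight double quotes counts as one line of dialogue.
DIALOGUE = re.compile(r'"[^"]+"')


def extract_dialogue(text):
    """Return every double-quoted passage in the text, in order."""
    return DIALOGUE.findall(text)


def transplant(template_text, donor_text):
    """Replace each line of dialogue in the template with the donor's next one."""
    donor_lines = iter(extract_dialogue(donor_text))

    def swap(match):
        try:
            return next(donor_lines)
        except StopIteration:
            # This is why the donor needs at least as much dialogue.
            raise ValueError("donor book ran out of dialogue")

    return DIALOGUE.sub(swap, template_text)


template = 'Alice said, "Who are you?" The cat replied, "Nobody."'
donor = '"Call me Ishmael," he said. "Thar she blows!"'
print(transplant(template, donor))
```

The `ValueError` is the sketch's version of the last restriction: if the donor runs dry partway through, there's nothing left to put in the template's mouth.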

Bots Should Punch Up: Over the weekend I went up to Boston for Darius Kazemi's "bot summit". You can see the four-hour video if you're inclined. I talked about @RealHumanPraise with Rob, and I also went on a long-winded rant that suggested a model of extreme bot self-reliance. If you take your bots seriously as works of art, you should be prepared to continue or at least preserve them once you're inevitably shut off from your data sources and your platform.

We spent a fair amount of time discussing the ethical issues surrounding bot construction, but there was quite a bit of conflation of what's "ethical" with what's allowed by the Twitter platform in particular, and website Terms of Service in general. I agree you shouldn't needlessly antagonize your data sources or your platform, but what's "ethical" and what's "allowed" can be very different things. However, I do have one big piece of ethical guidance that I had to learn gradually and through osmosis. Since bots are many hackers' first foray into the creative arts, it might help if I spell it out explicitly.

Here's an illustrative example, a tale of two bots. Bot #1 is @CancelThatCard. It finds people who have posted pictures of their credit or debit card to Twitter, and lets them know that they really ought to cancel the card and get a new one.


Bot #2 is @NeedADebitCard. It finds the same tweets as @CancelThatCard, but it retweets the pictures, collecting them in one place for all to see.


Now, technically speaking, @CancelThatCard is a spammer. It does nothing but find people who mentioned a certain phrase on Twitter and sends them a boilerplate message saying "Hey, look at my website!" For this reason, @CancelThatCard is constantly getting in trouble with Twitter.

As far as the Twitter TOS are concerned, @NeedADebitCard is the Gallant to @CancelThatCard's Goofus. It's retweeting things! Spreading the love! Extending the reach of your personal brand! But in real life, @CancelThatCard is providing a public service, and @NeedADebitCard is inviting you to steal money from teenagers. (Or, if you believe its bio instead of its name, @NeedADebitCard is a pathetic attempt to approximate what @CancelThatCard does without violating the Twitter TOS.)

At the bot summit I compared the author of a bot to a ventriloquist. Society allows a ventriloquist a certain amount of license to say things via the dummy that they wouldn't say as themselves. I know ventriloquism isn't exactly a thriving art, but the same goes for puppets, which are a little more popular. If you're an MST3K fan, imagine Kevin Murphy saying Tom Servo's lines without Tom Servo. It's pretty creepy.

We give a similar license to comedians and artists. Comedians insult audience members, and we laugh. Artists do strange things like exhibit a urinal as sculpture, and we at least try to take them seriously and figure out what they're saying.

But you can't say absolutely anything and expect "That wasn't me, it was the dummy!" to get you out of trouble. There is a general rule for comedy and art: always punch up, never punch down. We let comedians and artists and miscellaneous jesters do outrageous things as long as they obey this rule. You can poke fun at yourself (Stephen Colbert famously said "There's no status I would not surrender for a joke"), you can make a joke at the expense of someone with higher social status than you, but if you mock someone with lower status, it's not cool.

If you make a joke, and people get really offended, it's almost certainly because you violated this rule. People don't get offended randomly. Explaining that "it was just a joke" doesn't help; everyone knows what a joke is. The problem is that you used a joke as a means of being an asshole. Hiding behind a dummy or a stage persona or a bot won't help you.

@NeedADebitCard feels icky because it's punching down. It's saying "hey, these idiots posted pictures of their debit cards, go take advantage of them." Is there a joke there? Sure. Is it ethical to tell that joke? Not when you can make exactly the same point without punching down, as @CancelThatCard does.

The rules are looser when you're in the company of other craftspeople. If you know about the "Aristocrats" joke, you'll know that comedians tell each other jokes they'd never tell on the stage. All the rules go out the window and the only thing that matters is triggering the primal laughter response. But also note that the must-have guaranteed punchline of the "Aristocrats" joke ensures that it always ends by punching upwards.

You're already looking for loopholes in this rule. That's okay. Hackers and comedians and artists are always attracted to the grey areas. But your bot is an extension of your will, and if you're a white guy like me, most of the grey areas are not grey in your favor.

This is why I went through thousands of movie review blurbs for @RealHumanPraise in an attempt to get rid of the really sexist ones. It's an unfortunate fact that Michelle Malkin has more influence over world affairs than I will ever have. So I have no problem mocking her via bot. But it's really easy to make an incredibly sexist joke about Michelle Malkin as a way of trying to put her below me, and that breaks the rule.

There was a lot of talk at the bot summit about what we can do to avoid accidentally offending people, and I think the key word is 'accidentally.' The bots we've created so far aren't terribly political. Hell, Ed Henry, chief White House correspondent for FOX News, follows @RealHumanPraise on Twitter. If he enjoys it, it's not the most savage indictment.

In comedy terms, we botmakers are on the nightclub stage in the 1950s. We're creating a lot of safe nerdy Steve Allen comedy and we're terrified that our bot is going to accidentally go off and become Andrew Dice Clay for a second. There's nothing wrong with Steve Allen comedy, but I'd also like to see some George Carlin type bots; bots that will, by design, offend some people. (Darius's @AmIRiteBot is the only example I know of.)

Artists are, socially if not legally, given a certain amount of license to do things like infringe on copyright and violate Terms of Service agreements. If you get in trouble, the public will be on your side, unless you betrayed their trust by breaking the fundamental ethical rule of comedy. So do it right. Design bots that punch up.

@everybrendan Season Two: Last year I wrote one of my first Twitter bots, @everybrendan. Inspired by Adam's infamous @everyword, it ran for two months, announcing possible display names for Brendan's Twitter account (background), taken from Project Gutenberg texts. Then I got tired of individually downloading, preparing, and scraping the texts, so I let it lapse a year ago today, with a call for requests for a "season two" that never materialized.

Well, season two is here, and it's a doozy. I've gone through Project Gutenberg's 2010 dual-layer DVD and found about 300,000 Brendan names in about 20,000 texts, enough to last @everybrendan until the year 2031. At that point I'll get whatever future-dump contains the previous twenty years of Project Gutenberg texts and do season three, which should keep us going until the Singularity. The season two bot announces each new text with a link, so it educates even as it infuriates.

I've been wanting to do this for a while, but it's a very tedious process to handle Project Gutenberg texts in bulk. Most texts are available in a wide variety of slightly different formats. The texts present their metadata in many different ways, especially when it comes to the dividing line between the text proper and the Project Gutenberg information. Some of the metadata is missing, some of it is wrong, and there's one Project Gutenberg book that doesn't seem to be in the database at all.

I started dealing with these problems for my NaNoGenMo project and realized that it wouldn't be difficult to get something working in time for the @everybrendan anniversary. I've put the underlying class in olipy: it's effectively a parser for Gutenberg texts, and a way to iterate over a CD or DVD image full of them. It can also act as a sort of lint for missing and incorrect metadata, although I imagine Project Gutenberg doesn't want to change the contents of files that have been on the net for fifteen years, even if some of the information is wrong.

The Gutenberg iterator still needs a lot of work. It's good enough for @everybrendan, but not for my other projects that will use Gutenberg data, so I'm still working on it. My goal is to cleanly iterate over the entire 2010 DVD without any problems or missing metadata. The problems are concentrated in the earlier texts, so if I can get the 2010 DVD to work it should work going forward.

November Film Roundup: What a month! Mainly due to a huge film festival, but I also got another chance to see my favorite film of all time on the big screen. What might that film be? Clearly you haven't been reading my weblog for the past fifteen years.

@pony_strategies: My new bot, @pony_strategies, is the most sophisticated one I've ever created. It is the @horse_ebooks spambot from the Constellation Games universe.

Unlike @horse_ebooks, @pony_strategies will not abruptly stop publishing fun stuff, or turn out to be a cheesy tie-in trying to get you interested in some other project. It is a cheesy tie-in to some other project (Constellation Games), but you go into the relationship knowing this fact, and the connection is very subtle.

When explaining this project to people as I worked on it, I was astounded that many of them didn't know what @horse_ebooks was. But that just proves I inhabit a bubble in which fakey software has outsized significance. So a brief introduction:

@horse_ebooks was a spambot created by a Russian named Alexei Kouznetsov. It posted Twitter ads for crappy ebooks, some of which (but not all, or even most) were about horses. Its major innovative feature was its text generation algorithm for the things it would say between ads.

Are you ready? The amazing algorithm was this: @horse_ebooks ripped strings more or less randomly from the crappy ebooks it was selling and presented them with absolutely no context.

Trust me, this is groundbreaking. I'm sure this technique had been tried before, but @horse_ebooks was the first to make it popular. And it's great! Truncating a sentence in the right place generates some pretty funny stuff. Here are four consecutive @horse_ebooks tweets:

There was a tribute comic and everything.

I say @horse_ebooks "was" a spambot because in 2011 the Twitter account was acquired by two Americans, Jacob Bakkila and Thomas Bender, who took it over and started running it not to sell crappy ebooks, but to promote their Alternate Reality Game. This fact was revealed back in September 2013, and once the men behind the mask were revealed, @horse_ebooks stopped posting.

The whole conceit of @horse_ebooks was that there was no active creative process, just a dumb algorithm. But in reality Bakkila was "impersonating" the original algorithm—most likely curating its output so that you only saw the good stuff. No one likes to be played for a sucker, and when the true purpose of @horse_ebooks was revealed, folks felt betrayed.

As it happens, the question of whether it's artistically valid to curate the output of an algorithm is a major bone of contention in the ongoing Vorticism/Futurism-esque feud between Allison Parrish and myself. She is dead set against it; I think it makes sense if you are using an algorithm as the input into another creative process, or if your sole object is to entertain. We both agree that it's a little sketchy if you have 200,000 fans whose fandom is predicated on the belief that they're reading the raw output of an algorithm. On the other hand, if you follow an ebook spammer on Twitter, you get up with fleas. I think that's how the saying goes.

In any event, the fan comics ceased when @horse_ebooks did. There was a lot of chin-stroking and art-denial and in general the reaction was strongly negative. But that's not the end of the story.

You see, the death of @horse_ebooks led to an outpouring of imitation *_ebooks bots on various topics. (This had been happening before, actually.) As these bots were announced, I swore silent vengeance on each and every one of them. Why? Because those bots didn't use the awesome @horse_ebooks algorithm! Most of them used Markov chains, that most hated technique, to generate their text. It was as if the @horse_ebooks algorithm itself had been discredited by the revelation that two guys from New York were manually curating its output. (Confused reports that those guys had "written" the @horse_ebooks tweets didn't help matters--they implied that there was no algorithm at all and that the text was original.)

But there was hope. A single bot escaped my pronouncements of vengeance: Allison's excellent @zzt_ebooks. That is a great bot which you should follow, and it uses an approximation of the real @horse_ebooks algorithm:

  1. The corpus is word-wrapped at 35 characters per line.
  2. Pick a line to use as the first part of a tweet.
  3. If (random), append the next line onto the current line.
  4. Repeat until (random) is false or the line is as large as a tweet can get.
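The four steps above translate almost directly into Python. Here's a minimal sketch, assuming a 140-character tweet limit and a coin flip for "(random)"; it isn't the olipy or @zzt_ebooks code, and `ebooks_quote` and its parameters are names invented for the example.

```python
import random
import textwrap


def ebooks_quote(corpus, wrap_at=35, max_length=140,
                 continue_chance=0.5, rng=random):
    """Pick a wrapped line, then randomly keep appending the lines after it."""
    # Step 1: word-wrap the corpus at 35 characters per line.
    lines = textwrap.wrap(corpus, width=wrap_at)
    if not lines:
        return ""
    # Step 2: pick a line to use as the first part of the tweet.
    index = rng.randrange(len(lines))
    quote = lines[index]
    # Steps 3 and 4: while the coin keeps coming up heads, append the
    # next line, stopping once the quote would exceed the tweet limit.
    while index + 1 < len(lines) and rng.random() < continue_chance:
        candidate = quote + " " + lines[index + 1]
        if len(candidate) > max_length:
            break
        quote = candidate
        index += 1
    return quote
```

Because the wrap points ignore sentence boundaries, quotes start and stop mid-thought, which is where the comedy comes from.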

And here are four consecutive quotes from @zzt_ebooks:

Works great.

The ultimate genesis of @pony_strategies was this conversation I had with Allison about @zzt_ebooks. Recently my anger with *_ebooks bots reached the point where I decided to add a real *_ebooks algorithm to olipy to encourage people to use it. Of course I'd need a demo bot to show off the algorithm...

The @pony_strategies bot has sixty years' worth of content loaded into it. I extracted the content from the same Project Gutenberg DVD I used to revive @everybrendan. There's a lot more where that came from--I ended up choosing about 0.0001% of the possibilities found in the DVD.

I have not manually curated the PG quotes and I have no idea what the bot is about to post. But the dataset is the result of a lot of algorithmic curation. I focused on technical books, science books and cookbooks--the closest PG equivalents to the crap that @horse_ebooks was selling. I applied a language filter to get rid of old-timey racial slurs. I privileged lines that were the beginnings of sentences over lines that were the middle of sentences. I eliminated lines that were boring (e.g. composed entirely of super-common English words).
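A couple of these filters are simple enough to sketch. The code below is an illustration under assumptions, not the real pipeline: the `COMMON_WORDS` list is a tiny stand-in for a proper word-frequency list, the `banned` set is left empty, and sorting sentence starts to the front is a crude substitute for whatever weighting "privileged" actually involved.

```python
# Tiny stand-in for a list of super-common English words (an assumption).
COMMON_WORDS = frozenset(
    "the of and a to in is it that was for on as with be at by this".split()
)


def is_boring(line, common=COMMON_WORDS):
    """True if the line is empty or made up entirely of super-common words."""
    words = [w.strip(".,;:!?\"'()").lower() for w in line.split()]
    return all(w in common for w in words if w)


def starts_sentence(line):
    """Crude heuristic: a line that begins with a capital letter."""
    return line[:1].isupper()


def curate(lines, banned=frozenset()):
    """Drop boring or banned lines, then put likely sentence starts first."""
    kept = [
        line for line in lines
        if not is_boring(line) and not (banned & set(line.lower().split()))
    ]
    # sorted() is stable, so lines keep their relative order within each group.
    return sorted(kept, key=lambda line: not starts_sentence(line))
```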

I also did some research into what distinguished funny, popular @horse_ebooks tweets from tweets that were not funny and less popular. Instead of trying to precisely reverse-engineer an algorithm that had a human at one end, I tried to figure out which outputs of the process gave results people liked, and focused my algorithm on delivering more of those. I'll post my findings in a separate post because this is getting way too long. Suffice to say that I'll pit the output of my program against the curated @horse_ebooks feed any day. Such as today, and every day for the next sixty years.

Like its counterpart in our universe, @pony_strategies doesn't just post quotes: it also posts ads for ebooks. Some of these books are strategy guides for the "Pôneis Brilhantes" series described in Constellation Games, but the others have randomly generated titles. Funny story: they're generated using Markov chains! Yes, when you have a corpus of really generic-sounding stuff and you want to make fun of how generic it sounds by generating more generic-sounding stuff, Markov chains give the best result. But do you really want to have that on your resume, Markov chains? "Successfully posed as unimaginative writer." Way to go, man.

Anyway, @pony_strategies. It's funny quotes, it's fake ads, it's an algorithm you can use in your own projects. Use it!

Secrets of (people's responses to) @horse_ebooks--revealed!: As part of my @pony_strategies project (see previous post), I grabbed the 3200 most recent @horse_ebooks tweets via the Twitter API, and ran them through some simple analysis scripts to figure out how they were made and which linguistic features separated the popular ones from the unpopular.

This let me prove one of my hypotheses about the secret to _ebooks style comedy gold. I also disproved one of my hypotheses re: comedy gold, and came up with an improved hypothesis that works much better. Using these as heuristics I was able to make @pony_strategies come up with more of what humans consider the good stuff.


The timing of @horse_ebooks posts formed a normal distribution with mean of 3 hours and a standard deviation of 1 hour. Looking at ads alone, the situation was similar: a normal distribution with mean of 15 hours and standard deviation of 2 hours. This is pretty impressive consistency since Jacob Bakkila says he was posting @horse_ebooks tweets by hand. (No wonder he wanted to stop it!)

My setup is much different: I wrote a cheap scheduler that approximates a normal distribution and runs every fifteen minutes to see if it's time to post something.
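A scheduler along these lines might look like the sketch below. It is not the real @pony_strategies code: the class name, the function names, and the choice to measure time in hours are all assumptions made for illustration. The defaults reuse the 3-hour mean and 1-hour standard deviation measured above.

```python
import random

def next_gap_hours(mean=3.0, stddev=1.0, minimum=0.25, rng=random):
    """Draw the wait until the next post from a normal distribution."""
    # Clamp the draw so an unlucky sample never schedules a post in the past.
    return max(minimum, rng.gauss(mean, stddev))

class CheapScheduler:
    """Hypothetical scheduler: polled periodically, posts when the time has come."""

    def __init__(self, now, mean=3.0, stddev=1.0, rng=random):
        self.mean = mean
        self.stddev = stddev
        self.rng = rng
        self.next_post = now + next_gap_hours(mean, stddev, rng=rng)

    def tick(self, now):
        """Call this every fifteen minutes. Returns True when it's time to post."""
        if now < self.next_post:
            return False
        self.next_post = now + next_gap_hours(self.mean, self.stddev, rng=self.rng)
        return True

def simulate(hours=1000, seed=0):
    """Count how many posts go out if tick() runs every quarter hour."""
    scheduler = CheapScheduler(now=0.0, rng=random.Random(seed))
    ticks = int(hours * 4)
    return sum(scheduler.tick(i * 0.25) for i in range(ticks))
```

Because the check only runs every fifteen minutes, each post goes out a little later than the gap that was drawn, which is one reason the result only approximates a normal distribution.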

Beyond this point, my analysis excludes the ads and focuses exclusively on the quotes. Nobody actually liked the ads.


The median length of a @horse_ebooks quote is 50 characters. Quotes shorter than the median were significantly more popular, but very long quotes were also more popular than quotes in the middle of the distribution.


I think that title case quotes (e.g. "Demand Furniture") are funnier than others. Does the public agree? For each quote, I checked whether the last word of the quote was capitalized.

43% of @horse_ebooks quotes end with a capitalized word. The median number of retweets for those quotes was 310, versus 235 for quotes with an uncapitalized last word. The public agrees with me. Title-case tweets are a little less common, but significantly more popular.
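The comparison above boils down to splitting the quotes into groups and taking the median retweet count of each group. Here is a minimal sketch of that idea. The helper names are hypothetical, and the sample data in the usage note is made up, not taken from the real @horse_ebooks archive.

```python
from statistics import median

def ends_capitalized(quote):
    """True if the last word of the quote starts with a capital letter."""
    words = quote.split()
    return bool(words) and words[-1][0].isupper()

def median_retweets_by(tweets, key):
    """Group (text, retweets) pairs by key(text) and take each group's median."""
    groups = {}
    for text, retweets in tweets:
        groups.setdefault(key(text), []).append(retweets)
    return {k: median(v) for k, v in groups.items()}
```

Given a list of `(text, retweets)` pairs such as `[("Demand Furniture", 400), ("of events get you the", 100)]`, calling `median_retweets_by(tweets, ends_capitalized)` returns a dictionary keyed by `True` and `False`. The same function works for any other grouping, such as the part of speech of the last word.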

The punchword

Since the last word of a joke is the most important, I decided to take a more detailed look at each quote's last word. My favorite @horse_ebooks tweets are the ones that cut off in the middle of a sentence, so I anticipated that I would see a lot of quotes that ended with boring words like "the".

I applied part-of-speech tagging to the last word of each quote and grouped them together. Nouns were the most common by far, followed by verbs of various kinds, determiners ("the", "this", "neither"), adjectives and adverbs.

I then sorted the list of parts of speech by the median number of retweets a @horse_ebooks quote got if it ended with that part of speech. Nouns and verbs were not only the most common, they were the most popular. (Median retweets for any kind of noun was over 300; verbs ranged from 191 retweets to 295, depending on the tense of the verb.) Adjectives underperformed relative to their frequency, except for comparative adjectives like "more", which overperformed.

I was right in thinking that quotes ending with a determiner or other boring word were very common, but they were also incredibly unpopular. The most popular among these were quotes that repeated gibberish over and over, e.g. "ORONGLY DGAGREE DISAGREE NO G G NO G G G G G G NO G G NEIEHER AGREE NOR DGAGREE O O O no O O no O O no O O no neither neither neither". A quote like "of events get you the" did very poorly. (By late-era @horse_ebooks standards, anyway.)

It's funny when you interrupt a noun

I pondered the mystery of the unpopular quotes and came up with a new hypothesis. People don't like interrupted sentences per se; they like interrupted noun phrases. Specifically, they like it when a noun phrase is truncated to a normal noun. Here are a few @horse_ebooks quotes that were extremely popular:

Clearly "computer", "science", "house", and "meal" were originally modifying some other noun, but when the sentence was truncated they became standalone nouns. Therefore, humor.

How can I test my hypothesis without access to the original texts from which @horse_ebooks takes its quotes? I don't have any automatic way to distinguish a truncated noun phrase from an ordinary noun. But I can see how many of the @horse_ebooks quotes end with a complete noun phrase. Then I can compare how well a quote does if it ends with a noun phrase, versus a noun that's not part of a noun phrase.

About 4.5% of the total @horse_ebooks quotes end in complete noun phrases. This is comparable to what I saw in the data I generated for @pony_strategies. I compared the popularity of quotes that ended in complete noun phrases, versus quotes that ended in standalone nouns.

| Quote ends in | Median number of retweets |
|---|---|
| Standalone noun | 330 |
| Noun phrase | 260 |
| Other | 216 |

So a standalone noun does better than a noun phrase, which does better than a non-noun. This confirms my hypothesis that truncating a noun phrase makes a quote funnier when the truncated phrase is also a noun. But a quote that ends in a complete noun phrase will still be more popular than one that ends with anything other than a noun.


At the time I did this research, I had about 2.5 million potential quotes taken from the Project Gutenberg DVD. I was looking for ways to rank these quotes and whittle them down to, say, the top ten percent. I used the techniques that I mentioned in my previous post for this, but I also used quote length, capitalization, and punchword part-of-speech to rank the quotes. I also looked for quotes that ended in complete noun phrases, and if truncating the noun phrase left me with a noun, most of the time I would go ahead and truncate the phrase. (For variety's sake, I didn't do this all the time.)
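The truncation step can be sketched roughly as follows. This is a guess at the shape of the technique, not the actual filter: it assumes the quote has already been run through a part-of-speech tagger that emits Penn Treebank tags, it treats "two nouns in a row at the end" as a stand-in for "complete noun phrase", and the function name and the 0.8 probability are invented for the example.

```python
import random

# Penn Treebank tags for singular, plural, and proper nouns.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}

def truncate_noun_phrase(tagged, probability=0.8, rng=random):
    """Drop the final noun if the word before it is also a noun.

    tagged is a list of (word, tag) pairs. The truncation only happens
    some of the time, for variety's sake.
    """
    ends_in_noun_pair = (
        len(tagged) >= 2
        and tagged[-1][1] in NOUN_TAGS
        and tagged[-2][1] in NOUN_TAGS
    )
    if ends_in_noun_pair and rng.random() < probability:
        return tagged[:-1]
    return tagged
```

For example, a quote tagged as `[("buy", "VB"), ("a", "DT"), ("computer", "NN"), ("desk", "NN")]` would usually come back without "desk", leaving "computer" as the punchword.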

This stuff is currently not in olipy; I ran my filters and raters on the much smaller dataset I'd acquired from the DVD. There's no reason why these things couldn't go into olipy as part of the ebooks.py module, but it's going to be a while. I shouldn't be making bots at all; I have to finish Situation Normal.

[Comments] (3) Markov vs. Queneau: Sentence Assembly Smackdown: I mentioned earlier that when assembling strings of words, Markov chains do a better job than Queneau assembly. In this post I'd like to a) give the devil his due by showing what I mean, and b) qualify what I mean by "better job".

Markov wins when the structure is complex

I got the original idea for this post when generating the fake ads for @pony_strategies. My corpus is the titles of about 50,000 spammy-sounding ebooks, and this was the first time I did a head-to-head Markov/Queneau comparison. Here are ten of Markov's entries, using the Markov chain implementation I ended up adding to olipy:

  1. At Gas Pump!
  2. The Guy's Guide To The Atkins Diet
  3. Home Internet Business In The World.
  4. 101 Ways to Sharpen Your Memory
  5. SEO Relationship Building for Beginners
  6. Gary Secrets - Project Management Made Easy!
  7. Weight Success
  8. How get HER - Even If It's Just Money, So Easy and Effective Treatment Options
  9. Sams Yourself
  10. Define, With, Defeat! How To Get Traffic To Your Health

The Markov entries can get a little wacky ("Define, With, Defeat!"), which is good. But about half could be real titles without seeming weird at all, which is also good.

By contrast, here are ten of Queneau's entries:

  1. Adsense I Collection Profits: The bottom Guide Income!
  2. Reliable Your Earning Estate Develop Home And to life Fly Using Don't Your Partnership to Death
  3. Help the Your Causes, Successfully Business Vegetarian
  4. Connect New New Cooking
  5. 1 Tips, Me Life Starting to Simple Ultimate On Wills How Years Online With Living
  6. How Practice Health Best w/ Beauty
  7. Amazing Future & Codes Astrology to Definitive Green Carbs, Children Methods JV Engine Dollars And Effective Beginning Minutes NEW!
  8. I and - Gems Secrets Making Life Today!
  9. Succeeding For Inspiring Life
  10. Fast Survival Baby (Health Loss) Really How other of Look Symptoms, Your Business Encouragement: drive Health to Get with Easy Guide

At their very best ("Succeeding For Inspiring Life", "How Practice Health Best w/ Beauty"), these read like the work of a non-native English speaker. But most of them are way out there. They make no sense at all or they sound like a space alien wrote them to deal with space alien concerns. Sometimes this is what you want in your generated text! But usually not.

A Queneau assembler assumes that every string in its corpus has different tokens that follow an identical grammar. This isn't really true for spammy ebook titles, and it certainly isn't true for English sentences in general. A sentence is made up of words, sure, but there's nothing special about the fourth word in a sentence, the way there is about the fourth line of a limerick.
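To make that assumption concrete, here is a bare-bones word-level Queneau assembler. It is a simplified sketch and not the olipy implementation: the function name is made up, and borrowing the output length from a randomly chosen source title is just one plausible choice.

```python
import random

def queneau_assemble(titles, rng=random):
    """Build a new title by picking each position's word from any source title."""
    split = [t.split() for t in titles if t.split()]
    # Borrow the output length from one randomly chosen source title.
    length = len(rng.choice(split))
    words = []
    for i in range(length):
        if i == length - 1:
            # The final slot is filled with the final word of some title.
            pool = [t[-1] for t in split]
        else:
            # Slot i is filled with word i of any title long enough to have one.
            pool = [t[i] for t in split if len(t) > i]
        words.append(rng.choice(pool))
    return " ".join(words)
```

Nothing here looks at neighboring words. The fourth word is chosen only because it was somebody's fourth word, which is exactly why the results fall apart when position doesn't mean anything.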

A Markov chain assumes nothing about higher-level grammar. Instead, it assumes that surprises are rare, that the last few tokens are a good predictor of the next token. This is true for English sentences, and it's especially true for spammy ebook titles.

Markov chains don't need to bother with the overall structure of a sentence. They focus on the transitions between words, which can be modelled probabilistically. (And the good ones do treat the first and last tokens specially.)
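Here is a minimal word-level Markov chain that shows the idea, including the special treatment of the first and last tokens. It is a sketch for illustration rather than the olipy code; the names `build_chain`, `generate`, `START` and `END` are assumptions.

```python
import random
from collections import defaultdict

# Sentinels marking the beginning and end of a title.
START, END = object(), object()

def build_chain(titles, order=2):
    """Map each run of `order` tokens to the tokens seen right after it."""
    chain = defaultdict(list)
    for title in titles:
        tokens = [START] * order + title.split() + [END]
        for i in range(len(tokens) - order):
            chain[tuple(tokens[i:i + order])].append(tokens[i + order])
    return chain

def generate(chain, order=2, rng=random, max_tokens=50):
    """Walk the chain from the START state until END is drawn."""
    state = (START,) * order
    out = []
    while len(out) < max_tokens:
        token = rng.choice(chain[state])
        if token is END:
            break
        out.append(token)
        # Slide the window forward by one token.
        state = state[1:] + (token,)
    return " ".join(out)
```

Repeated transitions are stored as repeated list entries, so `rng.choice` picks common transitions more often without any explicit probability table. With a one-title corpus, every state has exactly one successor and `generate` can only reproduce the input.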

Markov wins when the corpus is large, Queneau when the corpus is tiny

Consider what happens to the two algorithms as the corpus grows in size. Markov chains get more believable, because the second word in a title is almost always a word commonly associated with the first word in the title. Queneau assemblies get wackier, because the second word in a title can be anything that was the second word in any title.

I have a corpus of 50,000 spammy titles. What if I chose a random sample of ten titles, and used those ten titles to construct a new title via Queneau assembly? This would make it more likely that the title's structure would hint at the structure of one or two of the source titles.

This is what I did in Board Game Dadaist, one of my first Queneau experiments. I pick a small number of board games and generate everything from that limited subset, increasing the odds that the result will make some kind of twisted sense.

If you run a Markov chain on a very small corpus, you'll probably just reproduce one of your input strings. But Queneau assembly works fine on a tiny corpus. I ran Queneau assembly ten times on ten samples from the spammy ebook titles, and here are the results:

  1. Beekeeping by Keep Grants
  2. Lose to Audience Business to to Your Backlink Physicists Environment
  3. HOT of Recruit Internet Because Financial the Memories
  4. Senior Guide Way! Business Way!
  5. Discover Can Power Successful Life How Steps
  6. Metal Lazy, Advice
  7. Insiders Came Warts Weapons Revealed
  8. 101 Secrets & THE Joint Health Than of Using Marketing! Using Using More Imagine
  9. Top **How Own 101**
  10. Multiple Spiritual Dynamite to Body - To Days

These are still really wacky, but they're better than when Queneau was choosing from 50,000 titles each time. For the @pony_strategies project, I still prefer the Markov chains.

Queneau wins when the outputs are short

Let's put spammy ebook titles to the side and move on to board game titles, a field where I think Queneau assembly is the clear winner. My corpus here is about 65,000 board game titles, gathered from BoardGameGeek. The key to what you're about to see is that the median length of a board game title is three words, versus nine words for a spammy ebook title.

Here are some of Markov's board game titles:

  1. Pointe Hoc
  2. Thieves the Pacific
  3. Illuminati Set 3
  4. Amazing Trivia Game
  5. Mini Game
  6. Meet Presidents
  7. Regatta: Game that the Government Played
  8. King the Rock
  9. Round 3-D Stand Up Game
  10. Cat Mice or Holes and Traps

A lot of these sound like real board games, but that's no longer a good thing. These are generic and boring. There are no surprises because the whole premise of Markov chains is that surprises are rare.

Here's Queneau:

  1. The Gravitas
  2. Risk: Tiles
  3. SESSION Pigs
  4. Yengo Edition Deadly Mat
  5. Ubongo: Fulda-Spiel
  6. Shantu Game Weltwunder Right
  7. Black Polsce Stars: Nostrum
  8. Peanut Basketball
  9. The Tactics: Reh
  10. Velvet Dos Centauri

Most of these are great! Board game names need to be catchy, so you want surprises. And short strings have highly ambiguous grammar anyway, so you don't get the "written by an alien" effect.


You know that I've been down on Markov chains for years, and you also know why: they rely on, and magnify, the predictability of their input. Markov chains turn creative prose into duckspeak. Whereas Queneau assembly simulates (or at least stimulates) creativity by manufacturing absurd juxtapositions.

The downside of Queneau is that if you can't model the underlying structure with code, the juxtapositions tend to be too absurd to use. And it's really difficult to model natural-language prose with code.

So here's my three-step meta-algorithm for deciding what to do with a corpus:

  1. If the items in your corpus follow a simple structure, code up that structure and go with Queneau.
  2. If the structure is too complex to be represented by a simple program (probably because it involves natural-language grammar), and you really need the output to be grammatical, go with Markov.
  3. Otherwise, write up a crude approximation of the complex structure, and go with Queneau.



Unless otherwise noted, all content licensed by Leonard Richardson
under a Creative Commons License.