< November Film Roundup
Secrets of (peoples' responses to) @horse_ebooks—revealed! >

[No comments] @pony_strategies: My new bot, @pony_strategies, is the most sophisticated one I've ever created. It is the @horse_ebooks spambot from the Constellation Games universe.

Unlike @horse_ebooks, @pony_strategies will not abruptly stop publishing fun stuff, or turn out to be a cheesy tie-in trying to get you interested in some other project. It is a cheesy tie-in to some other project (Constellation Games), but you go into the relationship knowing this fact, and the connection is very subtle.

When explaining this project to people as I worked on it, I was astounded that many of them didn't know what @horse_ebooks was. But that just proves I inhabit a bubble in which fakey software has outsized significance. So a brief introduction:

@horse_ebooks was a spambot created by a Russian named Alexei Kouznetsov. It posted Twitter ads for crappy ebooks, some of which (but not all, or even most) were about horses. Its major innovative feature was its text generation algorithm for the things it would say between ads.

Are you ready? The amazing algorithm was this: @horse_ebooks ripped strings more or less randomly from the crappy ebooks it was selling and presented them with absolutely no context.

Trust me, this is groundbreaking. I'm sure this technique had been tried before, but @horse_ebooks was the first to make it popular. And it's great! Truncating a sentence in the right place generates some pretty funny stuff. Here are four consecutive @horse_ebooks tweets:

There was a tribute comic and everything.

I say @horse_ebooks "was" a spambot because in 2011 the Twitter account was acquired by two Americans, Jacob Bakkila and Thomas Bender, who took it over and started running it not to sell crappy ebooks, but to promote their Alternate Reality Game. This fact was revealed back in September 2013, and once the men behind the mask were revealed, @horse_ebooks stopped posting.

The whole conceit of @horse_ebooks was that there was no active creative process, just a dumb algorithm. But in reality Bakkila was "impersonating" the original algorithm—most likely curating its output so that you only saw the good stuff. No one likes to be played for a sucker, and when the true purpose of @horse_ebooks was revealed, folks felt betrayed.

As it happens, the question of whether it's artistically valid to curate the output of an algorithm is a major bone of contention in the ongoing Vorticism/Futurism-esque feud between Adam Parrish and myself. He is dead set against it; I think it makes sense if you are using an algorithm as the input into another creative process, or if your sole object is to entertain. We both agree that it's a little sketchy if you have 200,000 fans whose fandom is predicated on the belief that they're reading the raw output of an algorithm. On the other hand, if you follow an ebook spammer on Twitter, you get up with fleas. I think that's how the saying goes.

In any event, the fan comics ceased when @horse_ebooks did. There was a lot of chin-stroking and art-denial and in general the reaction was strongly negative. But that's not the end of the story.

You see, the death of @horse_ebooks led to an outpouring of imitation *_ebooks bots on various topics. (This had been happening before, actually.) As these bots were announced, I swore silent vengeance on each and every one of them. Why? Because those bots didn't use the awesome @horse_ebooks algorithm! Most of them used Markov chains, that most hated technique, to generate their text. It was as if the @horse_ebooks algorithm itself had been discredited by the revelation that two guys from New York were manually curating its output. (Confused reports that those guys had "written" the @horse_ebooks tweets didn't help matters--they implied that there was no algorithm at all and that the text was original.)

But there was hope. A single bot escaped my pronouncements of vengeance: Adam's excellent @zzt_ebooks. That is a great bot which you should follow, and it uses an approximation of the real @horse_ebooks algorithm:

  1. The corpus is word-wrapped at 35 characters per line.
  2. Pick a line to use as the first part of a tweet.
  3. If (random), append the next line onto the current line.
  4. Repeat until (random) is false or the line is as large as a tweet can get.

And here are four consecutive quotes from @zzt_ebooks:

Works great.

The ultimate genesis of @pony_strategies was this conversation I had with Adam about @zzt_ebooks. Recently my anger with *_ebooks bots reached the point where I decided to add a real *_ebooks algorithm to olipy to encourage people to use it. Of course I'd need a demo bot to show off the algorithm...

The @pony_strategies bot has sixty years worth of content loaded into it. I extracted the content from the same Project Gutenberg DVD I used to revive @everybrendan. There's a lot more where that came from--I ended up choosing about 0.0001% of the possibilities found in the DVD.

I have not manually curated the PG quotes and I have no idea what the bot is about to post. But the dataset is the result of a lot of algorithmic curation. I focused on technical books, science books and cookbooks--the closest PG equivalents to the crap that @horse_ebooks was selling. I applied a language filter to get rid of old-timey racial slurs. I privileged lines that were the beginnings of sentences over lines that were the middle of sentences. I eliminated lines that were boring (e.g. composed entirely of super-common English words).

I also did some research into what distinguished funny, popular @horse_ebooks tweets from tweets that were not funny and less popular. Instead of trying to precisely reverse-engineer an algorithm that had a human at one end, I tried to figure out which outputs of the process gave results people liked, and focused my algorithm on delivering more of those. I'll post my findings in a separate post because this is getting way too long. Suffice to say that I'll pit the output of my program against the curated @horse_ebooks feed any day. Such as today, and every day for the next sixty years.

Like its counterpart in our universe, @pony_strategies doesn't just post quotes: it also posts ads for ebooks. Some of these books are strategy guides for the "PĂ´neis Brilhantes" series described in Constellation Games, but the others have randomly generated titles. Funny story: they're generated using Markov chains! Yes, when you have a corpus of really generic-sounding stuff and you want to make fun of how generic it sounds by generating more generic-sounding stuff, Markov chains give the best result. But do you really want to have that on your resume, Markov chains? "Successfully posed as unimaginative writer." Way to go, man.

Anyway, @pony_strategies. It's funny quotes, it's fake ads, it's an algorithm you can use in your own projects. Use it!

Filed under:


Post a comment

Your name:

Your home page:

Remember this information

Comments:

Allowed HTML tags: <a>, <b>, <i>. Blank lines become paragraph separators.


[Main] [Edit]

Unless otherwise noted, all content licensed by Leonard Richardson
under a Creative Commons License.