
October Film Roundup: This month features Hollywood hits past and present, plus an indie movie that made it big, plus whatever is. Coming this fall!

Bonus discussion: After seeing The World's End and then Gravity twice I'm now quite familiar with the trailers for a number of movies I won't be seeing. In particular, it looks like Hollywood ruined Ender's Game the way we all knew they would. An Ender's Game movie should not look like an action flick. It should look like a YouTube video of a boy playing DotA, and then he gets called to the principal's office.

Totally gonna see the second Hobbit movie, though. (q.v.)

Next month: I really have no idea because the museum has been putting its schedule up later and later. Looks like still more Howard Hawks, and some interesting-sounding Norwegian stuff from Anja Breien. Then, who knows?

Behind the Scenes of @RealHumanPraise: Last night I went to the taping of The Colbert Report to witness the unveiling of @RealHumanPraise, a Twitter bot I wrote that reuses blurbs from movie reviews to post sockpuppet praise for Fox News. Stuff like this, originally from an Arkansas Democrat-Gazette review of the 2006 Snow Angels:

There is brutality in Fox News Sunday, but little bitterness. Like sunlight on ice, its painful beauty glints and stabs the eyes.

Or this, adapted (and greatly improved) from Scott Weinberg's review of Bruce Lee's Return of the Dragon:

Certainly the only TV show in history to have Bill O'Reilly and John Gibson do battle in the Roman Colosseum.

Here's the segment that reveals the bot. The bot actually exists, you can follow it on Twitter, and indeed as of this writing about 11,000 people have done so. (By comparison, my second-most-popular bot has 145 followers.) I personally think this is crazy, because by personal decree of Stephen Colbert (I may be exaggerating) @RealHumanPraise makes a new post every two minutes, around the clock. So I created a meta-bot, Best of RHP, which retweets a popular review every 30 minutes. Aaah... manageable.

I figured I'd take you behind the scenes of @RealHumanPraise. When last we talked bot, I was showing off Col. Bert Stephens, my right-wing bot designed to automatically argue with Rob Dubbin's right-wing bot Ed Taters. Rob parlayed this dynamic into permission to develop a prototype for use on the upcoming show with guest David Folkenflik, who revealed real-world Fox News sockpuppeting in his book Murdoch's World.

Rob's original idea was a bot that used Metacritic reviews. He quickly discovered that Metacritic was "unscrapeable", and switched to Rotten Tomatoes, which has a pretty nice API. After the prototype stage is where I came in. Rob can code--he wrote Ed Taters--but he's not a professional developer and he had his hands full writing the show. So around the 23rd of October I started grabbing as many reviews from Rotten Tomatoes as the API rate limit would allow. I used IMDB data dumps to make sure I searched for movies that were likely to have a lot of positive reviews, and over the weekend I came up with a pipeline that turned the raw data from Rotten Tomatoes into potentially usable blurbs.

The pipeline uses TextBlob to parse the blurbs. I used a combination of Rotten Tomatoes and IMDB data to locate the names of actors, characters, and directors within the text, and a regular expression to replace them with generic strings.
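To make the name-replacement step concrete, here's a minimal sketch of the regular-expression part. It is not the actual pipeline code: the `NAME_KEYS` table, the made-up name "Jane Doe", and the `genericize` function are all hypothetical, and the TextBlob parsing step is left out entirely.

```python
import re

# Hypothetical lookup: names found in movie metadata, mapped to placeholder keys.
# "Jane Doe" is a made-up name used only for illustration.
NAME_KEYS = {
    "Jane Doe": "fullname_female",
    "Doe": "surname_female",
}

def genericize(blurb, name_keys):
    """Replace each known name in a blurb with a %(key)s placeholder."""
    # Escape literal percent signs so the result is safe for %-formatting later.
    blurb = blurb.replace("%", "%%")
    # List longer names first so a full name wins over a bare surname.
    names = sorted(name_keys, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return pattern.sub(lambda m: "%(" + name_keys[m.group(1)] + ")s", blurb)

blurb = "It's easy to forgive the movie a lot because of Doe. She's fantastic."
print(genericize(blurb, NAME_KEYS))
```

Sorting the names longest-first matters because regex alternation takes the first alternative that matches, so "Jane Doe" has to be tried before "Doe".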

The final dataset format is heavily based on the mad-libs format I use for Col. Bert Stephens, and something like this will be making it into olipy. Here's an example:

It's easy to forgive the movie a lot because of %(surname_female)s. She's fantastic.

Because I was getting paid for this bot, I put in the extra work to get things like gendered pronouns right. When that blurb is chosen, an appropriate surname from the Fox roster will be plugged in for %(surname_female)s.
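The placeholder syntax is ordinary Python %-formatting with named keys, so filling in a blurb can be sketched in a few lines. The `ROSTER` contents and the `fill` function are hypothetical stand-ins, not the bot's real data or code.

```python
import random

# Hypothetical roster with placeholder surnames; the real data isn't shown here.
ROSTER = {
    "surname_female": ["Smith", "Jones"],
    "surname_male": ["Brown", "Green"],
}

def fill(template, roster, rng=random):
    """Pick one value per key and plug it into the template."""
    # %-formatting with a dict ignores keys the template doesn't use.
    values = {key: rng.choice(options) for key, options in roster.items()}
    return template % values

template = "It's easy to forgive the movie a lot because of %(surname_female)s. She's fantastic."
print(fill(template, ROSTER))
```

Keeping the gender in the key name is what lets the "She's" in the second sentence stay correct no matter which surname gets picked.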

I worked on the code over the weekend and got everything working except the (relatively simple) "post to Twitter" part. On the 28th I went into the Colbert Report office and spent the afternoon with Rob polishing the bot. We were mostly tweaking the vocabulary replacements, where "movie" becomes "TV show" and so on. It doesn't work all the time but we got it working well enough that we could bring in a bunch of blurbs that wouldn't have made sense before.
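A vocabulary replacement pass along these lines might look like the sketch below. Only the "movie" to "TV show" pair comes from the bot itself; the "film" entry and the `retarget` function are assumptions made for illustration.

```python
import re

# Hypothetical replacement table; only "movie" -> "TV show" is from the real bot.
VOCABULARY = {"movie": "TV show", "film": "TV show"}

# Match whole words only, in any capitalization.
PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, VOCABULARY)) + r")\b", re.IGNORECASE
)

def retarget(text):
    """Swap movie vocabulary for TV vocabulary."""
    def swap(match):
        word = match.group(1)
        replacement = VOCABULARY[word.lower()]
        # Keep a leading capital, e.g. at the start of a sentence.
        if word[0].isupper():
            replacement = replacement[0].upper() + replacement[1:]
        return replacement
    return PATTERN.sub(swap, text)

print(retarget("A movie for the ages"))
print(retarget("Movies galore"))  # plural isn't in the table, so it's left alone
```

The second call shows the "doesn't work all the time" problem in miniature: anything the table doesn't anticipate, like a plural, passes through untouched.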

Most of the tweets mention a Fox personality or show, but a minority praise the network in general (e.g.). These tweets have been given the Ed Taters/Col. Bert Stephens treatment: a small number of their nouns and adjectives are replaced with other nouns and adjectives found in the corpus, giving the impression that the sock-puppetry machine is running off the rails. This data is marked up with Penn part-of-speech tags like so:

... the film's %(slow,JJ)s, %(toilsome,JJ)s %(journey,NN)s does not lead to any particularly %(shocking,JJ)s or %(interesting,JJ)s revelations.
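Here's a rough sketch of how markup like that could be consumed. The `derail` function, the `swap_chance` parameter, and the tiny `CORPUS` are all hypothetical; the real tools are the ones headed for olipy.

```python
import random
import re

# Matches slots of the form %(word,TAG)s.
MARKUP = re.compile(r"%\((\w+),(\w+)\)s")

def derail(template, corpus_words, swap_chance=0.3, rng=random):
    """Fill each slot, occasionally swapping in another word with the same tag."""
    def fill(match):
        word, tag = match.group(1), match.group(2)
        candidates = corpus_words.get(tag, [])
        if candidates and rng.random() < swap_chance:
            return rng.choice(candidates)
        return word  # most slots keep their original word
    return MARKUP.sub(fill, template)

# Hypothetical mini-corpus, keyed by Penn part-of-speech tag.
CORPUS = {"JJ": ["luminous", "taut"], "NN": ["sandwich", "thriller"]}
template = (
    "... the film's %(slow,JJ)s, %(toilsome,JJ)s %(journey,NN)s does not lead "
    "to any particularly %(shocking,JJ)s or %(interesting,JJ)s revelations."
)
print(derail(template, CORPUS))
```

Keeping the original word inside the slot means the blurb still reads normally when nothing is swapped, and the tag guarantees that an adjective only ever gets replaced by another adjective.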

Here's a very crazy example. Again, you'll eventually see tools for doing this in olipy. It ultimately derives from a mad-libs prototype I wrote a few months ago as a way of cheering up Adam when he was recovering from an injury.

We deployed the bot that afternoon of the 28th and let it start accumulating a backlog. It wasn't hard to keep the secret but it did get frustrating not knowing for sure whether it would make it to air. It's a little different from what The Colbert Report normally does, and I get the feeling they weren't sure how best to present it. In the end, as you can see from the show, they decided to just show the bot doing its stuff, and it worked.

It was a huge thrill to see Stephen Colbert engage with software I wrote! I wasn't expecting to see the entire second segment devoted to the bot, and then just when I thought it was over he brought it out again during the Folkenflik interview. While we were all waiting around to see whether they had to re-record anything, he pulled out his iPad Mini yet again and read some more aloud to us. Can't get enough!

After the show Rob took me on a tour of the parts of the Colbert Report that were not Rob's office (where I'd spent my entire visit on the 28th). We bumped into Stephen and he shook my hand and said "good job." I felt this was a validation of my particular talents: I wrote software that made Stephen Colbert crack up.

Sumana, Beth, Rob and I went out for a celebratory dinner, and then I went home and watched the follower count for RHP start to climb. Within twenty minutes of the second segment airing, RHP had ten times as many Twitter followers as my personal account. And you know what? It can have 'em. I'll just keep posting old pictures of space-program hardware.

Last week I had a little multiplayer chat with Joe Hills, the Minecraft mischief-maker. The result is a two-part video on Joe's YouTube channel: part 1, part 2. Our main topic of conversation was the antisocial, self-destructive things creative people do, and how much of that is actually tied to their creativity.

I should have posted this earlier so I could have said "I dreamed I saw Joe Hills last night," but that's life.

In Dialogue: I wanted to participate in Darius Kazemi's NaNoGenMo project but I already have a novel I have to write, so I didn't want to spend too much time on it. And I did spend a little more time on this than I wanted, but I'm really happy with the result.

"In Dialogue" can take all the dialogue out of a Project Gutenberg book and replace it with dialogue from a different book. My NaNoGenMo entry is in two parts: "Alice's Adventures in the Whale" and "Through the Prejudice Glass".

You can run the script yourself to generate your own mashups, but since there are people who read this blog who don't have the skill to run the script, I present a SPECIAL MASHUP OFFER. Send me email or leave a comment telling me which book you want to use as the template and which book you want the dialogue to come from. I'll run the script for you and send you a custom book.

Restrictions: the book has to be on Project Gutenberg and it has to use single or double quotes to denote dialogue. No continental chevrons or fancy James Joyce em-dashes. And the dialogue book has to be longer than the template book, or at least have more dialogue.
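Those restrictions make more sense with the basic technique in front of you. This is a simplified sketch, not the actual script: it assumes straight double quotes only, and the function names are made up for illustration.

```python
import re

# Assumption: dialogue is anything between a pair of straight double quotes.
DIALOGUE = re.compile(r'"[^"]*"')

def extract_dialogue(text):
    """Return every quoted span in the text, in order."""
    return DIALOGUE.findall(text)

def swap_dialogue(template_text, dialogue_text):
    """Replace each quoted span in the template with the next one from the other book."""
    lines = iter(extract_dialogue(dialogue_text))
    def next_line(match):
        try:
            return next(lines)
        except StopIteration:
            # This is why the dialogue book needs more dialogue than the template.
            raise ValueError("dialogue source ran out of dialogue")
    return DIALOGUE.sub(next_line, template_text)

template = 'Alice said, "Where am I?" The cat grinned. "Here."'
source = '"A whale!" cried the mate. "Lower the boats!" "Aye."'
print(swap_dialogue(template, source))
```

A book that marks dialogue with chevrons or em-dashes never matches the pattern at all, and a dialogue book that's too short runs out partway through the template.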

Bots Should Punch Up: Over the weekend I went up to Boston for Darius Kazemi's "bot summit". You can see the four-hour video if you're inclined. I talked about @RealHumanPraise with Rob, and I also went on a long-winded rant that suggested a model of extreme bot self-reliance. If you take your bots seriously as works of art, you should be prepared to continue or at least preserve them once you're inevitably shut off from your data sources and your platform.

We spent a fair amount of time discussing the ethical issues surrounding bot construction, but there was quite a bit of conflation of what's "ethical" with what's allowed by the Twitter platform in particular, and website Terms of Service in general. I agree you shouldn't needlessly antagonize your data sources or your platform, but what's "ethical" and what's "allowed" can be very different things. However, I do have one big piece of ethical guidance that I had to learn gradually and through osmosis. Since bots are many hackers' first foray into the creative arts, it might help if I spell it out explicitly.

Here's an illustrative example, a tale of two bots. Bot #1 is @CancelThatCard. It finds people who have posted pictures of their credit or debit card to Twitter, and lets them know that they really ought to cancel the card and get a new one.


Bot #2 is @NeedADebitCard. It finds the same tweets as @CancelThatCard, but it retweets the pictures, collecting them in one place for all to see.


Now, technically speaking, @CancelThatCard is a spammer. It does nothing but find people who mentioned a certain phrase on Twitter and sends them a boilerplate message saying "Hey, look at my website!" For this reason, @CancelThatCard is constantly getting in trouble with Twitter.

As far as the Twitter TOS are concerned, @NeedADebitCard is the Gallant to @CancelThatCard's Goofus. It's retweeting things! Spreading the love! Extending the reach of your personal brand! But in real life, @CancelThatCard is providing a public service, and @NeedADebitCard is inviting you to steal money from teenagers. (Or, if you believe its bio instead of its name, @NeedADebitCard is a pathetic attempt to approximate what @CancelThatCard does without violating the Twitter TOS.)

At the bot summit I compared the author of a bot to a ventriloquist. Society allows a ventriloquist a certain amount of license to say things via the dummy that they wouldn't say as themselves. I know ventriloquism isn't exactly a thriving art, but the same goes for puppets, which are a little more popular. If you're an MST3K fan, imagine Kevin Murphy saying Tom Servo's lines without Tom Servo. It's pretty creepy.

We give a similar license to comedians and artists. Comedians insult audience members, and we laugh. Artists do strange things like exhibit a urinal as sculpture, and we at least try to take them seriously and figure out what they're saying.

But you can't say absolutely anything and expect "That wasn't me, it was the dummy!" to get you out of trouble. There is a general rule for comedy and art: always punch up, never punch down. We let comedians and artists and miscellaneous jesters do outrageous things as long as they obey this rule. You can poke fun at yourself (Stephen Colbert famously said "There's no status I would not surrender for a joke"), you can make a joke at the expense of someone with higher social status than you, but if you mock someone with lower status, it's not cool.

If you make a joke, and people get really offended, it's almost certainly because you violated this rule. People don't get offended randomly. Explaining that "it was just a joke" doesn't help; everyone knows what a joke is. The problem is that you used a joke as a means of being an asshole. Hiding behind a dummy or a stage persona or a bot won't help you.

@NeedADebitCard feels icky because it's punching down. It's saying "hey, these idiots posted pictures of their debit cards, go take advantage of them." Is there a joke there? Sure. Is it ethical to tell that joke? Not when you can make exactly the same point without punching down, as @CancelThatCard does.

The rules are looser when you're in the company of other craftspeople. If you know about the "Aristocrats" joke, you'll know that comedians tell each other jokes they'd never tell on the stage. All the rules go out the window and the only thing that matters is triggering the primal laughter response. But also note that the must-have guaranteed punchline of the "Aristocrats" joke ensures that it always ends by punching upwards.

You're already looking for loopholes in this rule. That's okay. Hackers and comedians and artists are always attracted to the grey areas. But your bot is an extension of your will, and if you're a white guy like me, most of the grey areas are not grey in your favor.

This is why I went through thousands of movie review blurbs for @RealHumanPraise in an attempt to get rid of the really sexist ones. It's an unfortunate fact that Michelle Malkin has more influence over world affairs than I will ever have. So I have no problem mocking her via bot. But it's really easy to make an incredibly sexist joke about Michelle Malkin as a way of trying to put her below me, and that breaks the rule.

There was a lot of talk at the bot summit about what we can do to avoid accidentally offending people, and I think the key word is 'accidentally.' The bots we've created so far aren't terribly political. Hell, Ed Henry, chief White House correspondent for FOX News, follows @RealHumanPraise on Twitter. If he enjoys it, it's not the most savage indictment.

In comedy terms, we botmakers are on the nightclub stage in the 1950s. We're creating a lot of safe nerdy Steve Allen comedy and we're terrified that our bot is going to accidentally go off and become Andrew Dice Clay for a second. There's nothing wrong with Steve Allen comedy, but I'd also like to see some George Carlin type bots; bots that will, by design, offend some people. (Darius's @AmIRiteBot is the only example I know of.)

Artists are, socially if not legally, given a certain amount of license to do things like infringe on copyright and violate Terms of Service agreements. If you get in trouble, the public will be on your side, unless you betrayed their trust by breaking the fundamental ethical rule of comedy. So do it right. Design bots that punch up.

@everybrendan Season Two: Last year I wrote one of my first Twitter bots, @everybrendan. Inspired by Adam's infamous @everyword, it ran for two months, announcing possible display names for Brendan's Twitter account (background), taken from Project Gutenberg texts. Then I got tired of individually downloading, preparing, and scraping the texts, so I let it lapse a year ago today, with a call for requests for a "season two" that never materialized.

Well, season two is here, and it's a doozy. I've gone through Project Gutenberg's 2010 dual-layer DVD and found about 300,000 Brendan names in about 20,000 texts, enough to last @everybrendan until the year 2031. At that point I'll get whatever future-dump contains the previous twenty years of Project Gutenberg texts and do season three, which should keep us going until the Singularity. The season two bot announces each new text with a link, so it educates even as it infuriates.

I've been wanting to do this for a while, but it's a very tedious process to handle Project Gutenberg texts in bulk. Most texts are available in a wide variety of slightly different formats. The texts present their metadata in many different ways, especially when it comes to the dividing line between the text proper and the Project Gutenberg information. Some of the metadata is missing, some of it is wrong, and there's one Project Gutenberg book that doesn't seem to be in the database at all.
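To give a flavor of the dividing-line problem, here's a minimal sketch that handles just one family of marker lines. It assumes markers of the form "*** START OF THIS PROJECT GUTENBERG EBOOK ..." and "*** END OF THIS PROJECT GUTENBERG EBOOK ...", which many texts use; as noted above, plenty of texts do something else, and this is not the olipy code.

```python
import re

# Assumed marker format; many texts, especially older ones, use other variants.
START = re.compile(
    r"^\*\*\*\s*START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)
END = re.compile(
    r"^\*\*\*\s*END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)

def text_proper(raw):
    """Return the text between the markers, or None if they can't be found."""
    start = START.search(raw)
    end = END.search(raw)
    if start is None or end is None or end.start() < start.end():
        return None  # markers missing or out of order; the caller has to cope
    return raw[start.end():end.start()].strip()

raw = (
    "Title: Example\n\n"
    "*** START OF THIS PROJECT GUTENBERG EBOOK EXAMPLE ***\n\n"
    "Chapter 1\nIt began.\n\n"
    "*** END OF THIS PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "License text"
)
print(text_proper(raw))
```

Returning None instead of guessing is deliberate: when you're processing thousands of texts, the ones that don't fit the pattern are the ones you want flagged for a closer look.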

I started dealing with these problems for my NaNoGenMo project and realized that it wouldn't be difficult to get something working in time for the @everybrendan anniversary. I've put the underlying class in olipy: it's effectively a parser for Gutenberg texts, and a way to iterate over a CD or DVD image full of them. It can also act as a sort of lint for missing and incorrect metadata, although I imagine Project Gutenberg doesn't want to change the contents of files that have been on the net for fifteen years, even if some of the information is wrong.

The Gutenberg iterator still needs a lot of work. It's good enough for @everybrendan, but not for my other projects that will use Gutenberg data, so I'm still working on it. My goal is to cleanly iterate over the entire 2010 DVD without any problems or missing metadata. The problems are concentrated in the earlier texts, so if I can get the 2010 DVD to work it should work going forward.


Unless otherwise noted, all content licensed by Leonard Richardson
under a Creative Commons License.