Crummy: The Site
We fight 'em until we can't.

News You Can Bruise


June Film Roundup:

Tragically, this marks the end of Film Roundup, as the resolution I foolishly made late in the month means that the only movies I can see from this point on are the likes of Hocus Pocus (1993), Heaven's Prisoners (1996), Hurt Penguins (1992), and the Tagalog comedy classic Haba-baba-doo! Puti-puti-poo! (1997). We'll miss the magic, the mystery, but most of all... the movies.

Wait, I can just disregard resolutions? They're not legally binding? Amazing! See you next month! I gotta go cancel my Columbia Record Club membership.

Beautiful Soup 4.4.0 beta: I've found an agent for Situation Normal and the book is out to publishers and I don't have to think about it for a while. As seems to be my tradition after finishing a big project, I went through the accumulated Beautiful Soup backlog and closed it out. I've put out a beta release which I'd like you to try out and report any problems.

I've fixed 17 bugs, added some minor new features, and changed the implementations of __copy__ and __repr__ to work more like you'd expect from Python objects. But in my mind the major new change is this: I've added a warning that displays when you create a BeautifulSoup object without explicitly specifying a parser:

UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently. To get rid of this warning, change this:
BeautifulSoup([your markup])
to this:
BeautifulSoup([your markup], "lxml")

It's a little annoying to get this message, but it's also annoying to have your code silently behave differently because you copied it to a machine that didn't have lxml installed, and it's also annoying when I have to check pretty much every reported bug to see whether this is the problem. Whenever I think I can eliminate a class of support question with a warning, I put in the warning. It saves everybody time.
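Here's a minimal sketch of the warn-when-the-default-was-implicit pattern described above. This is not Beautiful Soup's actual code: `pick_parser`, `PREFERENCE`, and the `installed` argument are names made up for illustration, and the preference order is an assumption.

```python
import warnings

# Hypothetical preference order: the "best" parser first.
PREFERENCE = ["lxml", "html5lib", "html.parser"]


def pick_parser(requested=None, installed=("html.parser",)):
    """Return the parser name to use, warning if the caller didn't choose one."""
    if requested is not None:
        if requested not in installed:
            raise ValueError(f"parser {requested!r} is not installed")
        return requested
    # No explicit choice: fall back to the best installed parser and say so,
    # because the answer may differ on another machine.
    best = next(name for name in PREFERENCE if name in installed)
    warnings.warn(
        f'No parser was explicitly specified, so I\'m using "{best}". '
        "Pass a parser name to get the same behavior everywhere.",
        UserWarning,
        stacklevel=2,
    )
    return best
```

Calling `pick_parser("html.parser")` is silent; calling `pick_parser()` works but warns, which is the whole point.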

The other possibility: now that Python's built-in HTMLParser is decent, I could make it so that it's always the default unless you specify another parser. This would cause a big one-time wrench, as even machines which have lxml installed would start using HTMLParser, but once it shook out the problem would be solved. I might still do that, but I think I'll give everyone about a year to get rid of this annoying warning.

Anyway, try out the beta. Unless there's a big problem I'll be releasing 4.4.0 on Friday.

Reviews of Old Science Fiction Magazines: Analog 1985/07: Here it is, the final entry in this series, started seven years ago when I picked up a bunch of old SF magazines at a swap-fest. I've acquired a lot of magazines since then, and those are getting 'old', so it could continue, but this is the last of the original set. And good riddance, because this magazine smells like laundry detergent for some reason.

So what do we got? The cover story (one assumes) is the first part of Timothy Zahn's "Spinneret", which would later be published as a novel. It was good but I kinda see where it's going and don't feel a strong need to read the novel.

Eric G. Iverson's "Noninterference" is a pleasant story whose sole purpose is to dis the Prime Directive. The accompanying artwork seems more appropriate to a story about the mixing of the ultimate prog-rock album.

Charles L. Harness's "George Washington Slept Here" is the cream of this issue: a creative, funny and entertaining story that combines several Analog favorites (aliens, historical figures, and fussy middle-aged hobbies) that you rarely see together. Bonus: no time travel or major alt-history, just a character with a really long lifespan. I really liked the concept of the main character, a lawyer who loses every case he takes, but in a way that's more beneficial to his client than if he'd won. That concept's strong enough to support a series, but it looks like this is the only one.

This month's vague story blurbs:

Now to nonfiction. David Brin's essay "Just How Dangerous Is The Galaxy?" classifies every known potential solution to the Fermi Paradox and puts them in a big table by which term of the Drake Equation they affect. He also introduces his own "Water World" solution, which he deigns to classify in a separate section called "Optimism". This solution posits that "Earth is unusually dry for a water world," and that intelligent life evolves all the time, and thrives for long periods, but very rarely builds spaceships. I'm just riffing on the idea here, and I don't buy the idea that "hands and fire" are prerequisites to advanced technology, but you could imagine a dolphin-type civilization treating a planet's surface and atmosphere the way we treat low-earth orbit.

Tom Easton's book review column includes a review of Ender's Game, which wanders into a long philosophical discussion that I won't reproduce here because it's pretty similar to stuff you can find on the Internet. I was disappointed to read that "Russel M. Griffin's The Timeservers is a pale incarnation of the diplomatic satire that made Laumer's Retief so popular." It was a Philip K. Dick Award finalist, though, so maybe it's just on a different wavelength from Laumer.

In letters, paleontologist Jack Cohen returns fire at Tom Easton, who in an earlier book review column disputed the evolutionary biology in Harry Harrison's Cohen-collaboration West of Eden. And reader Michael Owens has it out with Ben Bova about the latter's support of the Star Wars program. Summary of Owens: "far from leading to a defense-oriented world, Star Wars leads to another offense-oriented arms race." Bova responds that he wrote a book (Assured Survival) that deals with all this stuff, and then mentions this comforting tidbit:

[T]he new defensive technologies do not apply only to satellites and ballistic missiles. They are already being developed into "smart weapons" that will make the tanks, artillery, planes, and ships of conventional land and sea warfare little more than expensive and very vulnerable targets. "Star Wars" technologies (plural!) can make all forms of aggressive warfare so difficult that an era of worldwide peace is in view—if the nations of the world want peace.

Which leads nicely into the thing I've saved for last because I've got a lot to say about it, in direct violation of my usual "if you can't say anything nice" rule. Previously on Analog, columnist G. Harry Stine asked readers to send in their answers to the following question, which I will quote in full:

What, in your opinion, is the most important problem that technologists should tackle in the next twenty years, and why do you believe this?

In this issue Stine reports the results, and I was looking forward to doing a kind of The Future: A Retrospective thing on them.

The first thing Stine does is disqualify 120 of the 127 replies he got. That may seem extreme, but that's approximately what I'd do if I was running a magazine and accepting fiction submissions. I was kind of laughing along as he disqualified entries for exceeding the word limit or otherwise ignoring the rules, but then I got to this:

49.61% of the replies [63 of 127]... discussed problems that were either (a) not technological problems, but social and political instead; (b) already solved or well along the road to solution; (c) trivial and parochial in their scope; (d) based on incorrect, incomplete, or outmoded data; and/or (e) the result of someone else's telling the respondent that the problem was a problem because the expert said so, whereupon the respondent stated it on faith without checking.

And at this point I gotta call bullshit. You didn't say "most important technological problem", you said "most important problem technologists should tackle." Social and political problems have technical aspects, and vice versa. The impact of a technological development is judged by its effect on society. This is the basis of the science fiction genre! You could replace every vague Analog story blurb with "Social and political problems tend to have technical aspects, and vice versa...", and it would always fit the story!

Half of Analog's readership can follow directions but their opinions are wrong. Let's take a look at the top five disqualified "problems" (all direct quotes, scare quotes in original):

  1. Control of nuclear weapons
  2. the "population explosion"
  3. the "energy shortage"
  4. the "raw materials shortage"
  5. "pollution" in various and sundry forms

I sure am glad technologists didn't waste any more time on these non-problems after 1985! According to Stine, America's ballistic missile defense system is well on its way to solving #1 (if the nations of the world want peace, of course). #2 isn't a problem anymore because the rate of population growth has slowed. #3 and #4 were never real problems. ("The only reason we had an 'energy shortage' was to provide an excuse for politicians and bureaucrats to gain control of natural resources, and thereby gain control over people.") As for #5, who's to say what counts as "pollution"? Like most words, it's a "semantically-loaded term". "Pollution in its many forms may be a localized problem in some areas, but it is not a worldwide problem."

So what are the seven entries that made the cut? I'm glad you asked, previous sentence:

  1. "Making products maintenance-free, i.e. designed for a 100-year life with a 0.0001 probability of maintenance." DISQUALIFIED. Maybe the move from 75 years to 100 would be a technical improvement, but the problem as it exists today is a problem with the way products are sold, and technical improvements won't change that.
  2. "[C]ontrol of the weather" to boost crop yields and prevent famine. SEMI-DISQUALIFIED. Modern famines are political problems, not technical problems. Control of the weather would indeed be great, not for this reason, but because it would let us mitigate the damage caused by our worldwide pollution problem.
  3. "The construction and maintenance of closed ecological systems". Sure, OK.
  4. Here's the shortest quote I could get that explains this one:
    Education depends on communication. John points out that communication involves moving information from place to place... which really isn't much of a problem, but... managing the information is. It's possible to download lots of information into a student's mind. But if the student doesn't know how to determine what information is meaningful and relevant... everything stored in the student's memory is useless.

    Now that's more like it! Not only is this a real problem, it's one that we made significant progress on between 1985 and 2005!

  5. "The development of the direct link between the human mind and the computer to produce a true intelligence amplifier." Another good one. We got both parts of this (mind-computer link and intelligence amplifier), but in practice they don't have anything to do with each other.
  6. "[T]he construction by machines of very small machines." This also happened but proved not to be a huge deal, and even Stine is kinda skeptical ("he doesn't specify exactly what technological problems can be solved by developing sub-microscopic technology"). I'm gonna go out on a limb and say the real problem is the reader doesn't specify exactly what social or political problems can be solved with this technology.
  7. And finally,
    Del Cain of Augusta, ME presented a technological problem that is as much philosophical as technological... He wants technologists to develop structures and artifacts that tend to support healthy behavior in human beings—i.e. to help people live and rear children so they can develop to their full potential without trauma but not without struggle, difficulty, or drama. To do this, he believes that we should solve the technological problem of determining what are the optimum sizes and structures of healthy communities. In short, he feels that the big problem is developing technology with a life-affirming philosophy behind it.

    I don't understand how Del Cain managed to smuggle the concept of Scandinavian social democracy past G. Harry Stine, but good job. No, wait, I figured it out: I'm projecting, and so was he.

Well, there we go, that's our look at old SF magazines of the 80s. To commemorate the end of the series, I've scanned all the old ads in this magazine, not just the ones I thought were interesting or funny. But here are the ones I thought were interesting or funny:

I'll leave you with this question: what, in your opinion, is the most important problem that technologists should have tackled from 1985 to 2005, and why do you believe this?

May Film Roundup: This month features some interesting foreign films, an old-favorite blockbuster, and an awesome new blockbuster with a surprising connection to one of my all-time favorite films. What are these nuggets of cinema gold? I don't know, I'm just the intro paragraph, you'll have to ask the bulleted list:

The Future Is Prologue: I'm experimenting with writing a prologue for Situation Normal, to reduce the thrown-into-the-deep-end feeling typical of my fiction. I say 'experimenting with' rather than 'just doing it' because I wrote something and it wasn't a prologue. I'd just turned back the clock to before the book started and written a regular scene.

I don't like prologues for the very reason I'm trying to write one: they're introductory infodumps. I usually skim them, unless they look like the Law and Order style prologues where the POV character dies at the end of the scene. But this book has so many POV characters already, I don't think I should go that route.

I talked it over with Sumana and she gave me the idea of pacing the prologue as though it were the first scene of a short story. That's something I've done before, so I know I can do it again, and it doesn't mean big infodumps, just more internal monologue.

I'd like your suggestions of genre fiction books with effective prologues. Prologues that made you say "yes, I want to read a whole book about this stuff." I can't think of many examples but I admit I'm blinded by prejudice.

April Film Roundup: Sumana spent a lot of time out of town this month, so I took the opportunity to clear out a bunch of items on my "movies I want to see but Sumana doesn't" list. But there's also plenty of movies we saw together. How can you tell the difference?... I think you'll be able to tell.

March Film Roundup: We saw lots of stuff this month but not a lot of feature films. The upside is that a lot of what I did see is online for free.

January Film roundup:

Reviews of Old Science Fiction Magazines: F&SF October 1985: The first story in this magazine is James Tiptree's "The Only Neat Thing to Do", and the introductory copy introduces the main character as "a green-eyed young woman who happens to be one of the most appealing characters you are likely to encounter in these or any other pages," and my attitude was "Pffft, green eyes, sure, we'll see about that... DAMMIT." This story's so good. It starts out with this perfect wish-fulfillment space adventure but look at the title, folks, it's not gonna end well. Argh, so good.

Harlan Ellison still hates Gremlins, in fact he says he's been getting letters from people who scoffed at his Gremlins hate but now they've seen the movie they're swallowing their pride and sending him "toe-scuffling, red-faced, abnegating appeals for absolution." I'm harboring a doubt or two here, because he's also saying other people who took his advice (and presumably didn't see the movie) are thanking him. Given that Gremlins has consistently been a well-regarded film since its release, why would someone say "Thanks for warning me off the movie I haven't seen that people still seem to like."?

But all that's in the past. In this issue Ellison doubles down, telling people not to see The Goonies due to "utter emptyheadedness", which, okay, at least it's a critique and not 'the lurkers support me in email.' Also on Ellison's shit list for this month: Rambo: First Blood Part II, A View to a Kill, and The Black Cauldron. He loves Cocoon, Ladyhawke, and Return to Oz, and who's to say he's wrong? Not me, 'cause I haven't seen any of those movies.

There's some really corny back-cover copy in one of the ads for books, but I know from experience that writing back-cover copy is the worst, so as a professional courtesy I'm not going to make fun of it. Kind of weird that most of the stories in this issue are SF or horror, but all the ads are for fantasy books.

Halley's Comet fever strikes the classifieds! There's an ad for Halley's Comet, 1910: Fire in the Sky, sort of a historical recreation by Jerred Metz. Also a "HALLEY'S COMET. TIE TAC or Stick Pin. Four color enamel and beautiful." I'm hyping up the Halley's Comet thing because I happen to own a mint in-box Halley's Comet Hot Wheels car the likes of which are currently going on eBay for a measly $5.32 used including shipping. C'mon! This is my nest egg here! I demand... demand!

Minecraft Archive Project: 201502 Capture: I've done a new capture of data for the Minecraft Archive Project, my big 2014 project to archive the early history of Minecraft before it disappeared. My goal for the refresh was to capture what has happened in the past year while doing as little work as possible, and I met my goal. The whole thing took about two weeks, and most of that was a matter of letting things run overnight. Most of the actual work was refactoring the code I wrote the first time to make future captures even easier.

Top-line numbers: I've archived another 150 gigabytes of good stuff, including 18k maps and schematics, 1k mods, 11k skins, 7k texture packs (resource packs now, I guess), and 100k screenshots. I was able to archive about 73% of the maps. Four percent of the maps were just gone, and 23% I didn't know how to download.

The 201404 Minecraft Archive Project capture contains data from four sites. The new 201502 capture is limited to two sites: the official Minecraft forum and the huge Planet Minecraft site. I started archiving maps, mods, and textures for Minecraft Pocket Edition, and was able to pick up about 5500 MCPE maps.

Now that I've done this twice without getting into trouble, I'll give a little more detail about the process. I've got scripts that download the archives of the Minecraft forum and Planet Minecraft. I find all the threads/projects modified since the last capture, download the corresponding detail pages (e.g. the first page of a forum thread--I'm only after the original post), and extract all the links.
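The "extract all the links" step can be sketched with nothing but the standard library. This is an illustration, not the project's actual script: `LinkExtractor` and `extract_links` are hypothetical names, and it assumes the interesting links live in `href` and `src` attributes.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href/src targets found in one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return every link in `html`, in document order, as absolute URLs."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links
```

Each extracted URL then gets handed to whichever archiving recipe recognizes it.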

Then it's a matter of archiving as many of those links as possible. I've written recipes for archiving images and downloads. These six recipes take care of the vast majority of items:

There's also a general catch-all for people who host things on normal home pages, as Tim Berners-Lee intended. If your URL looks like the URL to an image or a binary archive, I will ask for that URL. If you serve me the image or the binary instead of an HTML file telling me to click on something, then I'll archive the file.
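The catch-all rule boils down to two checks, sketched below. The extension list and the function names are assumptions for illustration; the post doesn't say which extensions count or how the response is inspected, so this version just looks at the Content-Type header.

```python
from urllib.parse import urlparse

# Hypothetical extension list; the real set isn't given in the post.
FILE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".zip", ".rar", ".jar", ".schematic")


def looks_like_file(url):
    """True if the URL path ends in an image or binary-archive extension."""
    path = urlparse(url).path.lower()
    return path.endswith(FILE_EXTENSIONS)


def should_archive(url, content_type):
    """Archive only if the URL looked like a file AND the server sent one."""
    if not looks_like_file(url):
        return False
    # An HTML response means a "click here to download" page, not the file.
    media_type = content_type.split(";")[0].strip().lower()
    return media_type != "text/html"
```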

I decode most link shorteners except for the ones that make you click through ads, mainly adfoc.us and adf.ly. The 2014 archive had about 18,000 maps behind adf.ly links, and I spent a lot of time running Selenium clients clicking through the ads to discover the Mediafire links. I think that took a month. This time there were about 3000 new maps behind adf.ly links and I just didn't bother.

There are two big blind spots in my dataset, and they're the same as last time. One is mods. A lot of mods are hosted on Github and CurseForge, two big sites I didn't write recipes for. There's also the issue of mod packs, which have been steadily growing in popularity and complexity as development on core Minecraft winds down. Thanks to things like the Hardcore Questing Mod, modpacks are entering the "custom challenge" territory previously occupied solely by world archives.

There are sites that list mod packs but I don't want to spend the time figuring out how to archive all the mod packs. There's also the problem that mod packs are huge.

The second blind spot is servers. It's theoretically possible to join a public Minecraft server with a modded client and automatically archive the map, but realistically it ain't gonna happen. I complained about this last time, but now I've done an assessment of what's being lost.

Planet Minecraft has a big server list that mentions the last time it was able to ping any particular server. There doesn't seem to be any purging of dead servers, so I'm able to get good measurements of the typical lifecycle.

Of the 136k servers in the list, 12k are "online" (The most recent Planet Minecraft ping was successful). 51k are "offline" (Most recent Planet Minecraft ping failed, but there was a successful ping less than two weeks ago) and 73k I declare "dead" (last successful ping was more than two weeks ago). It seems really weird that nearly half of the 'offline' servers went offline in the past two weeks, so something's going on there; maybe Planet Minecraft's ping process is unreliable, or it just takes a long time to check every server, or servers go up and down all the time.

Anyway, the median lifetime for a public Minecraft server is 434 days, a little over a year. These things go online, people do a bunch of work on them, and then they disappear. I've kind of gotten to 'acceptance' on this, but it's still obnoxious.
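The online/offline/dead buckets and the median lifetime can be sketched like this. The function names and the shape of the input data are assumptions, and the post doesn't say which servers went into the median or how a ping exactly two weeks old is treated (this sketch calls it "offline").

```python
from datetime import datetime, timedelta
from statistics import median

TWO_WEEKS = timedelta(days=14)


def classify(last_ping_ok, last_success, now):
    """Label a server 'online', 'offline', or 'dead' using the rules above."""
    if last_ping_ok:
        return "online"
    # Failed most recently, but answered a ping within the last two weeks.
    if now - last_success <= TWO_WEEKS:
        return "offline"
    return "dead"


def median_lifetime_days(servers):
    """Median of (last seen - first seen) in whole days.

    `servers` is an iterable of (first_seen, last_seen) datetime pairs.
    """
    return median((last - first).days for first, last in servers)
```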

One final thing: I thought I'd check if I could see the result of Mojang's June announcement of rules for how you can make money by hosting servers (and, more importantly, how you can't). I wanted to see if these rules had a chilling effect on the formation of new servers or caused a lot of old servers to shut down.

And... no, not really. Here's a chart showing two sixty-day periods around June 12, the date of the Mojang blog post. For each day I show 'births' (the number of servers first seen on that day) and 'deaths' (the number of servers last seen on that day). There's a drop-off in new servers around the end of July, but then it picks up again stronger than before. I don't have an explanation for it but I don't think there's anything in here you can pin on a blog post. The Mojang rules were probably intended to go after a small number of large obnoxious servers, and everyone else either doesn't care or flies under the radar.
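Counting 'births' and 'deaths' per day is a two-counter job. A minimal sketch, assuming each server is reduced to a (first seen, last seen) pair of dates; `births_and_deaths` is a made-up name.

```python
from collections import Counter
from datetime import date


def births_and_deaths(servers):
    """Count servers first seen ('births') and last seen ('deaths') per day.

    `servers` is an iterable of (first_seen, last_seen) date pairs.
    """
    births, deaths = Counter(), Counter()
    for first_seen, last_seen in servers:
        births[first_seen] += 1
        deaths[last_seen] += 1
    return births, deaths
```

Plotting both counters over the days around June 12 gives the kind of chart described above.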

(Screenshot is from World #57 by Art_Fox. I didn't archive the map because it's behind an adf.ly link, but I got the screenshot.)

PS: Congratulations to Anticraft, the oldest public Minecraft server I could find that's still online, added to Planet Minecraft on February 28, 2011.

Update: I fixed up the adf.ly code and let it run for another two weeks (!), saving another 2000 Minecraft maps and 700 MCPE maps. I probably won't do this again because it's a huge pain, but I said that this time and ended up doing it out of some sense of obligation to the future, so maybe obligation will strike again, who knows.

Poems of SCIENCE! I Mean, Science: I picked up a cheap old poetry anthology called Poems of Science, figuring there'd be some good stuff. And... there was, but I had to wait for the modern conception of "science" to come about, and then spot poetry about a hundred years to come to grips with it, and decide that science is interesting and not going to go away. By that time I was more than halfway through the anthology. But around the late nineteenth century some excellent poetry starts happening, and I thought I'd share a couple links.

Miroslav Holub's Zito the Magician and Robert Browning's much longer An Epistle Containing the Strange Medical Experience of Karshish, the Arab Physician are really great and work as spec-fic stories. Swinburne's Hertha is this weird humanist we-are-made-of-star-stuff mythology that's what you'd expect from Swinburne. And then there's "Cosmic Gall", a goofy poem by John Updike which I'm gonna quote in full because it's the only thing of John Updike's I've read and liked.

Cosmic Gall
John Updike

Neutrinos, they are very small.
They have no charge and have no mass
And do not interact at all.
The earth is just a silly ball
To them, through which they simply pass,
Like dustmaids down a drafty hall
Or photons through a sheet of glass.
They snub the most exquisite gas,
Ignore the most substantial wall,
Cold shoulder steel and sounding brass,
Insult the stallion in his stall,
And, scorning barriers of class,
Infiltrate you and me. Like tall
And painless guillotines they fall
Down through our heads into the grass.
At night, they enter at Nepal
And pierce the lover and his lass
From underneath the bed—you call
It wonderful; I call it crass.

The Ghost of Ghostbusters Past: Just a quick semi-technical post on how I made @WeBustedGhosts, my new bot that casts movies from an alternate history where "ghostbusters" is a stock comedy genre, sort of a twentieth-century commedia dell'arte. In particular, I did a lot of work with IMDB data that I want to record for your benefit (and by you, I mean future me).

The bot was inspired by two things: first, this video by Ivan Guerrero which "premakes" Ghostbusters as a 1954 comedy starring Bob Hope, Fred MacMurray, and Martin/Lewis. Second, the reaction of fools to the fact that women comedians will bust ghosts in the upcoming Ghostbusters remake. More specifically, Kris's endless mockery of the idea that "ghostbuster" is a job with a legitimate gender qualification.

These things got me thinking about the minimal set of things you need to make Ghostbusters. You need the idea of combining a horror movie with a comedy about starting a business. Someone could have come up with that idea in the silent film era. You need a director and four actors who can do comedy. And all those people need to be alive and working at the same time, because ghosts aren't real... OR ARE THEY? Either way, you can describe a point in Ghostbusters space with six pieces of information: four actors, a director, and a year. That's small enough to fit into a tweet, so I made a Twitter bot.

Our journey to botdom starts, as you might expect, with an IMDB data dump. I've dealt with IMDB data before and this time I was excited to learn about IMDbPY, which promised to get a handle on the ancient and not-terribly-consistent flat-file IMDB data format. Unfortunately IMDbPY is designed for looking up facts about specific movies, not for reasoning over the set of all movies. However, it does have a great script called imdbpy2sql.py, which will take the flat-file format and turn it into a SQL database.

There will be SQL in this discussion (because I want to show you/future me how to do semi-complex stuff with the database created by IMDbPY), but unless you're future me, you can skip it. Basically, for each actor in IMDB, I need to calculate that actor's tendency to get high billing in popular comedies for a given year. They don't have to be good comedies, or Ghostbusters-like comedies, they just have to have a lot of IMDB ratings.

I also want to figure out each actor's effective comedy lifespan. If an actor stops doing popular comedy or dies or retires, they should stop showing up in the dataset. If a dramatic actor branches out into comedy they should show up in the dataset as of their first comedic performance. Basically, if you learned that this actor starred in a comedy that came out in a certain year, it shouldn't be a big surprise.

Orson Welles would be great in a Ghostbusters movie, but he never did comedy, so he's not in the dataset. How about... Cameron Diaz? She rarely gets top billing, but she has second or third billing in a lot of very popular comedies. For a year like 1997 she tops the list of potential women Ghostbusters.

How about... Peter Falk? His first comedy role was in 1961's Pocket Full of Miracles, his last in 2005's Checking Out. His acting career stretches from 1957 to 2009, but he's only a potential Ghostbuster between 1961 and 2005. He won't get chosen very often, because he's not primarily known for comedy (i.e. his comedies aren't as popular as other peoples'), but it will happen occasionally.

That's the data I extracted. Not "how famous is this actor" but "how much would you expect this actor to be in a comedy in a given year".
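The "effective comedy lifespan" idea can be sketched as first and last comedy year per actor. This is only the lifespan part, not the popularity weighting, and the function names and input shape are assumptions.

```python
def comedy_lifespans(roles):
    """Map each actor to the (first, last) year they appeared in a comedy.

    `roles` is an iterable of (actor_name, year) pairs, one per comedy role.
    """
    spans = {}
    for actor, year in roles:
        first, last = spans.get(actor, (year, year))
        spans[actor] = (min(first, year), max(last, year))
    return spans


def is_candidate(spans, actor, year):
    """True if `year` falls inside the actor's comedy lifespan."""
    if actor not in spans:
        # Never did comedy, so never a potential Ghostbuster.
        return False
    first, last = spans[actor]
    return first <= year <= last
```

With Peter Falk's comedy roles running from 1961 to 2005, `is_candidate` says yes for 1984 and no for 1958, even though he was already acting in 1958.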

The IMDbPY database is more complicated than I like to deal with, so my strategy was to use SQL to get a big table of roles and then process it with Python. Here's SQL to get every major role in a comedy that has more than 1000 votes on IMDB:

select title.title, title.production_year, movie_info_idx.info,
       name.name, name.gender, cast_info.nr_order, kind_id
from title
  join cast_info on title.id=cast_info.movie_id
  join name on cast_info.person_id=name.id
  join movie_info_idx on movie_info_idx.movie_id=title.id
  join movie_info on movie_info.movie_id=title.id
where cast_info.role_id in (1,2)
  and kind_id in (1,3,4)
  and movie_info.info_type_id=3
  and movie_info.info='Comedy'
  and cast(movie_info_idx.info as integer) > 1000
  and movie_info_idx.info_type_id=100
  and cast_info.nr_order <= 7;

Some explanation of numbers and IDs:

I run this on a SQLite database and the output looks like:

#1 Cheerleader Camp|2010|2297|Cassell, Seth|m|2|4

So the title of the movie is "#1 Cheerleader Camp", it came out in 2010, it has 2297 votes, and Seth Cassell (a man) was an actor in that movie and got second billing. The trailing 4 is the movie's kind_id.
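Processing that output in Python starts with splitting each row into fields. A minimal sketch, assuming the columns arrive in the order the query selects them (title, year, votes, name, gender, billing order, kind_id); `Role` and `parse_role` are hypothetical names, and rows with a missing year or billing order would need extra handling.

```python
from typing import NamedTuple


class Role(NamedTuple):
    title: str
    year: int
    votes: int
    name: str
    gender: str
    billing: int
    kind_id: int


def parse_role(line):
    """Parse one pipe-delimited row of the movie query's output."""
    # Split from the right so a "|" inside a movie title can't shift columns.
    title, year, votes, name, gender, billing, kind_id = line.rsplit("|", 6)
    return Role(title, int(year), int(votes), name, gender, int(billing), int(kind_id))
```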

Why didn't I include television in this query? Because television on IMDB is really complicated. See, actors aren't credited to television shows; they're credited to individual episodes. But nobody rates individual episodes; they rate the show as a whole. So I had to do a separate query to determine who the top actors were on each comedy television show, and then divide up that show's votes between the four top actors. Otherwise actors whose primary comedy career is in television won't get their due.

Here's SQL to get all the roles in TV episodes:

select tv_show.title, episode.title, episode.production_year, votes.info,
       name.name, name.gender, cast_info.nr_order
from title as tv_show
  join title as episode on tv_show.id=episode.episode_of_id
  join cast_info on episode.id=cast_info.movie_id
  join name on cast_info.person_id=name.id
  join movie_info_idx as votes on votes.movie_id=tv_show.id
  join movie_info on movie_info.movie_id=tv_show.id
where cast_info.role_id in (1,2)
  and tv_show.kind_id in (2,5)
  and episode.kind_id=7
  and movie_info.info_type_id=3
  and movie_info.info='Comedy'
  and cast(votes.info as integer) > 10000
  and votes.info_type_id=100
  and cast_info.nr_order < 5;

This is pretty similar to the last query but some of the IDs are different.

I run this and the output looks like:

'Allo 'Allo!|A Bun in the Oven|1991|14022|Kaye, Gorden|m|1

This means there's an 'Allo 'Allo! episode called "A Bun in the Oven", the episode came out in 1991, 'Allo 'Allo (NOT this specific episode) has 14,022 votes, and Gorden Kaye got top billing for this episode.

I got this data out of a database as quickly as possible and bashed at it to make a TV show look like a movie with four actors--the four actors who appeared in the most episodes of the TV show.
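Roughly, making a TV show look like a movie means counting episode credits and keeping the top four. In this sketch the even split of the show's votes is an assumption (the post only says the votes get divided up), and `show_as_movie` is a made-up name.

```python
from collections import Counter


def show_as_movie(show_votes, episode_credits, cast_size=4):
    """Turn a TV show into a movie-like record with `cast_size` actors.

    `episode_credits` is an iterable of actor names, one entry per
    (episode, actor) credit. Returns a list of (actor, votes_share) pairs
    for the actors who appeared in the most episodes.
    """
    appearances = Counter(episode_credits)
    top = [actor for actor, _ in appearances.most_common(cast_size)]
    # Assumption: split the show's votes evenly among the top actors.
    share = show_votes / len(top) if top else 0
    return [(actor, share) for actor in top]
```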

Directors were pretty similar to film actors. For each director who's ever worked in comedy, I measured their tendency towards putting out a popular comedy in any given year. There's a very strong power law here, with a few modern directors overshadowing their contemporaries, and Charlie Chaplin completely obliterating all his contemporaries.

Here's SQL to get all comedies with their directors:

select title.title, title.production_year, movie_info_idx.info, name.name, name.gender from title join cast_info on title.id=cast_info.movie_id join name on cast_info.person_id=name.id join movie_info_idx on movie_info_idx.movie_id=title.id join movie_info on movie_info.movie_id=title.id where cast_info.role_id in (8) and kind_id in (1,3,4) and movie_info.info_type_id=3 and movie_info.info='Comedy' and cast(movie_info_idx.info as integer) > 5000 and movie_info_idx.info_type_id=100;

The only new number here is cast_info.role_id in (8), which means I'm now picking up directors instead of actors.

At this point I was done with the SQL database. I wrote the "Ghostbusters casting office". It chooses a year, picks a cast and a director for that year, and then (15% of the time) it picks a custom title. My stupidly hilarious technique for custom titles is to choose the name of an actual comedy from the given year and replace one of the nouns with "Ghost" or "Ghostbuster". So far this has led to films like "Don't Drink the Ghost" and (I swear this happened during testing) "Ghostbuster Dad".
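The custom-title trick can be sketched roughly like this. The post doesn't say how the bot decides which words are nouns, so the hand-supplied `nouns` set is an assumption, and both function names are hypothetical. Returning `None` just means "no custom title this time".

```python
import random

def ghostbust_title(title, nouns, rng=random):
    """Replace one noun in a real title with "Ghost" or "Ghostbuster".

    `nouns` is a set of lowercase words to treat as nouns (an assumption;
    the post doesn't describe how nouns are detected).
    """
    words = title.split()
    candidates = [i for i, word in enumerate(words) if word.lower() in nouns]
    if not candidates:
        return None
    words[rng.choice(candidates)] = rng.choice(["Ghost", "Ghostbuster"])
    return " ".join(words)

def maybe_custom_title(real_titles, nouns, rng=random, chance=0.15):
    """Return a custom title about 15% of the time, otherwise None."""
    if rng.random() >= chance:
        return None
    return ghostbust_title(rng.choice(real_titles), nouns, rng)
```

Here `real_titles` would be the names of actual comedies from the chosen year.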

Here's how I pick a cast for a given year: I line up all the actors for that year by my calculated variable "tendency towards being a Ghostbuster", and then I use random.expovariate to choose from different places near the front of the list (to bias the output towards actors you won't have to look up). This is the same trick I use for Serial Entrepreneur to choose common (but not too common) adjectives and nouns for its inventions. My means are 0.85, 0.8, 0.75, and 0.7, which will, on average, give me someone who's at the 85th percentile, someone at the 80th percentile, 75th percentile and 70th percentile.
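A minimal sketch of that selection, under one loudly stated assumption: a "mean of 0.85" is read as "on average, pick someone 15% of the way down a best-first list", so the exponential distribution gets a mean of 0.15. The function names are hypothetical, and redrawing values that fall off the end of the list is one choice among several.

```python
import random

def pick_near_front(ranked, mean_percentile, rng=random):
    """Pick an item from a best-first list, biased toward the front."""
    mean_depth = 1.0 - mean_percentile
    while True:
        # expovariate takes lambd, which is 1 / the desired mean.
        depth = rng.expovariate(1.0 / mean_depth)
        if depth < 1.0:   # redraw the rare values that run off the list
            break
    return ranked[int(depth * len(ranked))]

def pick_cast(ranked, means=(0.85, 0.8, 0.75, 0.7), rng=random):
    """Pick one distinct actor per mean; needs at least len(means) actors."""
    pool = list(ranked)
    cast = []
    for mean in means:
        actor = pick_near_front(pool, mean, rng)
        pool.remove(actor)   # never cast the same actor twice
        cast.append(actor)
    return cast
```

Passing a different tuple of means, such as `(0.85, 0.8, 0.8, 0.75)`, changes how far down the list the later cast members tend to come from.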

This is the best I could do to recreate the dynamic of 1984 Ghostbusters where Bill Murray and Dan Aykroyd were very well-known actors even before Ghostbusters, whereas Ernie Hudson and Harold Ramis were not. At this point you might object that Ernie Hudson and Harold Ramis weren't even 75th or 70th percentile. Ghostbusters was Ramis's second movie ever as an actor; I think there was an oral history that said he gave himself the part of Egon Spengler because no one else was a big enough dork. So for pure accuracy I should be doing, like, 0.90/0.85/0.35/0.30. But that gives you way too many obscure actors and the output isn't as fun. It also doesn't feel accurate, because 1984 Ghostbusters was a real movie, and all by itself it made Hudson and Ramis pretty famous actors. So now we expect "Ghostbuster" to be sort of a prestige comedy role.

A more valid point is that 0.85/0.8/0.75/0.7 also doesn't really capture the dynamic of the 2016 Ghostbusters, where all four actors are well-known but Kristen Wiig has twice the credits of the other three. So I also created an 0.85/0.8/0.8/0.75 mode, which will tend to give you more big-name ensembles.

As always, there's a lot of behind-the-scenes data munging. Going from a bunch of "xth billing in movie with y votes" entries to a single "tendency towards being a Ghostbuster" number required a lot of semi-arbitrary decisions, and I think my algorithm still undercounts television actors. Whenever there was a power law, I smoothed it out a little to increase the variety of the output. I smoothed out the overrepresentation of post-IMDB comedies compared to pre-IMDB comedies; of superstar directors like Chaplin who overshadow everyone else in their time; and of men directors vastly outnumbering women.

Representation of women comedic actors vs. men was not an issue because I followed the lead of the Ghostbusters remake. 45% of the ghostbusting teams are all women, and 45% are all men. (10% of teamups are coed, just to add variety.) There's no code that makes sure all the actors speak the same language or anything like that—I could extract that data from IMDB but it would be a lot of work to make the output of the bot less interesting.
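The 45/45/10 split is a one-liner with a weighted choice. A minimal sketch, with a hypothetical function name and labels; how the bot actually fills a coed team isn't described, so this only decides the team's makeup.

```python
import random

def pick_team_makeup(rng=random):
    """Return "women" or "men" 45% of the time each, "coed" 10% of the time."""
    return rng.choices(["women", "men", "coed"], weights=[45, 45, 10])[0]
```

The result would then decide which ranked list of actors the cast is drawn from.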

And there you go. It's not source code, but you should be able to see more or less how I took this bot from concept to execution, and how I negotiated the tricky space between "this is an accurate representation of what would happen in an alternate universe where the primary cinematic comedy genre is films about busting ghosts" and "this is a fun output for this bot to have."

January Film Roundup: January started with three highly anticipated films that all turned out to be duds! What to do for the rest of the month, but stack the deck?

More Dice Fun: A while back I wrote about a maddening but interesting book called Scarne on Dice. It's a really huge book which I intend to get rid of ASAP, but before I do there's a couple things about dice, and cheating at dice, I wanted to quote.

In perhaps the most entertaining section of the book Scarne takes on the sleaziest parties in this whole wretched business, "the crooked gambling supply houses", who sell outdated cheating devices at huge markups. According to The Big Con: The Story of the Confidence Man, another book I read recently, the mailing lists of these supply houses were coveted by con artists, because by definition, everyone on those lists "liked the best of it." One catalog's advice to buyers, according to Scarne:

When telegraphing use the following code: PAINT for cards and CUBE for dice.


This head-slapping entry from Scarne's inventory of trick dice needs to be quoted in full:


These are a very brazen brand of mis-spotted dice that show 7 or 11 every roll. Since the catalog lists them, there apparently are buyers, but they are strictly for use on very soft marks and then only on dark nights. One die bears only the numbers 6 and 2; the other nothing but 5's! Since anyone but a blind man would tag these cubes as mis-spots, the moment they rolled out, they are of no use except for night play under an overhead light when the chumps can't see anything but the top surfaces of the dice. Strictly for use by cheats who don't know what a real set of Tops is.

There's a couple of entertaining but long stories of specific cheats which I won't transcribe. The best is the story of "the mouth switch". Seems there was a craps hustler in the 30s who kept a trick die in his mouth and introduced it into the game by cupping the dice in his hands and "blowing" on them. They called him "Mononucleosis Joe". Actually they called him "The Spitter," but they only started calling him that after he tried this trick while drunk and ended up rolling all three dice onto the craps table.

Finally, a tale of collegiality which I feel gets really boring if you explain what the numbers mean:

Several years ago the Harvard Computation Laboratory put a battery of calculating machines to work and came up with a whole book full of answers. Since the binomial formula is used in many problems and so often requires staggering amounts of arithmetic, they constructed a set of Cumulative Binomial Probability Distribution Tables which give probability fractions for a wide range of values of n, r, and P. And because Dr. Frederick Mosteller, Chairman of the Department of Statistics, had seen a copy of Scarne on Dice and was aware of the 26 game problem, he saw to it that the calculating machines were asked to provide figures for the terms n = 130 and P = 1/6.

It's easy to read this book and feel superior to the people who get fooled by seemingly rudimentary tricks (David Maurer, author of The Big Con, specifically points this out in his book), but I'm sure someone who knew their stuff could take my entire roll in a crooked dice game. Why am I so sure? Because you could take my entire roll in a completely fair dice game.

The Crummy.com Review of Things 2014: Another year, another blog post summing it up. Here's 2013. And here's 2014:


2014's big project was The Minecraft Archive project, which led into The Minecraft Geologic Survey, which led into the Reef series and two huge bots. I'm planning on doing a refresh of the data this year to get maps created in 2014--hopefully it'll be easier the second time.

I also finished Situation Normal, edited it and have now sent it out to editors and agents. I'm cautiously optimistic. I finished two short stories, "The Process Repeats" and "The Barrel of Yuks Rule", and like many of my stories they're a rewrite away from being sellable and who knows when I'll get the time.

I gave a talk on bots at Foolscap and a talk on improving Project Gutenberg metadata at Books in Browsers. That ties into my job at NYPL. I had a full-time job for most of this year, for the first time in a while, and 2015 is the year you'll get to use what I'm making.

Subcategory: Bots. You won't believe how many autonomous agents I created in 2014! I'm not even going to show you all of them, only the ones I'm really proud of. I'm going to order them by how much I like them, but I'll also include their current Twitter follower count--the only measurement that really matters in this post-apocalyptic world.

My secret goal for 2014 was to have a bot whose follower count was greater than my own. Minecraft Signs (probably my favorite bot of all time) came close but didn't quite make it.

I also created a bot that's so annoying I didn't release it. Maybe this year.


I scaled back my film watching versus 2013, but still saw about fifty features. Here's my 2014 must-watch list. As always, only films I saw for the first time are eligible for this prestigeless nonor.

  1. Pom Poko (1994)
  2. Alien (1979)
  3. My Love Has Been Burning (1949)
  4. Seven Chances (1925)
  5. A Town Called Panic (2009)
  6. Alphaville (1965)
  7. Frozen (2013)
  8. The King of Comedy (1982)
  9. Playtime (1967)
  10. The Women (1939)

These are more or less the films I would watch again (a very high bar to clear), although The King of Comedy should be watched once and only once. I'm kind of surprised that Playtime got on here since I wasn't wild about it, but I really can see how it'll be better the second time.

The runners-up: films I recommend, but will probably not see again, and if you're like "aah, it's three hours long" or "aah, David Bowie alien penis", I'll understand:

  1. Solaris (1972)
  2. The Man Who Fell to Earth (1976)
  3. Guardians of the Galaxy (2014)
  4. Queen Christina (1933)
  5. Paprika (2006)


Didn't read a lot of books this year, but I made them count. The Crummy.com Books of the Year are Dispatches, Michael Herr's Vietnam reporting memoir, and Phil Lapsley's phone-phreak history Exploding the Phone, which covers about the same time period. Both awesome.

Sumana and I selected Kim Stanley Robinson's "The Lucky Strike" for a Strange Horizons reprint. It's a great story.


Since I started commuting again it was a decent year for discovering new podcasts. Sumana and I love Just One More Thing, a deep-dive Columbo podcast. I also really like Omega Tau, a podcast that will do a two-part series on shipping container logistics, or a five-parter on the hardware and operation of the space shuttle. Honorable mention to the guilty pleasure-ish Laser Time, which is more or less random nostalgia but which brings out a lot of interesting deep cuts.


Didn't play a lot of video games because a) Minecraft Archive Project took up all the time I used to spend playing Minecraft, and b) my desktop developed a weird problem where it abruptly powers off if I stress it too much, e.g. by playing a modern computer game. I should really address this problem, but I have not, because it does prevent me from spending too much time on games.

Played a lot of board games with friends as usual. The Crummy.com Board Game of the Year is 2014's The Castles of Mad King Ludwig, a building game that captures the true thrill of interior design. Runner-up: Hanabi, the cooperative game that magically turns passive-aggressiveness into an asset that benefits all. Dishonorable mention to 1989's Sniglets, a party game where having fun requires not that you disregard the scoring system (a common thing for party games) but that you deliberately play to lose.

That reminds me, I should have mentioned in 2014's review of 2013 that Encore is a party game from 1989 that's really, really good. You have to have the right group though.

That's it! How we doin' in 2015? I'm getting a lot done. In fact I just wrote this big blog post talking about the best of 2014... oh, but you're probably not interested. See ya!

[Comments] (2) December Film Roundup: And in this corner... Film Roundup!

Conceptual Crossovers: Over my Christmas break I read Glen David Gold's Sunnyside, a great novel, a little literary for my taste, but really solid. It's one of those novels with interweaving plots, and Charlie Chaplin is the main character of one of the strands. Around the middle of the book, Chaplin is at a rally in San Francisco, giving a speech trying to sell government bonds for WWI. His rhetoric starts to falter, and we go from a transcript of the speech into a pretty believable interior monologue:

He felt abandoned. He hated the war. He hated that the country was in it, that there was no place to go but forward, that more atrocities were to come. He felt people were never intentionally beastly or malicious, but they were pompous and foolish; awful decisions were made by men divorced from their own humanity. He thought that universal peace was within reach if only people ceased to be stupid.

When he had pretended to be Trotsky, he had spoken well. But now that he was trying to be both himself and a servant of the world, he was failing. He persevered, believing that the simple act of faith, the spirit of talking with the audience, would lead to a kind of communion.

I thought this was really amazing because Gold's monologue, juxtaposed with what his Chaplin character is doing at this point, explained the ending of The Great Dictator for me. Why real-life Chaplin is willing to turn the intense climax of his scathing film into a soppy train wreck: that's how he thinks he can actually make a difference. This is the only time you will listen. I still don't like the ending, but I have some sympathy in my Grinch-scale heart for the decision.

Over the break I also experienced a literature/film epiphany in the opposite direction. In my Constellation Games author commentary I say that I "reuse some of the character of Ariel from The Tempest, the guy with magic powers who gets bossed around all the time." But after rewatching The Little Mermaid I gotta admit that Ariel is also named after the girl who's so obsessed with an alien culture that she fills a cave with their incomprehensible stuff.

[Comments] (2) 2014 Scrapbook, Part 2: That Belongs In A Museum: Welcome back, let's check out some cool stuff I can't afford.


In March, before starting my job at NYPL, I took a trip to Providence to hang out with Jake (still an awesome guy after nearly twenty years of friendship). Jake introduced me to the Retro-Computing Society of Rhode Island, who have an amazing museum. I say "museum", it's just one big room, and it looks like this:

I believe that all museums have a room that looks like this; it's just that at RCSRI that room is coextant with the display portion of the museum.

RCSRI has an open house once a month, but we got a private tour because Jake is a close personal friend of the proprietor.

The said proprietor, seen holding a Singer paper tape.
Just one of the incredible sights.
Good advice.
The front of a specialized tablet peripheral for CAD (?), about four feet square.
I can DIAL-A-VUP from the briny deep.

I took several detailed photos of the famous "space cadet" keyboard for the Symbolics LISP machine, because although this computer is famous in hacker lore, at the time there were no good close-ups online. (I dunno about now. Well, there are now, because I'm putting these up, but as I'm writing this draft, I don't know.)

Note the four directional buttons with thumbs-up and thumbs-down.

Los Angeles

Museum of my youth, the Los Angeles Museum of Natural History.
Dino kids.


Along with my uncle Leonard I visited the Worshipful Company of Clockmakers, who have a museum of clockmaking in the back of the London Guild Hall. The Guild Hall is still an active government building, so make sure you go all the way round the back for the museum, though I'm not sure why I'm even giving this advice because apparently the Clockmakers' Museum has all been packed up to be moved to the Science Museum. Anyway, I'm really glad I got to see this little museum because it was full of tons of amazing old clocks (many of which still run), and equipment for building and repairing them.

Like this toolchest.

Another new favorite: the Tring tiles from the British Museum. Two-panel comic strips show Jesus as a little kid getting into trouble. "Left: A boy playfully leaps onto Jesus's back and then falls dead. Right: Two women complain to Joseph... while Jesus restores the boy to life."

And the parents don't take this lying down! On another tile, "Parents shut their children in an oven, to prevent them playing with Jesus." A well-thought-out plan.

New York

From the Sidewalk Museum of Discarded Art, a picture of the New York skyline made of Cheetos.

The Met had a fabulous exhibit with a lot of Xu Bing. I got my chance to get some good photos of An Introduction to Square Word Calligraphy, a set of rules for writing English words like they're Chinese characters.

"Rain, rain, go away"
The alphabet.

And of course there was his masterpiece of eaten meaning, Book From The Sky.

Man, I wish this had been the inspiration for Smooth Unicode instead of Allison's thing. Bring some class to my bots for once.

I also saw these assembly instructions for an Alexander Calder mobile.

Do not lose!

And Paul Klee's Carcassonne set.


Finally, on a trip to Portland I indulged in some Mondrian candy.

Liquid Velocity 3 by Jun Kaneko

[Comments] (1) 2014 Scrapbook, Part 1: As I'm sure you've noticed, the direction of NYCB has trended away in recent years from me journaling and putting up photos of everything I do shortly after I do it. With so many large companies encouraging millions of people to do this and mining the data to create more annoying ads, it doesn't seem as fun. Call me contrarian!

But before the end of the year I wanted to sort of catch you up with a little scrapbook of some of the good times from 2014. This is mostly tourist and family stuff; I've kept all the cool museum finds for a separate post.

West Coast

I took a brief trip to the Bay Area, where I sorted out a ton of stuff in Kevin Maples's garage that we left with him when we moved from San Francisco in 2005.

Like Russian Ricky Martin gum from 2001.
Or Sumana's grade-school poster on Lee Iacocca.

Sumana and I met up in Seattle for the Foolscap conference, where I was a guest of honor along with Brooks Peck. It was my first con and I had a great time! Thanks to Ron Hale-Evans for inviting me.

The official Guest of Honor portrait, or at least a photo from that session.
One of the installation pieces I ran at the conference, in its conference-hotel context.


In a pretty amazing development I got an email from Doug, a fan of Constellation Games who keeps a private plane at a New Jersey airport. We hung out one Saturday and he and his wife took me and Sumana on a flight up the Hudson River.

Really fun random experience. Thanks, Doug.


In the summer I brought my acquired-on-the-cheap tuxedo to London for Rachel's wedding.

Wedding party.
Rachel's speech.
While in London I visited the incredible Monument to Heroic Self-Sacrifice

We went with the kids to Warwick Castle in... Warwickshire.

Celebrating its 1000th anniversary!
Castle Warwick was a huge tourist trap.
But it had great views...
...and siege machinery.
No trip is complete without a visit to The Butts.

The Holidays

We got some nice snow for Thanksgiving.
I made a TON of pie.
Christmas was the polar opposite--shirtsleeves at Disneyland.
I don't know if Susanna realized that this picture made her family look like we were all about to pull off a heist.
We made a TON of cookies.

The Year In Stone Lions


The Year In Reusable Orbiters

Susanna and kids underneath Endeavour.
The Intrepid's on-deck building holding the Enterprise, seen during the plane flight.

The Year In Signage

All of these are from the UK, because foreign signs are just funnier.

I thought the contrast between old and new style was really striking.
Oh no! Disappointment!
Possibly the worst sign in the world.

[Comments] (2) November Film Roundup: 2014's penultimate roundup! Here it is.

Over Thanksgiving I also saw a bunch of Phineas and Ferb with kids, and it's a fun kids' show, but not gonna review it. Okay, fine: it's a fun kids' show. The characterization is pretty bad and based on stereotypes but the multi-layered plots are very clever. There's your review.

[Comments] (1) Public Service Film Roundup:

...And Maps: I've got some exciting new stuff for people who read NYCB but not my Twitter feed (which, if you consider the future, is the vast majority of everyone who reads NYCB). As I mentioned in the film roundup, I went to the Books in Browsers conference with my NYPL colleague James English. James gave an overview of the Library Simplified project we work on, and then I gave a talk I like to call (and did call) "Project Gutenberg Books are Real Books!".

Part of my work on Library Simplified is to integrate Project Gutenberg books into our ebook catalog. This sounds easy, and it is, so long as you're willing to treat Gutenberg books as second-class citizens that live in their own poorly-documented area. I'm trying to do something more like what Amazon did with its free Kindle books (BTW I recently discovered that they're selling the newer ones)—turn the Gutenberg texts into no-frills derivative editions that are nonetheless fully integrated into the storefront.

Second, there's a new Reef map, Reef #4: The Timeline, a cross-section of Minecraft history going from late 2010 to mid-2014. I think it's the most accessible of the Reef maps—it's small and it's obvious what's going on.

As is tradition, I introduced Reef #4 with a video, in which I compelled Lapis Lauri and Ron Smalec to race to the end of the Timeline for my own amusement (and theirs).

As you can tell I'm working on all kinds of stuff, notably something you will probably never see—the pitch document for Situation Normal. I really hate writing this stuff and it's a huge pain, but why write a book if you're not going to try to sell it?

October Film Roundup: Pretty slim pickings this month. (Damn, shoulda used that line back in April after I saw 1941. Oh well, no one will even know—wait, am I typing this? Computer, end program.)

It may appear that I wouldn't have seen any movies in October were it not for my trip to San Francisco. What you don't know is that by taking the trip to San Francisco I missed out on a weekend of cool old horror movies at the museum. So it was probably two movies either way.

Reviews of Old Science Fiction Magazines: F&SF October/November 1991: I bet you thought this Crummy mini-feature was dead! That's because it was! When I started making pro sales I decided it wasn't a good idea to be constantly badmouthing my colleagues and the venues I was trying to sell to. So I stopped posting reviews. But a while ago Sumana and I were asked to pick a story to reprint in Strange Horizons, and I really had no idea, because these reviews are the only records I have of which short stories I've read. (We ended up choosing Kim Stanley Robinson's "The Lucky Strike".) And then I took this 1991 magazine on my most recent plane trip and pretty much everything in the magazine was fun. So I thought I'd mention some of the fun and keep a record for posterity.

There will still be some badmouthing, notably of the ad at the beginning of the magazine for a dorky "sexpunk" book. It's a two-page spread that includes some quotes from the stories, two of which are dramatizations of urban legends. Then it shows you the book's I-missed-the-80s cover, and then it brings on the hard sell: "Eleven Short Stories, Two Novelettes, One Novella—256 pages on acid-free paper." I gotta say, I was on the fence until I heard the book was printed on acid-free paper! I'll paste my scrapbook photos into it!

OK, on to the positivity! Carolyn Ives Gilman's "The Honeycrafters" is a Nebula award nominee-to-be that works its one basic idea from all angles and captures the thrill of Minecraft's Forestry mod. A super, super fun read. Bradley Denton has a great Breaking Bad-esque story in "Rerun Roy, Donna, and the Freak", complete with drugs cooked in an RV. Jane Yolen's "Dear Ms. Lonelylegs" is silly and only four pages long.

There's a weird subplot in the book review columns (one by Algis Budrys, one by Orson Scott Card) about how books that come out in paperback first are considered second-class citizens of the book world. Books that come out in hardcover first and then paperback are the upper-crust of 1990s science fiction society, living the high life while "paperback originals" are left to toil in the sweat mines. It's a fascinating glimpse of a distant culture.

Harlan Ellison, O.G. hipster, waxes about the thrill of introducing someone to something great and wanes about the anti-thrill of not being able to be a snob after everyone knows about the great thing. In this film review column he kind-of-but-not-really passes the torch to Kathi Maio. By which I mean Ellison's column will still be printed whenever he sends one in, but Maio is able to review three films in six pages, where Ellison writes twelve pages in this issue and encounters only one film, The Rocketeer (he luvs it). So we're not really looking at two film review columns, we're looking at one film review column plus Harlan Ellison's blog. A wise editorial decision on the part of F&SF.

In Isaac Asimov's science column, Isaac Asimov bemoans the downsides that come along with being as smart as Isaac Asimov. Fortunately, the mighty brain of Isaac Asimov is able to cope with such petty inconveniences. I like how Asimov's column (the topic is energy) gives respect to underappreciated scientists, not just once but repeatedly.

Back to stories. Mike Resnick's "Winter Solstice" is a sad story of Merlin that really highlights how the concept of someone living backwards in time is incoherent—one of Dan Simmons's Hyperion books covered some of the same ground and I had the same problem there. Lois Tilton's "A Just and Lasting Peace" is nice and creepy alt-history that does more character development than a lot of alt-history. (With a title like that, you know it's creepy alt-history!) Marc Laidlaw's "Gasoline Lake" had too many plot twists to keep my interest but I loved the setting and the setup.

There's a cartoon of a starfield where one star says "We're the star that inspired the verse 'Twinkle Twinkle Little Star'" and another star says "Yeah? Well we're the star that inspired the song 'When You Wish Upon A Star'". I may be overthinking this, but... why does each star speak of itself in the plural? Is there an unspoken SFnal twist in which stars are collective intelligences? How did the stars discover these facts? Did Jane Taylor and Leigh Harline use long-range transmission to inform the stars that inspired them? Or is this the opposite of the "lunar real estate" scam, where stars pay for certificates that lay claim to certain human songs? If you were a star, and you communicated with another star over a distance of hundreds of light-years, is this really what you would talk about? Would it be fair to say that these stars are so vain they think this song is about them?

Unaccountably, the cartoon does not answer these questions. I will say that this issue contains a "Dr. Quark, Low-Tech Physicist" cartoon that I liked.

You know, looking over this it's clear that mostly what I want to do is make note of the stories I liked and then snark on the columns, so maybe I'll rev this feature back up. Anyway, this issue was really fun. Pick up a copy 23 years ago!

[Comments] (4) The Bot of Mormon: I don't usually do in-depth analyses of my bots, especially one that's probably not gonna break ten followers, but my most recent bot is very personal to me, and the making of it turned out to be much stranger than I expected. It's The Bot of Mormon, "the most correct bot", a text-generating process with a very niche audience but the niche audience includes me, so I'm happy. A few of my recent favorites:

And again I say unto you, and more especially the elephants and cureloms and cumoms.

— The Bot Of Mormon (@TheBotOfMormon) October 16, 2014

A large and tough businessman, I pray only that I might always be found as Abraham Lincoln said: "Die when I may, by a wild olive tree."

— The Bot Of Mormon (@TheBotOfMormon) October 16, 2014

"As we read in the Book of Mormon, but I will have him come to the phone."

— The Bot Of Mormon (@TheBotOfMormon) October 14, 2014

A note: In a bid for more followers, as well as not alienating all my relatives, I designed the Bot of Mormon to be a bit of harmless humor for believing LDS folk (early versions could be pretty offensive, and I chose not to go that route). However, Saints might take offense at this blog post about how and why I made the bot. So, fair warning. Here we go.

It's not much of an exaggeration to trace my interest in generative text back to my experience growing up in Mormonism. Mark Twain famously called the Book of Mormon "chloroform in print", and I believe the reason it's so boring is that it was produced by a process similar to automatic writing. It's full of stalling and retreats to stock phrases. But what starts with the Book of Mormon sure doesn't end there. When I was a kid, church every week was a three-hour festival of stock phrases and repetition.

See, in the LDS church the task of coming up with things to say every week rotates around the general membership. Topics are assigned, and there are only about fifty topics total. Since every acceptable topic has been covered a million times before, the simplest way to make a new talk is to remember bits of old talks and mash them together.

When I was a kid I experienced this from both ends, and writing the talks was especially intense for me because despite my best efforts, I didn't actually believe. My talks were literally constructed by assembling meaningless symbols into patterns that matched what I saw other people doing. Naturally, ever since I caught the botmaking bug I've wanted to recreate this experience with a bot. I registered @TheBotOfMormon quite a while ago. But I couldn't figure out what to do until recently, when I hit upon the idea of taking as my corpus not the Book of Mormon itself, but the General Conference talks.

General Conference is a big twice-yearly event in Salt Lake where the top brass show y'all how it's done. These guys used to be lawyers and corporate executives, and their talks are all vetted by committee, so the result is... well, sometimes someone will say something offensive, but even that I wouldn't call "interesting". What is interesting is that Conference is where Mormonism meets the twenty-first century. By which I mean that's where you can see the pros use nineteenth-century language and rhetoric to talk about same-sex marriage (undesirable!) and the Internet (a mixed bag!) That's the kind of juxtaposition I thought would make a good bot. As it turns out, I was right... sort of. Eventually.

To give you a picture of what goes on in General Conference, here's a table I made of the top ten topics by decade, according to the keywords in the <meta> tags for each talk.

   1.  obedience          Jesus Christ     Jesus Christ  faith         Jesus Christ
   2.  missionary work    missionary work  faith         Jesus Christ  service
   3.  spirituality       service          family        service       faith
   4.  testimony          obedience        priesthood    testimony     priesthood
   5.  Jesus Christ       priesthood       love          obedience     obedience
   6.  welfare            faith            service       family        adversity
   7.  priesthood         love             Holy Ghost    Holy Ghost    family
   8.  family             family           obedience     prayer        love
   9.  plan of salvation  spirituality     prayer        love          Holy Ghost
  10.  youth              adversity        Atonement     priesthood    Atonement
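
A tally like this can be sketched in a few lines of Python. This is a minimal illustration, not the script that produced the table: it assumes each talk is available as a (year, HTML) pair and that the topics live in a comma-separated <meta name="keywords"> tag. The names KeywordParser, keywords_for and top_topics_by_decade are made up for the example.

```python
from collections import Counter, defaultdict
from html.parser import HTMLParser


class KeywordParser(HTMLParser):
    """Collects keywords from <meta name="keywords" content="a, b, c"> tags.

    Assumption: the talks use this tag layout; adjust if they don't.
    """

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            self.keywords.extend(
                k.strip() for k in content.split(",") if k.strip()
            )


def keywords_for(html):
    """Return the list of keywords found in one talk's HTML."""
    parser = KeywordParser()
    parser.feed(html)
    return parser.keywords


def top_topics_by_decade(talks, n=10):
    """talks: iterable of (year, html) pairs. Returns {decade: [topic, ...]}."""
    counts = defaultdict(Counter)
    for year, html in talks:
        decade = year // 10 * 10
        # Count each keyword at most once per talk, keeping first-seen order.
        unique = list(dict.fromkeys(keywords_for(html)))
        counts[decade].update(unique)
    return {
        decade: [topic for topic, _ in counts[decade].most_common(n)]
        for decade in sorted(counts)
    }
```

Counting a keyword once per talk is a design choice made for this sketch; counting every occurrence would favor talks with repetitive tags.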

You can see the shape of the fifty acceptable topics there. Anyway, I downloaded the Conference talks and set about applying my usual bag of tricks to the corpus to come up with an interesting transformation. Imagine my surprise when none of my techniques worked!

The _ebooks algorithm, up to this point an unending generator of hilarity from any corpus, failed miserably. The word-frequency filter I used to find the interesting signs for Minecraft Signs also failed. Markov chains were useless, big surprise. I had a dim idea that the key to bot gold here was the subordinate clauses: the sentences that run on and on in a lawyerly way, embroidering themselves with their own Talmudic interpretations. I tried Queneau assembly of sentences at the clause level. This was good enough to get the bot launched, but it wasn't great. Each individual clause is very likely to be boring, its boringness has no relationship to word frequency, and combining clauses doesn't help. The corpus is fractally boring.

"Here you will find happiness, we know that the rejoicing, or anything else, they are in a state contrary to the nature of happiness."

— The Bot Of Mormon (@TheBotOfMormon) October 2, 2014
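
For readers who haven't met it, clause-level Queneau assembly can be sketched roughly as follows. This is an illustration of the general technique, not the bot's actual code: the idea is to build a new sentence by taking an opening clause from one sentence's opening, middle clauses from other sentences' middles, and a closing clause from another sentence's ending. The clause splitter here (breaking after commas, semicolons and colons) and the function names are assumptions made for the example.

```python
import random
import re


def split_clauses(sentence):
    """Split a sentence after commas, semicolons and colons.

    The punctuation stays attached to the clause it ends.
    """
    parts = re.split(r"(?<=[,;:])\s+", sentence.strip())
    return [p for p in parts if p]


def queneau_assemble(sentences, rng=random):
    """Build one new sentence from clauses that keep their original position."""
    clause_lists = [split_clauses(s) for s in sentences]
    # A sentence needs at least an opening and a closing clause to be useful.
    clause_lists = [c for c in clause_lists if len(c) >= 2]
    if not clause_lists:
        raise ValueError("need at least one sentence with two or more clauses")

    firsts = [c[0] for c in clause_lists]
    middles = [m for c in clause_lists for m in c[1:-1]]
    lasts = [c[-1] for c in clause_lists]

    # Borrow the length of a real sentence so the output has a plausible shape.
    length = len(rng.choice(clause_lists))
    result = [rng.choice(firsts)]
    for _ in range(length - 2):
        if not middles:
            break
        result.append(rng.choice(middles))
    result.append(rng.choice(lasts))
    return " ".join(result)
```

Passing a seeded random.Random instance as rng makes the output repeatable, which helps when comparing one corpus against another.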

Okay, I thought, time to break out the big guns. I incorporated the Book of Mormon into my corpus, then the Doctrine & Covenants, and even the Pearl of Great Price, the bizarro crown jewel of the LDS canon. None of it helped. (The Pearl of Great Price helped a little, because it's really weird, but it's also very short.)

Behold, and began to put heavy burdens upon their backs, and prayers of faith.

— The Bot Of Mormon (@TheBotOfMormon) October 6, 2014

But legend told of a secret weapon: the Journal of Discourses. Basically a large collection of General Conference talks from the late 19th century, during the polygamy era, containing a ton of fiery rhetoric and juicy doctrines downplayed or outright disowned by the modern church. Some might consider it dirty pool, but I was desperate to get some interesting content out of my bot. I Queneau-ified every Discourse in the Journal and added it to the corpus... to no avail. It was still dull! On the sentence fragment level, it's tough to even distinguish between the 'scandalous' stuff in the Journal and the dishwater they serve up at Conference nowadays.

And now behold, as it were, most of them in environments very different from their own.

— The Bot Of Mormon (@TheBotOfMormon) October 9, 2014

At this point I was so frustrated that I honestly started to question my unbelief. What are the odds that a corpus of text spanning hundreds of authors over nearly 200 years could be so uniformly dull? Was some divine hand at work, keeping things from getting too interesting? With shaking hands I ran my tests against a control sample: the Gutenberg text of a non-Mormon book of sermons. And it turns out nineteenth-century religious language is what's fractally boring. It's nothing to do with Mormonism in particular. The modern stuff is dull because it copies and recombines the nineteenth-century stuff.

And that, finally, was the key to what little success I've achieved with @TheBotOfMormon. When the bot is funny, the funny thing is not the rambling juxtaposition of sentence fragments per se. It's the juxtaposition of modern concepts with nineteenth-century language. To get the bot to work I would have to actually recreate that juxtaposition, not just hope for it.

Enter the Corpus of Historical American English. (Thanks, BYU! Seriously, what a great project.) This has word frequencies for every decade from the 1810s up to 2009. I picked out all the words that were 10x more common between 1930 and 1980 than they were between 1830 and 1880. I tagged all the sentence fragments that were distinctly twentieth-century. Now I can guarantee that every assemblage has an old-timey component and a more modern component, and the chances of humor go way up.
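
The filter described above can be sketched like this. It is a rough illustration under stated assumptions, not the bot's real code: the frequency data is assumed to be a dict of {decade: {word: relative frequency}}, each period is treated as the decades from its start year up to but not including its end year, a word that never appears in the old period counts as modern, and a fragment counts as "twentieth-century" if it contains any modern word. All function names are hypothetical.

```python
import random
import re
from collections import defaultdict


def period_totals(freq_by_decade, start, end):
    """Sum each word's frequency over decades with start <= decade < end."""
    totals = defaultdict(float)
    for decade, freqs in freq_by_decade.items():
        if start <= decade < end:
            for word, freq in freqs.items():
                totals[word] += freq
    return totals


def modern_words(freq_by_decade, old=(1830, 1880), new=(1930, 1980), ratio=10.0):
    """Words at least `ratio` times more common in the new period than the old."""
    old_totals = period_totals(freq_by_decade, *old)
    new_totals = period_totals(freq_by_decade, *new)
    return {
        word
        for word, freq in new_totals.items()
        # Assumption: a word absent from the old period counts as modern.
        if freq > 0 and freq >= ratio * old_totals.get(word, 0.0)
    }


def is_modern(fragment, modern):
    """True if the fragment contains at least one modern word."""
    return any(w in modern for w in re.findall(r"[a-z']+", fragment.lower()))


def assemble(fragments, modern, rng=random):
    """Pair one old-timey fragment with one modern fragment, in random order."""
    new = [f for f in fragments if is_modern(f, modern)]
    old = [f for f in fragments if not is_modern(f, modern)]
    if not new or not old:
        raise ValueError("need at least one fragment of each kind")
    pair = [rng.choice(old), rng.choice(new)]
    rng.shuffle(pair)
    return " ".join(pair)
```

Because both periods span five decades, comparing summed frequencies gives the same ratio as comparing averages; with gaps in the data, averaging per decade would be the safer choice.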

The lesson I want to take from this is that every corpus is different. I thought I could handle the LDS corpus with the same tools I use on Gutenberg, because they're both full of archaic language, but I was totally wrong. Once I engaged with the text this became obvious, but I came into this holding the text at arm's length because it held a lot of bad childhood memories.

There's no generic bot kit that will work on anything. (Well, there is, but it uses Markov chains and I don't like it.) Even my really simple bots like I Like Big Bot and Boat Names required a lot of custom behind-the-scenes work to find the most interesting subset of the data.

Perhaps this can serve as my new rule. A new bot needs to present a different way of being a bot, not just a different corpus. And adding more text to a corpus I don't know how to handle just makes the problem worse.

This document (source) is part of Crummy, the webspace of Leonard Richardson (contact information). It was last modified on Monday, September 09 2013, 18:05:52 Nowhere Standard Time and last built on Saturday, July 04 2015, 20:45:01 Nowhere Standard Time.

Crummy is © 1996-2015 Leonard Richardson. Unless otherwise noted, all text licensed under a Creative Commons License.
