Conceptual Crossovers: Over my Christmas break I read Glen David Gold's Sunnyside, a great novel, a little literary for my taste, but really solid. It's one of those novels with interweaving plots, and Charlie Chaplin is the main character of one of the strands. Around the middle of the book, Chaplin is at a rally in San Francisco, giving a speech trying to sell government bonds for WWI. His rhetoric starts to falter, and we go from a transcript of the speech into a pretty believable interior monologue:

He felt abandoned. He hated the war. He hated that the country was in it, that there was no place to go but forward, that more atrocities were to come. He felt people were never intentionally beastly or malicious, but they were pompous and foolish; awful decisions were made by men divorced from their own humanity. He thought that universal peace was within reach if only people ceased to be stupid.

When he had pretended to be Trotsky, he had spoken well. But now that he was trying to be both himself and a servant of the world, he was failing. He persevered, believing that the simple act of faith, the spirit of talking with the audience, would lead to a kind of communion.

I thought this was really amazing because Gold's monologue, juxtaposed with what his Chaplin character is doing at this point, explained the ending of The Great Dictator for me. It explains why the real-life Chaplin is willing to turn the intense climax of his scathing film into a soppy train wreck: that's how he thinks he can actually make a difference. This is the only time you will listen. I still don't like the ending, but I have some sympathy in my Grinch-scale heart for the decision.

Over the break I also experienced a literature/film epiphany in the opposite direction. In my Constellation Games author commentary I say that I "reuse some of the character of Ariel from The Tempest, the guy with magic powers who gets bossed around all the time." But after rewatching The Little Mermaid I gotta admit that Ariel is also named after the girl who's so obsessed with an alien culture that she fills a cave with their incomprehensible stuff.

December Film Roundup: And in this corner... Film Roundup!

The Crummy.com Review of Things 2014: Another year, another blog post summing it up. Here's 2013. And here's 2014:


2014's big project was The Minecraft Archive project, which led into The Minecraft Geologic Survey, which led into the Reef series and two huge bots. I'm planning on doing a refresh of the data this year to get maps created in 2014--hopefully it'll be easier the second time.

I also finished Situation Normal, edited it and have now sent it out to editors and agents. I'm cautiously optimistic. I finished two short stories, "The Process Repeats" and "The Barrel of Yuks Rule", and like many of my stories they're a rewrite away from being sellable and who knows when I'll get the time.

I gave a talk on bots at Foolscap and a talk on improving Project Gutenberg metadata at Books in Browsers. That ties into my job at NYPL. I had a full-time job for most of this year, for the first time in a while, and 2015 is the year you'll get to use what I'm making.

Subcategory: Bots. You won't believe how many autonomous agents I created in 2014! I'm not even going to show you all of them, only the ones I'm really proud of. I'm going to order them by how much I like them, but I'll also include their current Twitter follower count--the only measurement that really matters in this post-apocalyptic world.

My secret goal for 2014 was to have a bot whose follower count was greater than my own. Minecraft Signs (probably my favorite bot of all time) came close but didn't quite make it.

I also created a bot that's so annoying I didn't release it. Maybe this year.


I scaled back my film watching versus 2013, but still saw about fifty features. Here's my 2014 must-watch list. As always, only films I saw for the first time are eligible for this prestigeless nonor.

  1. Pom Poko (1994)
  2. Alien (1979)
  3. My Love Has Been Burning (1949)
  4. Seven Chances (1925)
  5. A Town Called Panic (2009)
  6. Alphaville (1965)
  7. Frozen (2013)
  8. The King of Comedy (1982)
  9. Playtime (1967)
  10. The Women (1939)

These are more or less the films I would watch again (a very high bar to clear), although The King of Comedy should be watched once and only once. I'm kind of surprised that Playtime got on here since I wasn't wild about it, but I really can see how it'll be better the second time.

The runners-up: films I recommend, but will probably not see again, and if you're like "aah, it's three hours long" or "aah, David Bowie alien penis", I'll understand:

  1. Solaris (1972)
  2. The Man Who Fell to Earth (1976)
  3. Guardians of the Galaxy (2014)
  4. Queen Christina (1933)
  5. Paprika (2006)


Didn't read a lot of books this year, but I made them count. The Crummy.com Books of the Year are Dispatches, Michael Herr's Vietnam reporting memoir, and Phil Lapsley's phone-phreak history Exploding the Phone, which covers about the same time period. Both awesome.

Sumana and I selected Kim Stanley Robinson's "The Lucky Strike" for a Strange Horizons reprint. It's a great story.


Since I started commuting again it was a decent year for discovering new podcasts. Sumana and I love Just One More Thing, a deep-dive Columbo podcast. I also really like Omega Tau, a podcast that will do a two-part series on shipping container logistics, or a five-parter on the hardware and operation of the space shuttle. Honorable mention to the guilty pleasure-ish Laser Time, which is more or less random nostalgia but which brings out a lot of interesting deep cuts.


Didn't play a lot of video games because a) Minecraft Archive Project took up all the time I used to spend playing Minecraft, and b) my desktop developed a weird problem where it abruptly powers off if I stress it too much, e.g. by playing a modern computer game. I should really address this problem, but I have not, because it does prevent me from spending too much time on games.

Played a lot of board games with friends as usual. The Crummy.com Board Game of the Year is 2014's The Castles of Mad King Ludwig, a building game that captures the true thrill of interior design. Runner-up: Hanabi, the cooperative game that magically turns passive-aggressiveness into an asset that benefits all. Dishonorable mention to 1989's Sniglets, a party game where having fun requires not that you disregard the scoring system (a common thing for party games) but that you deliberately play to lose.

That reminds me, I should have mentioned in 2014's review of 2013 that Encore is a party game from 1989 that's really, really good. You have to have the right group though.

That's it! How we doin' in 2015? I'm getting a lot done. In fact I just wrote this big blog post talking about the best of 2014... oh, but you're probably not interested. See ya!

More Dice Fun: A while back I wrote about a maddening but interesting book called Scarne on Dice. It's a really huge book which I intend to get rid of ASAP, but before I do there's a couple things about dice, and cheating at dice, I wanted to quote.

In perhaps the most entertaining section of the book Scarne takes on the sleaziest parties in this whole wretched business, "the crooked gambling supply houses", who sell outdated cheating devices at huge markups. According to The Big Con: The Story of the Confidence Man, another book I read recently, the mailing lists of these supply houses were coveted by con artists, because by definition, everyone on those lists "liked the best of it." One catalog's advice to buyers, according to Scarne:

When telegraphing use the following code: PAINT for cards and CUBE for dice.


This head-slapping entry from Scarne's inventory of trick dice needs to be quoted in full:


These are a very brazen brand of mis-spotted dice that show 7 or 11 every roll. Since the catalog lists them, there apparently are buyers, but they are strictly for use on very soft marks and then only on dark nights. One die bears only the numbers 6 and 2; the other nothing but 5's! Since anyone but a blind man would tag these cubes as mis-spots, the moment they rolled out, they are of no use except for night play under an overhead light when the chumps can't see anything but the top surfaces of the dice. Strictly for use by cheats who don't know what a real set of Tops is.

There are a couple of entertaining but long stories of specific cheats which I won't transcribe. The best is the story of "the mouth switch". Seems there was a craps hustler in the 30s who kept a trick die in his mouth and introduced it into the game by cupping the dice in his hands and "blowing" on them. They called him "Mononucleosis Joe". Actually they called him "The Spitter," but they only started calling him that after he tried this trick while drunk and ended up rolling all three dice onto the craps table.

Finally, a tale of collegiality which I feel gets really boring if you explain what the numbers mean:

Several years ago the Harvard Computation Laboratory put a battery of calculating machines to work and came up with a whole book full of answers. Since the binomial formula is used in many problems and so often requires staggering amounts of arithmetic, they constructed a set of Cumulative Binomial Probability Distribution Tables which give probability fractions for a wide range of values of n, r, and P. And because Dr. Frederick Mosteller, Chairman of the Department of Statistics, had seen a copy of Scarne on Dice and was aware of the 26 game problem, he saw to it that the calculating machines were asked to provide figures for the terms n = 130 and P = 1/6.
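For anyone who does want to be bored: a cumulative binomial probability is the chance of getting at least r successes in n trials when each one succeeds with probability P. Here's a minimal Python sketch that computes the upper tail exactly with fractions for the n = 130, P = 1/6 case from the quote. The thresholds in the loop are arbitrary examples, not the actual rules of Scarne's 26 game, and the function name is mine.

```python
from fractions import Fraction
from math import comb


def binomial_tail(n: int, r: int, p: Fraction) -> Fraction:
    """Exact P(X >= r) for X ~ Binomial(n, p): the chance of r or more hits in n trials."""
    q = 1 - p
    return sum((comb(n, k) * p**k * q**(n - k) for k in range(r, n + 1)), Fraction(0))


N = 130             # number of dice rolled in all
P = Fraction(1, 6)  # chance that one die shows a particular face

print("expected hits:", float(N * P))
for r in (20, 26, 30):  # arbitrary thresholds, for illustration only
    print(f"P(at least {r} hits) = {float(binomial_tail(N, r, P)):.4f}")
```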

It's easy to read this book and feel superior to the people who get fooled by seemingly rudimentary tricks (David Maurer, author of The Big Con, specifically points this out in his book), but I'm sure someone who knew their stuff could take my entire roll in a crooked dice game. Why am I so sure? Because you could take my entire roll in a completely fair dice game.

January Film Roundup: January started with three highly anticipated films that all turned out to be duds! What to do for the rest of the month, but stack the deck?

The Ghost of Ghostbusters Past: Just a quick semi-technical post on how I made @WeBustedGhosts, my new bot that casts movies from an alternate history where "ghostbusters" is a stock comedy genre, sort of a twentieth-century commedia dell'arte. In particular, I did a lot of work with IMDB data that I want to record for your benefit (and by you, I mean future me).

The bot was inspired by two things: first, this video by Ivan Guerrero which "premakes" Ghostbusters as a 1954 comedy starring Bob Hope, Fred MacMurray, and Martin/Lewis. Second, the reaction of fools to the fact that women comedians will bust ghosts in the upcoming Ghostbusters remake. More specifically, Kris's endless mockery of the idea that "ghostbuster" is a job with a legitimate gender qualification.

These things got me thinking about the minimal set of things you need to make Ghostbusters. You need the idea of combining a horror movie with a comedy about starting a business. Someone could have come up with that idea in the silent film era. You need a director and four actors who can do comedy. And all those people need to be alive and working at the same time, because ghosts aren't real... OR ARE THEY? Either way, you can describe a point in Ghostbusters space with six pieces of information: four actors, a director, and a year. That's small enough to fit into a tweet, so I made a Twitter bot.

Our journey to botdom starts, as you might expect, with an IMDB data dump. I've dealt with IMDB data before and this time I was excited to learn about IMDbPY, which promised to get a handle on the ancient and not-terribly-consistent flat-file IMDB data format. Unfortunately IMDbPY is designed for looking up facts about specific movies, not for reasoning over the set of all movies. However, it does have a great script called imdbpy2sql.py, which will take the flat-file format and turn it into a SQL database.

There will be SQL in this discussion (because I want to show you/future me how to do semi-complex stuff with the database created by IMDbPY), but unless you're future me, you can skip it. Basically, for each actor in IMDB, I need to calculate that actor's tendency to get high billing in popular comedies for a given year. They don't have to be good comedies, or Ghostbusters-like comedies, they just have to have a lot of IMDB ratings.

I also want to figure out each actor's effective comedy lifespan. If an actor stops doing popular comedy or dies or retires, they should stop showing up in the dataset. If a dramatic actor branches out into comedy they should show up in the dataset as of their first comedic performance. Basically, if you learned that this actor starred in a comedy that came out in a certain year, it shouldn't be a big surprise.

Orson Welles would be great in a Ghostbusters movie, but he never did comedy, so he's not in the dataset. How about... Cameron Diaz? She rarely gets top billing, but she has second or third billing in a lot of very popular comedies. For a year like 1997 she tops the list of potential women Ghostbusters.

How about... Peter Falk? His first comedy role was in 1961's Pocketful of Miracles, his last in 2005's Checking Out. His acting career stretches from 1957 to 2009, but he's only a potential Ghostbuster between 1961 and 2005. He won't get chosen very often, because he's not primarily known for comedy (i.e. his comedies aren't as popular as other people's), but it will happen occasionally.

That's the data I extracted. Not "how famous is this actor" but "how much would you expect this actor to be in a comedy in a given year".
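The lifespan rule is the easy part. A minimal sketch, assuming the roles have already been boiled down to (actor, year) pairs: an actor is eligible in any year from their first popular comedy to their last, inclusive. The function names are made up for illustration; the Peter Falk years come from the example above.

```python
def comedy_lifespans(roles):
    """Map each actor to (first comedy year, last comedy year).

    roles: iterable of (actor, year) pairs, one per major comedy role.
    """
    spans = {}
    for actor, year in roles:
        if year is None:  # production_year can be NULL in the database
            continue
        first, last = spans.get(actor, (year, year))
        spans[actor] = (min(first, year), max(last, year))
    return spans


def eligible_actors(spans, year):
    """Actors whose comedy career covers the given year."""
    return sorted(actor for actor, (first, last) in spans.items() if first <= year <= last)


spans = comedy_lifespans([("Falk, Peter", 1961), ("Falk, Peter", 2005)])
print(eligible_actors(spans, 1984))  # ['Falk, Peter']
print(eligible_actors(spans, 1958))  # []
```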

The IMDbPY database is more complicated than I like to deal with, so my strategy was to use SQL to get a big table of roles and then process it with Python. Here's SQL to get every major role in a comedy that has more than 1000 votes on IMDB:

select title.title, title.production_year, movie_info_idx.info, name.name, name.gender, cast_info.nr_order, kind_id
from title
  join cast_info on title.id=cast_info.movie_id
  join name on cast_info.person_id=name.id
  join movie_info_idx on movie_info_idx.movie_id=title.id
  join movie_info on movie_info.movie_id=title.id
where cast_info.role_id in (1,2)
  and kind_id in (1,3,4)
  and movie_info.info_type_id=3
  and movie_info.info='Comedy'
  and cast(movie_info_idx.info as integer) > 1000
  and movie_info_idx.info_type_id=100
  and cast_info.nr_order <= 7;

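Some explanation of numbers and IDs: in IMDbPY's schema, cast_info.role_id 1 and 2 are actors and actresses. kind_id 1, 3, and 4 are movies, TV movies, and video movies. movie_info.info_type_id=3 means the info column holds a genre, and movie_info_idx.info_type_id=100 means it holds the number of IMDB votes, stored as text (hence the cast). cast_info.nr_order is the billing position, so nr_order <= 7 keeps the top seven billed actors.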

I run this on a SQLite database and the output looks like:

#1 Cheerleader Camp|2010|2297|Cassell, Seth|m|2|4

So the title of the movie is "#1 Cheerleader Camp", it came out in 2010, it has 2297 votes, and Seth Cassell (a man) was an actor in that movie and got second billing. The final 4 is the kind_id: this one was a video movie.
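Turning that output back into records takes one line per row. A sketch, assuming the output came from the sqlite3 shell's default list mode (columns separated by "|", NULL printed as an empty string); the FilmRole and parse_film_row names are made up for illustration.

```python
from typing import NamedTuple, Optional


class FilmRole(NamedTuple):
    title: str
    year: Optional[int]
    votes: int
    name: str
    gender: str
    billing: int   # cast_info.nr_order
    kind_id: int


def parse_film_row(line: str) -> FilmRole:
    """Parse one '|'-separated line of output from the comedy-roles query."""
    # Split from the right: a title can contain '|', the other six columns shouldn't.
    title, year, votes, name, gender, billing, kind_id = line.rstrip("\n").rsplit("|", 6)
    return FilmRole(title, int(year) if year else None, int(votes), name, gender,
                    int(billing), int(kind_id))


row = parse_film_row("#1 Cheerleader Camp|2010|2297|Cassell, Seth|m|2|4")
print(row.title, row.year, row.votes, row.billing)
```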

Why didn't I include television in this query? Because television on IMDB is really complicated. See, actors aren't credited to television shows; they're credited to individual episodes. But nobody rates individual episodes; they rate the show as a whole. So I had to do a separate query to determine who the top actors were on each comedy television show, and then divide up that show's votes between the four top actors. Otherwise actors whose primary comedy career is in television won't get their due.

Here's SQL to get all the roles in TV episodes:

select tv_show.title, episode.title, episode.production_year, votes.info, name.name, name.gender, cast_info.nr_order
from title as tv_show
  join title as episode on tv_show.id=episode.episode_of_id
  join cast_info on episode.id=cast_info.movie_id
  join name on cast_info.person_id=name.id
  join movie_info_idx as votes on votes.movie_id=tv_show.id
  join movie_info on movie_info.movie_id=tv_show.id
where cast_info.role_id in (1,2)
  and tv_show.kind_id in (2,5)
  and episode.kind_id=7
  and movie_info.info_type_id=3
  and movie_info.info='Comedy'
  and cast(votes.info as integer) > 10000
  and votes.info_type_id=100
  and cast_info.nr_order < 5;

This is pretty similar to the last query but some of the IDs are different.

I run this and the output looks like:

'Allo 'Allo!|A Bun in the Oven|1991|14022|Kaye, Gorden|m|1

This means there's an 'Allo 'Allo! episode called "A Bun in the Oven", the episode came out in 1991, 'Allo 'Allo! (NOT this specific episode) has 14,022 votes, and Gorden Kaye got top billing for this episode.

I got this data out of a database as quickly as possible and bashed at it to make a TV show look like a movie with four actors--the four actors who appeared in the most episodes of the TV show.
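Here's roughly what that bashing amounts to, as a Python sketch rather than the actual script: for each show, count the distinct episodes each actor appears in, keep the four most frequent, and split the show's vote count evenly among them. Dating the pseudo-movie by its earliest episode is my assumption; the post doesn't say.

```python
from collections import defaultdict


def shows_as_movies(episode_rows, cast_size=4):
    """Collapse TV episode credits into one pseudo-movie per show.

    episode_rows: (show, episode, year, show_votes, name, gender, billing) tuples,
    in the column order of the TV query. The cast is the cast_size actors who
    appear in the most distinct episodes; the show's votes are split evenly
    among them.
    """
    episodes = defaultdict(lambda: defaultdict(set))  # show -> actor -> episodes
    votes, years = {}, defaultdict(list)
    for show, episode, year, show_votes, name, _gender, _billing in episode_rows:
        episodes[show][name].add((episode, year))
        votes[show] = show_votes
        if year is not None:
            years[show].append(year)

    movies = {}
    for show, by_actor in episodes.items():
        ranked = sorted(by_actor, key=lambda actor: len(by_actor[actor]), reverse=True)
        cast = ranked[:cast_size]
        movies[show] = {
            "first_year": min(years[show], default=None),
            "cast": cast,
            "votes_each": votes[show] / len(cast),
        }
    return movies
```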

Directors were pretty similar to film actors. For each director who's ever worked in comedy, I measured their tendency towards putting out a popular comedy in any given year. There's a very strong power law here, with a few modern directors overshadowing their contemporaries, and Charlie Chaplin completely obliterating all his contemporaries.

Here's SQL to get all comedies with their directors:

select title.title, title.production_year, movie_info_idx.info, name.name, name.gender
from title
  join cast_info on title.id=cast_info.movie_id
  join name on cast_info.person_id=name.id
  join movie_info_idx on movie_info_idx.movie_id=title.id
  join movie_info on movie_info.movie_id=title.id
where cast_info.role_id in (8)
  and kind_id in (1,3,4)
  and movie_info.info_type_id=3
  and movie_info.info='Comedy'
  and cast(movie_info_idx.info as integer) > 5000
  and movie_info_idx.info_type_id=100;

The only new number here is cast_info.role_id in (8), which means I'm now picking up directors instead of actors.

At this point I was done with the SQL database. I wrote the "Ghostbusters casting office". It chooses a year, picks a cast and a director for that year, and then (15% of the time) it picks a custom title. My stupidly hilarious technique for custom titles is to choose the name of an actual comedy from the given year and replace one of the nouns with "Ghost" or "Ghostbuster". So far this has led to films like "Don't Drink the Ghost" and (I swear this happened during testing) "Ghostbuster Dad".
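A sketch of the title trick, with one loud substitution: spotting nouns properly needs a part-of-speech tagger, so this uses a tiny hypothetical NOUNS set instead. "Don't Drink the Water" is presumably the real title behind "Don't Drink the Ghost". The function names are mine.

```python
import random

# Hypothetical stand-in for a real part-of-speech tagger: a tiny hand-made noun list.
NOUNS = {"water", "dad", "daddy", "house", "wedding", "night", "man", "woman", "girl", "boy"}
PUNCTUATION = ".,!?:;\"'"
GHOST_WORDS = ("Ghost", "Ghostbuster")


def ghost_title(real_title, rng=random):
    """Replace one noun in a real title with 'Ghost' or 'Ghostbuster', or return None."""
    words = real_title.split()
    noun_positions = [i for i, w in enumerate(words) if w.strip(PUNCTUATION).lower() in NOUNS]
    if not noun_positions:
        return None
    i = rng.choice(noun_positions)
    core = words[i].strip(PUNCTUATION)
    words[i] = words[i].replace(core, rng.choice(GHOST_WORDS), 1)  # keeps any punctuation
    return " ".join(words)


def maybe_custom_title(comedies_that_year, rng=random, chance=0.15):
    """With probability `chance`, ghostify the title of a real comedy from that year."""
    if rng.random() >= chance:
        return None
    titles = list(comedies_that_year)
    rng.shuffle(titles)
    for title in titles:
        new_title = ghost_title(title, rng)
        if new_title:
            return new_title
    return None


print(maybe_custom_title(["Don't Drink the Water"], chance=1.0))
```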

Here's how I pick a cast for a given year: I line up all the actors for that year by my calculated variable "tendency towards being a Ghostbuster", and then I use random.expovariate to choose from different places near the front of the list (to bias the output towards actors you won't have to look up). This is the same trick I use for Serial Entrepreneur to choose common (but not too common) adjectives and nouns for its inventions. My means are 0.85, 0.8, 0.75, and 0.7, which will, on average, give me someone who's at the 85th percentile, someone at the 80th percentile, 75th percentile and 70th percentile.
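The post doesn't spell out how a "mean" like 0.85 feeds into random.expovariate, whose argument is a rate (1/mean), so here's one reading that makes the numbers line up, offered as a sketch: draw an exponential offset with mean 1 - 0.85 = 0.15, treat it as a fraction of the way down a best-first list, and redraw anything that runs off the end. The pick_from_front and pick_cast names are mine.

```python
import random


def pick_from_front(ranked, percentile, rng=random):
    """Pick one item from a best-first list, biased toward the front.

    Assumed reading of the post: an exponential draw with mean (1 - percentile)
    is a fractional offset down the list, so percentile=0.85 lands about 15% of
    the way down on average. Draws past the end of the list are redrawn.
    """
    mean_offset = 1.0 - percentile
    while True:
        offset = rng.expovariate(1.0 / mean_offset)  # expovariate takes a rate, 1/mean
        if offset < 1.0:
            return ranked[int(offset * len(ranked))]


def pick_cast(ranked_actors, percentiles=(0.85, 0.8, 0.75, 0.7), rng=random):
    """Fill one slot per percentile, never picking the same actor twice."""
    cast = []
    for percentile in percentiles:
        remaining = [actor for actor in ranked_actors if actor not in cast]
        cast.append(pick_from_front(remaining, percentile, rng))
    return cast
```

Redrawing trims the tail, so the average lands a hair nearer the front than 15%, but with a mean of 0.15 a draw past the end happens only about 0.1% of the time. Passing percentiles=(0.85, 0.8, 0.8, 0.75) gives the big-name-ensemble mode.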

This is the best I could do to recreate the dynamic of 1984 Ghostbusters, where Bill Murray and Dan Aykroyd were very well-known actors even before Ghostbusters, while Ernie Hudson and Harold Ramis were not. At this point you might object that Ernie Hudson and Harold Ramis weren't even 75th or 70th percentile. Ghostbusters was Ramis's second movie ever as an actor; I think there was an oral history that said he gave himself the part of Egon Spengler because no one else was a big enough dork. So for pure accuracy I should be doing, like, 0.90/0.85/0.35/0.30. But that gives you way too many obscure actors and the output isn't as fun. It also doesn't feel accurate, because 1984 Ghostbusters was a real movie, and all by itself it made Hudson and Ramis pretty famous actors. So now we expect "Ghostbuster" to be sort of a prestige comedy role.

A more valid point is that 0.85/0.8/0.75/0.7 also doesn't really capture the dynamic of the 2016 Ghostbusters, where all four actors are well-known but Kristen Wiig has twice the credits of the other three. So I also created a 0.85/0.8/0.8/0.75 mode, which will tend to give you more big-name ensembles.

As always, there's a lot of behind-the-scenes data munging. Going from a bunch of "xth billing in movie with y votes" entries to a single "tendency towards being a Ghostbuster" number required a lot of semi-arbitrary decisions, and I think my algorithm still undercounts television actors. Whenever there was a power law, I smoothed it out a little to increase the variety of the output. I smoothed out the overrepresentation of post-IMDB comedies compared to pre-IMDB comedies; of superstar directors like Chaplin who overshadow everyone else in their time; and of men directors vastly outnumbering women.
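The post deliberately leaves the weighting unspecified, so treat this as one possible set of semi-arbitrary decisions, not the bot's actual formula: each role is worth its vote count divided by its billing position, an actor's score is the sum, and a fractional power (damping=0.5, a hypothetical choice) flattens the power law.

```python
from collections import defaultdict


def ghostbuster_tendency(roles, damping=0.5):
    """roles: iterable of (actor, votes, billing). Returns actor -> smoothed score."""
    raw = defaultdict(float)
    for actor, votes, billing in roles:
        # Hypothetical weighting: top billing in a much-voted comedy counts most.
        raw[actor] += votes / max(billing, 1)
    # Raising to a power below 1 flattens the power law: a 100x gap in raw
    # score becomes a 10x gap when damping is 0.5.
    return {actor: total ** damping for actor, total in raw.items()}
```

To cast a given year, you'd sort that year's eligible actors by this score, best first, before picking from near the front of the list.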

Representation of women comedic actors vs. men was not an issue because I followed the lead of the Ghostbusters remake. 45% of the ghostbusting teams are all women, and 45% are all men. (10% of teamups are coed, just to add variety.) There's no code that makes sure all the actors speak the same language or anything like that—I could extract that data from IMDB but it would be a lot of work to make the output of the bot less interesting.
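The team-composition rule is a weighted coin flip. A sketch with random.choices; "f" and "m" follow name.gender in the query output, and the way coed teams get mixed is my own hypothetical rule (any lineup with at least one of each).

```python
import random


def team_genders(size=4, rng=random):
    """45% all-women teams, 45% all-men, 10% coed. 'f'/'m' as in name.gender."""
    kind = rng.choices(("all women", "all men", "coed"), weights=(45, 45, 10))[0]
    if kind == "all women":
        return ["f"] * size
    if kind == "all men":
        return ["m"] * size
    # Hypothetical coed rule: random genders per slot, redrawn until both appear.
    while True:
        team = [rng.choice("fm") for _ in range(size)]
        if "f" in team and "m" in team:
            return team
```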

And there you go. It's not source code, but you should be able to see more or less how I took this bot from concept to execution, and how I negotiated the tricky space between "this is an accurate representation of what would happen in an alternate universe where the primary cinematic comedy genre is films about busting ghosts" and "this is a fun output for this bot to have."

Poems of SCIENCE! I Mean, Science: I picked up a cheap old poetry anthology called Poems of Science, figuring there'd be some good stuff. And... there was, but I had to wait for the modern conception of "science" to come about, and then spot poetry about a hundred years to come to grips with it and decide that science is interesting and not going to go away. By that time I was more than halfway through the anthology. But around the late nineteenth century some excellent poetry starts happening, and I thought I'd share a couple links.

Miroslav Holub's Zito the Magician and Robert Browning's much longer An Epistle Containing the Strange Medical Experience of Karshish, the Arab Physician are really great and work as spec-fic stories. Swinburne's Hertha is this weird humanist we-are-made-of-star-stuff mythology that's what you'd expect from Swinburne. And then there's "Cosmic Gall", a goofy poem by John Updike which I'm gonna quote in full because it's the only thing of John Updike's I've read and liked.

Cosmic Gall
John Updike

Neutrinos, they are very small.
They have no charge and have no mass
And do not interact at all.
The earth is just a silly ball
To them, through which they simply pass,
Like dustmaids down a drafty hall
Or photons through a sheet of glass.
They snub the most exquisite gas,
Ignore the most substantial wall,
Cold shoulder steel and sounding brass,
Insult the stallion in his stall,
And, scorning barriers of class,
Infiltrate you and me. Like tall
And painless guillotines they fall
Down through our heads into the grass.
At night, they enter at Nepal
And pierce the lover and his lass
From underneath the bed—you call
It wonderful; I call it crass.

Minecraft Archive Project: 201502 Capture: I've done a new capture of data for the Minecraft Archive Project, my big 2014 project to archive the early history of Minecraft before it disappeared. My goal for the refresh was to capture what has happened in the past year while doing as little work as possible, and I met my goal. The whole thing took about two weeks, and most of that was a matter of letting things run overnight. Most of the actual work was refactoring the code I wrote the first time to make future captures even easier.

Top-line numbers: I've archived another 150 gigabytes of good stuff, including 18k maps and schematics, 1k mods, 11k skins, 7k texture packs (resource packs now, I guess), and 100k screenshots. I was able to archive about 73% of the maps. Four percent of the maps were just gone, and 23% I didn't know how to download.

The 201404 Minecraft Archive Project capture contains data from four sites. The new 201502 capture is limited to two sites: the official Minecraft forum and the huge Planet Minecraft site. I started archiving maps, mods, and textures for Minecraft Pocket Edition, and was able to pick up about 5500 MCPE maps.

Now that I've done this twice without getting into trouble, I'll give a little more detail about the process. I've got scripts that download the archives of the Minecraft forum and Planet Minecraft. I find all the threads/projects modified since the last capture, download the corresponding detail pages (e.g. the first page of a forum thread--I'm only after the original post), and extract all the links.
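I'm not going to reproduce the site-specific scraping, but once you have the original post's HTML, pulling out the links is a job for the standard library. A sketch using Python's html.parser; the (url, last_modified) listing format is an assumption standing in for whatever the archive pages actually give you, and the names are mine.

```python
from datetime import datetime
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from <a href> and <img src> in one post's HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = {"a": "href", "img": "src"}.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(post_html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(post_html)
    parser.close()
    return parser.links


def changed_since(threads, last_capture: datetime):
    """threads: (url, last_modified) pairs, hypothetically parsed from the archive pages."""
    return [url for url, modified in threads if modified > last_capture]
```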

Then it's a matter of archiving as many of those links as possible. I've written recipes for archiving images and downloads. These six recipes take care of the vast majority of items:

There's also a general catch-all for people who host things on normal home pages, as Tim Berners-Lee intended. If your URL looks like the URL to an image or a binary archive, I will ask for that URL. If you serve me the image or the binary instead of an HTML file telling me to click on something, then I'll archive the file.
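A sketch of that catch-all: check whether the URL's path ends like an image or archive, fetch it, and keep the body only if the server didn't answer with an HTML page. The extension lists are hypothetical (the post doesn't give them), and so are the function names.

```python
import posixpath
from urllib.parse import urlparse
from urllib.request import urlopen

# Hypothetical extension lists; the real recipe's lists aren't given in the post.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif"}
ARCHIVE_EXTENSIONS = {".zip", ".rar", ".7z", ".jar", ".schematic"}


def looks_archivable(url):
    """Does the URL path end like an image or a binary archive?"""
    extension = posixpath.splitext(urlparse(url).path)[1].lower()
    return extension in IMAGE_EXTENSIONS or extension in ARCHIVE_EXTENSIONS


def is_file_response(content_type):
    """False if the server sent an HTML page (e.g. 'click here to download') instead."""
    mime_type = content_type.split(";")[0].strip().lower()
    return mime_type not in ("text/html", "application/xhtml+xml")


def fetch_if_file(url, timeout=30):
    """Return the file's bytes, or None if the URL doesn't qualify or serves HTML."""
    if not looks_archivable(url):
        return None
    with urlopen(url, timeout=timeout) as response:
        if not is_file_response(response.headers.get("Content-Type", "")):
            return None
        return response.read()
```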

I decode most link shorteners except for the ones that make you click through ads, mainly adfoc.us and adf.ly. The 2014 archive had about 18,000 maps behind adf.ly links, and I spent a lot of time running Selenium clients clicking through the ads to discover the Mediafire links. I think that took a month. This time there were about 3000 new maps behind adf.ly links and I just didn't bother.
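Decoding an ordinary shortener is just following its redirect; the ad-wall ones get skipped. A minimal sketch with urllib (urlopen follows HTTP redirects on its own); the host set and function names are illustrative, and real shorteners may behave differently.

```python
from urllib.parse import urlparse
from urllib.request import urlopen

AD_WALL_HOSTS = {"adf.ly", "adfoc.us"}


def is_ad_wall(url):
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in AD_WALL_HOSTS)


def resolve_shortener(url, timeout=30):
    """Return the URL a shortener redirects to, or None for ad-wall shorteners."""
    if is_ad_wall(url):
        return None  # these need a browser (e.g. Selenium) clicking through the ads
    with urlopen(url, timeout=timeout) as response:  # urlopen follows HTTP redirects
        return response.geturl()
```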

There are two big blind spots in my dataset, and they're the same as last time. One is mods. A lot of mods are hosted on GitHub and CurseForge, two big sites I didn't write recipes for. There's also the issue of mod packs, which have been steadily growing in popularity and complexity as development on core Minecraft winds down. Thanks to things like the Hardcore Questing Mod, modpacks are entering the "custom challenge" territory previously occupied solely by world archives.

There are sites that list mod packs, but I don't want to spend the time figuring out how to archive all the mod packs. There's also the problem that mod packs are huge.

The second blind spot is servers. It's theoretically possible to join a public Minecraft server with a modded client and automatically archive the map, but realistically it ain't gonna happen. I complained about this last time, but now I've done an assessment of what's being lost.

Planet Minecraft has a big server list that mentions the last time it was able to ping any particular server. There doesn't seem to be any purging of dead servers, so I'm able to get good measurements of the typical lifecycle.

Of the 136k servers in the list, 12k are "online" (the most recent Planet Minecraft ping was successful), 51k are "offline" (the most recent Planet Minecraft ping failed, but there was a successful ping less than two weeks ago), and 73k I declare "dead" (the last successful ping was more than two weeks ago). It seems really weird that of the nearly half of the servers that aren't dead, most went offline in the past two weeks, so something's going on there; maybe Planet Minecraft's ping process is unreliable, or it just takes a long time to check every server, or servers go up and down all the time.

Anyway, the median lifetime for a public Minecraft server is 434 days, a little over a year. These things go online, people do a bunch of work on them, and then they disappear. I've kind of gotten to 'acceptance' on this, but it's still obnoxious.
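The post doesn't say exactly how lifetimes were measured. A minimal sketch, assuming each server record has the date it was added to Planet Minecraft's list and the date of its last successful ping, and that the median is taken over dead servers only (their lifetimes are finished). The two-week cutoff matches the online/offline/dead definitions above; the names are mine.

```python
from datetime import datetime, timedelta
from statistics import median

GRACE = timedelta(days=14)  # the two-week cutoff between "offline" and "dead"


def classify(last_ping_succeeded, last_success, now):
    """Bucket a server the way the numbers above are bucketed."""
    if last_ping_succeeded:
        return "online"
    if last_success is not None and now - last_success < GRACE:
        return "offline"
    return "dead"


def median_lifetime_days(dead_servers):
    """dead_servers: (added, last_success) datetime pairs. Lifetime is the gap between them."""
    return median((last_success - added).days for added, last_success in dead_servers)
```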

One final thing: I thought I'd check if I could see the result of Mojang's June announcement of rules for how you can make money by hosting servers (and, more importantly, how you can't). I wanted to see if these rules had a chilling effect on the formation of new servers or caused a lot of old servers to shut down.

And... no, not really. Here's a chart showing two sixty-day periods around June 12, the date of the Mojang blog post. For each day I show 'births' (the number of servers first seen on that day) and 'deaths' (the number of servers last seen on that day). There's a drop-off in new servers around the end of July, but then it picks up again stronger than before. I don't have an explanation for it but I don't think there's anything in here you can pin on a blog post. The Mojang rules were probably intended to go after a small number of large obnoxious servers, and everyone else either doesn't care or flies under the radar.
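The births-and-deaths count behind the chart is a pair of Counters. A sketch, assuming "first seen" and "last seen" dates per server and taking the June 12 in question to be June 12, 2014. Servers still alive at capture time have a last-seen date near the capture, well outside this window, so they don't show up as deaths.

```python
from collections import Counter
from datetime import date, timedelta


def births_and_deaths(servers, center, days=60):
    """Count servers first seen ('births') and last seen ('deaths') on each day.

    servers: (first_seen, last_seen) date pairs. Only days in the window
    [center - days, center + days) are counted, i.e. sixty-day periods on
    either side of the center date.
    """
    servers = list(servers)  # iterated twice below
    start, end = center - timedelta(days=days), center + timedelta(days=days)
    births = Counter(first for first, _ in servers if start <= first < end)
    deaths = Counter(last for _, last in servers if start <= last < end)
    return births, deaths


MOJANG_POST = date(2014, 6, 12)  # assumed year; the post just says June 12
```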

(Screenshot is from World #57 by Art_Fox. I didn't archive the map because it's behind an adf.ly link, but I got the screenshot.)

PS: Congratulations to Anticraft, the oldest public Minecraft server I could find that's still online, added to Planet Minecraft on February 28, 2011.

Update: I fixed up the adf.ly code and let it run for another two weeks (!), saving another 2000 Minecraft maps and 700 MCPE maps. I probably won't do this again because it's a huge pain, but I said that this time and ended up doing it out of some sense of obligation to the future, so maybe obligation will strike again, who knows.

Reviews of Old Science Fiction Magazines: F&SF October 1985: The first story in this magazine is James Tiptree's "The Only Neat Thing to Do", and the introductory copy describes the main character as "a green-eyed young woman who happens to be one of the most appealing characters you are likely to encounter in these or any other pages," and my attitude was "Pffft, green eyes, sure, we'll see about that... DAMMIT." This story's so good. It starts out with this perfect wish-fulfillment space adventure but look at the title, folks, it's not gonna end well. Argh, so good.

Harlan Ellison still hates Gremlins; in fact he says he's been getting letters from people who scoffed at his Gremlins hate, but now that they've seen the movie they're swallowing their pride and sending him "toe-scuffling, red-faced, abnegating appeals for absolution." I'm harboring a doubt or two here, because he's also saying other people who took his advice (and presumably didn't see the movie) are thanking him. Given that Gremlins has consistently been a well-regarded film since its release, why would someone say "Thanks for warning me off the movie I haven't seen that people still seem to like"?

But all that's in the past. In this issue Ellison doubles down, telling people not to see The Goonies due to "utter emptyheadedness", which, okay, at least it's a critique and not 'the lurkers support me in email.' Also on Ellison's shit list for this month: Rambo: First Blood Part II, A View to a Kill, and The Black Cauldron. He loves Cocoon, Ladyhawke, and Return to Oz, and who's to say he's wrong? Not me, 'cause I haven't seen any of those movies.

There's some really corny back-cover copy in one of the ads for books, but I know from experience that writing back-cover copy is the worst, so as a professional courtesy I'm not going to make fun of it. Kind of weird that most of the stories in this issue are SF or horror, but all the ads are for fantasy books.

Halley's Comet fever strikes the classifieds! There's an ad for Halley's Comet, 1910: Fire in the Sky, sort of a historical recreation by Jerred Metz. Also a "HALLEY'S COMET. TIE TAC or Stick Pin. Four color enamel and beautiful." I'm hyping up the Halley's Comet thing because I happen to own a mint in-box Halley's Comet Hot Wheels car the likes of which are currently going on eBay for a measly $5.32 used including shipping. C'mon! This is my nest egg here! I demand... demand!


March Film Roundup: We saw lots of stuff this month but not a lot of feature films. The upside is that a lot of what I did see is online for free.

April Film Roundup: Sumana spent a lot of time out of town this month, so I took the opportunity to clear out a bunch of items on my "movies I want to see but Sumana doesn't" list. But there's also plenty of movies we saw together. How can you tell the difference?... I think you'll be able to tell.

[Comments] (3) The Future Is Prologue: I'm experimenting with writing a prologue for Situation Normal, to reduce the thrown-into-the-deep-end feeling typical of my fiction. I say 'experimenting with' rather than 'just doing it' because I wrote something and it wasn't a prologue. I'd just turned back the clock to before the book started and written a regular scene.

I don't like prologues for the very reason I'm trying to write one: they're introductory infodumps. I usually skim them, unless they look like the Law & Order-style prologues where the POV character dies at the end of the scene. But this book has so many POV characters already, I don't think I should go that route.

I talked it over with Sumana and she gave me the idea of pacing the prologue as though it were the first scene of a short story. That's something I've done before, so I know I can do it again, and it doesn't mean big infodumps, just more internal monologue.

I'd like your suggestions of genre fiction books with effective prologues. Prologues that made you say "yes, I want to read a whole book about this stuff." I can't think of many examples but I admit I'm blinded by prejudice.

May Film Roundup: This month features some interesting foreign films, an old-favorite blockbuster, and an awesome new blockbuster with a surprising connection to one of my all-time favorite films. What are these nuggets of cinema gold? I don't know, I'm just the intro paragraph, you'll have to ask the bulleted list:

[Comments] (1) Reviews of Old Science Fiction Magazines: Analog 1985/07: Here it is, the final entry in this series, started seven years ago when I picked up a bunch of old SF magazines at a swap-fest. I've acquired a lot of magazines since then, and those are getting 'old', so it could continue, but this is the last of the original set. And good riddance, because this magazine smells like laundry detergent for some reason.

So what do we got? The cover story (one assumes) is the first part of Timothy Zahn's "Spinneret", which would later be published as a novel. It was good but I kinda see where it's going and don't feel a strong need to read the novel.

Eric G. Iverson's "Noninterference" is a pleasant story whose sole purpose is to dis the Prime Directive. The accompanying artwork seems more appropriate to a story about the mixing of the ultimate prog-rock album.

Charles L. Harness's "George Washington Slept Here" is the cream of this issue: a creative, funny and entertaining story that combines several Analog favorites (aliens, historical figures, and fussy middle-aged hobbies) that you rarely see together. Bonus: no time travel or major alt-history, just a character with a really long lifespan. I really liked the concept of the main character, a lawyer who loses every case he takes, but in a way that's more beneficial to his client than if he'd won. That concept's strong enough to support a series, but it looks like this is the only one.

This month's vague story blurbs:

Now to nonfiction. David Brin's essay "Just How Dangerous Is The Galaxy?" classifies every known potential solution to the Fermi Paradox and puts them in a big table by which term of the Drake Equation they affect. He also introduces his own "Water World" solution, which he deigns to classify in a separate section called "Optimism". This solution posits that "Earth is unusually dry for a water world," and that intelligent life evolves all the time, and thrives for long periods, but very rarely builds spaceships. I'm just riffing here, and I don't buy the idea that "hands and fire" are prerequisites to advanced technology, but you could imagine a dolphin-type civilization treating a planet's surface and atmosphere the way we treat low Earth orbit.
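For reference, here's the Drake Equation in its usual textbook form. This is the standard version, not something reproduced from Brin's essay:

```latex
N = R_{*} \cdot f_{p} \cdot n_{e} \cdot f_{l} \cdot f_{i} \cdot f_{c} \cdot L
```

N is the number of detectable civilizations in the galaxy, R_* the rate of star formation, f_p the fraction of stars with planets, n_e the number of potentially habitable planets per such system, f_l, f_i and f_c the fractions of those where life, intelligence, and detectable technology arise, and L how long a civilization stays detectable. On my reading, the Water World idea leaves f_i large and makes f_c small: plenty of minds, very few spaceships.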

Tom Easton's book review column includes a review of Ender's Game, which wanders into a long philosophical discussion that I won't reproduce here because it's pretty similar to stuff you can find on the Internet. I was disappointed to read that "Russel M. Griffin's The Timeservers is a pale incarnation of the diplomatic satire that made Laumer's Retief so popular." It was a Philip K. Dick Award finalist, though, so maybe it's just on a different wavelength from Laumer.

In letters, biologist Jack Cohen returns fire at Tom Easton, who in an earlier book review column disputed the evolutionary biology in Harry Harrison's West of Eden, which Cohen collaborated on. And reader Michael Owens has it out with Ben Bova about the latter's support of the Star Wars program. Summary of Owens: "far from leading to a defense-oriented world, Star Wars leads to another offense-oriented arms race." Bova responds that he wrote a book (Assured Survival) that deals with all this stuff, and then mentions this comforting tidbit:

[T]he new defensive technologies do not apply only to satellites and ballistic missiles. They are already being developed into "smart weapons" that will make the tanks, artillery, planes, and ships of conventional land and sea warfare little more than expensive and very vulnerable targets. "Star Wars" technologies (plural!) can make all forms of aggressive warfare so difficult that an era of worldwide peace is in view—if the nations of the world want peace.

Which leads nicely into the thing I've saved for last because I've got a lot to say about it, in direct violation of my usual "if you can't say anything nice" rule. Previously on Analog, columnist G. Harry Stine asked readers to send in their answers to the following question, which I will quote in full:

What, in your opinion, is the most important problem that technologists should tackle in the next twenty years, and why do you believe this?

In this issue Stine reports the results, and I was looking forward to doing a kind of The Future: A Retrospective thing on them.

The first thing Stine does is disqualify 120 of the 127 replies he got. That may seem extreme, but that's approximately what I'd do if I was running a magazine and accepting fiction submissions. I was kind of laughing along as he disqualified entries for exceeding the word limit or otherwise ignoring the rules, but then I got to this:

49.61% of the replies [63 of 127]... discussed problems that were either (a) not technological problems, but social and political instead; (b) already solved or well along the road to solution; (c) trivial and parochial in their scope; (d) based on incorrect, incomplete, or outmoded data; and/or (e) the result of someone else's telling the respondent that the problem was a problem because the expert said so, whereupon the respondent stated it on faith without checking.

And at this point I gotta call bullshit. You didn't say "most important technological problem", you said "most important problem technologists should tackle." Social and political problems have technical aspects, and vice versa. The impact of a technological development is judged by its effect on society. This is the basis of the science fiction genre! You could replace every vague Analog story blurb with "Social and political problems tend to have technical aspects, and vice versa...", and it would always fit the story!

Half of Analog's readership can follow directions but their opinions are wrong. Let's take a look at the top five disqualified "problems" (all direct quotes, scare quotes in original):

  1. Control of nuclear weapons
  2. the "population explosion"
  3. the "energy shortage"
  4. the "raw materials shortage"
  5. "pollution" in various and sundry forms

I sure am glad technologists didn't waste any more time on these non-problems after 1985! According to Stine, America's ballistic missile defense system is well on its way to solving #1 (if the nations of the world want peace, of course). #2 isn't a problem anymore because the rate of population growth has slowed. #3 and #4 were never real problems. ("The only reason we had an 'energy shortage' was to provide an excuse for politicians and bureaucrats to gain control of natural resources, and thereby gain control over people.") As for #5, who's to say what counts as "pollution"? Like most words, it's a "semantically-loaded term". "Pollution in its many forms may be a localized problem in some areas, but it is not a worldwide problem."

So what are the seven entries that made the cut? I'm glad you asked, previous sentence:

  1. "Making products maintenance-free, i.e. designed for a 100-year life with a 0.0001 probability of maintenance." DISQUALIFIED. Maybe the move from 75 years to 100 would be a technical improvement, but the problem as it exists today is a problem with the way products are sold, and technical improvements won't change that.
  2. "[C]ontrol of the weather" to boost crop yields and prevent famine. SEMI-DISQUALIFIED. Modern famines are political problems, not technical problems. Control of the weather would indeed be great, not for this reason, but because it would let us mitigate the damage caused by our worldwide pollution problem.
  3. "The construction and maintenance of closed ecological systems". Sure, OK.
  4. Here's the shortest quote I could get that explains this one:
    Education depends on communication. John points out that communication involves moving information from place to place... which really isn't much of a problem, but... managing the information is. It's possible to download lots of information into a student's mind. But if the student doesn't know how to determine what information is meaningful and relevant... everything stored in the student's memory is useless.

    Now that's more like it! Not only is this a real problem, it's one that we made significant progress on between 1985 and 2005!

  5. "The development of the direct link between the human mind and the computer to produce a true intelligence amplifier." Another good one. We got both parts of this (mind-computer link and intelligence amplifier), but in practice they don't have anything to do with each other.
  6. "[T]he construction by machines of very small machines." This also happened but proved not to be a huge deal, and even Stine is kinda skeptical ("he doesn't specify exactly what technological problems can be solved by developing sub-microscopic technology"). I'm gonna go out on a limb and say the real problem is the reader doesn't specify exactly what social or political problems can be solved with this technology.
  7. And finally,
    Del Cain of Augusta, ME presented a technological problem that is as much philosophical as technological... He wants technologists to develop structures and artifacts that tend to support healthy behavior in human beings—i.e. to help people live and rear children so they can develop to their full potential without trauma but not without struggle, difficulty, or drama. To do this, he believes that we should solve the technological problem of determining what are the optimum sizes and structures of healthy communities. In short, he feels that the big problem is developing technology with a life-affirming philosophy behind it.

    I don't understand how Del Cain managed to smuggle the concept of Scandinavian social democracy past G. Harry Stine, but good job. No, wait, I figured it out: I'm projecting, and so was he.

Well, there we go, that's our look at old SF magazines of the 80s. To commemorate the end of the series, I've scanned all the old ads in this magazine, not just the ones I thought were interesting or funny. But here are the ones I thought were interesting or funny:

I'll leave you with this question: what, in your opinion, is the most important problem that technologists should have tackled from 1985 to 2005, and why do you believe this?

[Comments] (1) Beautiful Soup 4.4.0 beta: I've found an agent for Situation Normal, the book is out to publishers, and I don't have to think about it for a while. As seems to be my tradition after finishing a big project, I went through the accumulated Beautiful Soup backlog and closed it out. I've put out a beta release; please try it out and report any problems.

I've fixed 17 bugs, added some minor new features, and changed the implementations of __copy__ and __repr__ to work more like you'd expect from Python objects. But in my mind the major new change is this: I've added a warning that displays when you create a BeautifulSoup object without explicitly specifying a parser:

UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently. To get rid of this warning, change this:
BeautifulSoup([your markup])
to this:
BeautifulSoup([your markup], "lxml")

It's a little annoying to get this message, but it's also annoying to have your code silently behave differently because you copied it to a machine that didn't have lxml installed, and it's also annoying when I have to check pretty much every reported bug to see whether this is the problem. Whenever I think I can eliminate a class of support question with a warning, I put in the warning. It saves everybody time.
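Here's a minimal sketch of the general "warn whenever you had to guess a default" pattern, in plain Python. It is not Beautiful Soup's actual code: `choose_parser`, `parser_is_available`, `PREFERRED_PARSERS` and the preference order are hypothetical, picked to mirror the warning text above. In real code you sidestep the whole thing by naming the parser, e.g. `BeautifulSoup(markup, "html.parser")`.

```python
import importlib.util
import warnings

# Hypothetical preference order, best first. "html.parser" is Python's
# built-in parser, so it is always available as the last resort.
PREFERRED_PARSERS = ["lxml", "html5lib", "html.parser"]


def parser_is_available(name):
    """Check whether a parser can be used on this system without importing it."""
    if name == "html.parser":
        return True
    return importlib.util.find_spec(name) is not None


def choose_parser(features=None):
    """Return the parser to use, warning whenever we had to guess one."""
    if features is not None:
        # The caller said exactly what they want: no guessing, no warning.
        return features
    for name in PREFERRED_PARSERS:
        if parser_is_available(name):
            warnings.warn(
                f'No parser was explicitly specified, so using "{name}". '
                "Behavior may differ on a system where it isn't installed; "
                f'pass "{name}" explicitly to silence this warning.',
                UserWarning,
                stacklevel=2,
            )
            return name
    # Not reached in practice, since "html.parser" is always available.
    raise RuntimeError("no HTML parser available")
```

The design choice is that the explicit path is silent and the guessing path is noisy, so the cheapest way to make the warning go away is also the way that makes the code behave the same on every machine.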

The other possibility: now that Python's built-in HTMLParser is decent, I could make it so that it's always the default unless you specify another parser. This would cause a big one-time disruption, as even machines which have lxml installed would start using HTMLParser, but once it shook out the problem would be solved. I might still do that, but I think I'll give everyone about a year to get rid of this annoying warning.

Anyway, try out the beta. Unless there's a big problem I'll be releasing 4.4.0 on Friday.

[Comments] (1) June Film Roundup:

Tragically, this marks the end of Film Roundup, as the resolution I foolishly made late in the month means that the only movies I can see from this point on are the likes of Hocus Pocus (1993), Heaven's Prisoners (1996), Hurt Penguins (1992), and the Tagalog comedy classic Haba-baba-doo! Puti-puti-poo! (1997). We'll miss the magic, the mystery, but most of all... the movies.

Wait, I can just disregard resolutions? They're not legally binding? Amazing! See you next month! I gotta go cancel my Columbia Record Club membership.

[Comments] (2) July Film Roundup: Sumana was gone for most of the month, and I discovered how easy it is to get to Film Forum from the library to see a movie after work. And when Sumana was around we saw a bunch of movies together, and the upshot is that I've now seen every movie ever made and there are no more movies. Here's just a sampling of the films I saw in July.

In honor of seeing The Third Man and The Fifth Element in the same month I'd like to announce the Criterion Collection Film Festival. I call it that because I've collected movies that meet a certain criterion. I don't anticipate any trouble. Anyway, here's the lineup!

  1. The First Time (2012)
  2. The Second Face (1950)
  3. The Third Man (1949)
  4. The Fourth Kind (2009)
  5. The Fifth Element (1997)
  6. The Sixth Sense (1999) <- Bruce Willis double feature!
  7. The Seventh Seal (1957)
  8. The Eighth Day (1996)
  9. The Ninth Configuration (1980)

Hope to see you there!

August Film Roundup: I think this month is about as close as Film Roundup has gotten to a random sample of movies. The museum did a series based on the 70mm film format, so we got three movies that have nothing in common except a decision to put really big film in the camera. Overall pretty happy with this month's crop though.

[Comments] (4) Top 100 Films From Women Directors: Sumana is tired of dude movies, so I went through this list of 100 great movies by female directors and noted the ones that a) I think Sumana would like (no Pet Sematary) and b) I am willing to watch (no Jeanne Dielman, 23 Quai du Commerce, 1080 Bruxelles, a film Sumana really likes but just thinking about it makes me fall asleep. I'm asleep right now!). There were about twenty-five such movies.

The above-linked list is very quirky, and although the idiosyncrasies generally work in the reader's favor (gotta figure out a way to see Jodie Mack's Dusty Stacks of Mom (2013)), it left rhetorical space for men to come into the comments section and say HOW could you OVERLOOK this GROUNDBREAKING film, [potentially useful recommendation], for you see, I know a LOT about FILM. Which I must admit would have happened anyway.

I don't know a lot about film, but I do know how to run SQL queries against IMDB data, so I thought I would make an intersubjective list of the top 100 films directed by women, judged by their IMDB ratings. In general I copied the implicit rules of the hand-picked list. Only feature-length films are here. No documentaries, no concert footage. (There is one comedy special in here, but whatever.)

As usual, films with fewer than 150 votes on IMDB were not considered. Also as usual, there are no links because the IMDB dataset is far too ancient for such things. I did some spot checks and kicked a couple movies off the list for obvious astroturfing. I don't believe one of the movies on this list is real, but I left it on the list because it's so weird.

Here's the list:

Rank. Title (Year) | Director | IMDB rating | Genres
1. The Matrix (1999) | Wachowski, Lana | 8.7 | Action, Sci-Fi
2. Cidade de Deus (2002) | Lund, Kátia | 8.7 | Drama, Crime
3. Voskhozhdenie (1977) | Shepitko, Larisa | 8.3 | Drama, War
4. Drushyam (2014) | Sripriya | 8.3 | Drama, Thriller, Family
5. Moe no suzaku (1997) | Kawase, Naomi | 8.2 | Drama
6. Zindagi Na Milegi Dobara (2011) | Akhtar, Zoya | 8.1 | Drama, Romance, Comedy, Adventure, Family
7. Salaam Bombay! (1988) | Nair, Mira | 8.1 | Drama, Crime
8. Mr. and Mrs. Iyer (2002) | Sen, Aparna | 8.0 | Drama
9. Le roman de Renard (1930) | Starewicz, Irene | 8.0 | Comedy, Fantasy, Animation, Family
10. Slumdog Millionaire (2008) | Tandan, Loveleen | 8.0 | Drama, Romance
11. Persepolis (2007) | Satrapi, Marjane | 8.0 | Drama, Animation, War, Biography
12. Chelovek s bulvara Kaputsinov (1987) | Surikova, Alla | 8.0 | Romance, Comedy, Musical, Western
13. Zero Motivation (2014) | Lavie, Talya | 7.9 | Drama, Comedy
14. Chou tin dik tong wah (1987) | Cheung, Mabel | 7.9 | Drama, Romance
15. Out 1, noli me tangere (1971) | Schiffman, Suzanne | 7.9 | Drama
16. Tau ban no hoi (1982) | Hui, Ann | 7.9 | Drama
17. Gett (2014) | Elkabetz, Ronit | 7.9 | Drama
18. Sharasôju (2003) | Kawase, Naomi | 7.9 | Drama
19. Gangoobai (2013) | Krishnaswamy, Priya | 7.9 | Drama, Family
20. Patrice O'Neal: Elephant in the Room (2011) | McCarthy-Miller, Beth | 7.9 | Comedy
21. Jeanne Dielman, 23 Quai du Commerce, 1080 Bruxelles (1975) | Akerman, Chantal | 7.9 | Drama
22. Little Miss Sunshine (2006) | Faris, Valerie | 7.9 | Drama, Comedy, Adventure
23. English Vinglish (2012) | Shinde, Gauri | 7.9 | Drama, Comedy, Family
24. Shrek (2001) | Jenson, Vicky | 7.9 | Comedy, Fantasy, Animation, Adventure, Family
25. La distancia más larga (2013) | Pinto, Claudia | 7.9 | Drama
26. Pasqualino Settebellezze (1975) | Wertmüller, Lina | 7.9 | Drama, Comedy, War
27. Dönüs (1972) | Soray, Türkan | 7.8 | Drama, Romance
28. Strangers in Good Company (1990) | Scott, Cynthia | 7.8 | Drama
29. Awakenings (1990) | Marshall, Penny | 7.8 | Drama, Biography
30. Dolgie provody (1971) | Muratova, Kira | 7.8 | Drama
31. Ne dao Bog veceg zla (2002) | Tribuson, Snjezana | 7.8 | Romance
32. Tong nien wang shi (1985) | Yang, Li-Yin | 7.8 | Drama, Biography
33. Dedictví aneb Kurvahosigutntag (1993) | Chytilová, Vera | 7.8 | Comedy
34. Cheshmane John Malkovich 1: Viggo Mortensen (2004) | Solati, Sara | 7.8 | Drama, Fantasy, Horror, Mystery
35. Earth (1998) | Mehta, Deepa | 7.8 | Drama, Romance, War
36. Nu ren si shi (1995) | Hui, Ann | 7.8 | Drama, Comedy
37. Lost in Translation (2003) | Coppola, Sofia | 7.8 | Drama
38. Efter brylluppet (2006) | Bier, Susanne | 7.8 | Drama
39. Water (2005) | Mehta, Deepa | 7.8 | Drama, Romance
40. Die Abenteuer des Prinzen Achmed (1926) | Reiniger, Lotte | 7.8 | Romance, Fantasy, Animation, Adventure
41. Rocks in My Pockets (2014) | Baumane, Signe | 7.7 | Comedy, Drama, Animation
42. Kirschblüten - Hanami (2008) | Dörrie, Doris | 7.7 | Drama, Romance
43. Selma (2014) | DuVernay, Ava | 7.7 | Drama, Biography, History
44. Nirgendwo in Afrika (2001) | Link, Caroline | 7.7 | Drama, Biography
45. Hævnen (2010) | Bier, Susanne | 7.7 | Drama
46. S tebou me baví svet (1983) | Polednáková, Marie | 7.7 | Comedy, Family
47. Nastroyshchik (2004) | Muratova, Kira | 7.7 | Drama, Comedy, Crime
48. Die Höhle des gelben Hundes (2005) | Davaa, Byambasuren | 7.7 | Drama
49. Sita Sings the Blues (2008) | Paley, Nina | 7.7 | Comedy, Fantasy, Romance, Animation, Musical
50. Sans toit ni loi (1985) | Varda, Agnès | 7.7 | Drama
51. Olivier, Olivier (1992) | Holland, Agnieszka | 7.7 | Drama
52. Little Fugitive (1953) | Orkin, Ruth | 7.7 | Drama, Family
53. Film d'amore e d'anarchia, ovvero 'stamattina alle 10 in via dei Fiori nella nota casa di tolleranza...' (1973) | Wertmüller, Lina | 7.7 | Drama, Romance, Comedy
54. Le bonheur (1965) | Varda, Agnès | 7.7 | Drama
55. Krylya (1966) | Shepitko, Larisa | 7.7 | Drama
56. Jibeuro Ganeun Gil (2013) | Pang, Eun-jin | 7.7 | Drama
57. Whale Rider (2002) | Caro, Niki | 7.7 | Drama, Family
58. Frozen (2013) | Lee, Jennifer | 7.7 | Family, Fantasy, Animation, Adventure, Comedy, Musical
59. Europa Europa (1990) | Holland, Agnieszka | 7.7 | Drama, War, History
60. Elsker dig for evigt (2002) | Bier, Susanne | 7.7 | Drama, Romance
61. Die Fremde (2010) | Aladag, Feo | 7.6 | Drama
62. Away from Her (2006) | Polley, Sarah | 7.6 | Drama
63. Saving Face (2004) | Wu, Alice | 7.6 | Drama, Romance, Comedy
64. Tou ze (2011) | Hui, Ann | 7.6 | Drama
65. En chance til (2014) | Bier, Susanne | 7.6 | Drama, Thriller
66. Wadjda (2012) | Al-Mansour, Haifaa | 7.6 | Drama, Comedy
67. My Life Without Me (2003) | Coixet, Isabel | 7.6 | Drama, Romance
68. Neposlusni (2014) | Djukic, Mina | 7.6 | Drama
69. 36 Chowringhee Lane (1981) | Sen, Aparna | 7.6 | Drama, Romance
70. Depuis qu'Otar est parti... (2003) | Bertuccelli, Julie | 7.6 | Drama
71. The Hurt Locker (2008) | Bigelow, Kathryn | 7.6 | Drama, War, Thriller
72. American Psycho (2000) | Harron, Mary | 7.6 | Drama, Crime
73. The Secret Life of Words (2005) | Coixet, Isabel | 7.6 | Drama, Romance
74. Brødre (2004) | Bier, Susanne | 7.6 | Drama, War
75. Yeo-haeng-ja (2009) | Lecomte, Ounie | 7.6 | Drama
76. Ting shuo (2009) | Cheng, Fen-fen | 7.6 | Drama, Romance
77. I Am Sam (2001) | Nelson, Jessie | 7.6 | Drama
78. The Namesake (2006) | Nair, Mira | 7.6 | Drama
79. Boys Don't Cry (1999) | Peirce, Kimberly | 7.6 | Drama, Biography
80. Büyük adam küçük ask (2001) | Ipekçi, Handan | 7.6 | Drama
81. Hanezu no tsuki (2011) | Kawase, Naomi | 7.6 | Drama
82. Pora umierac (2007) | Kedzierzawska, Dorota | 7.6 | Drama
83. La faute à Fidel! (2006) | Gavras, Julie | 7.6 | Drama, History
84. Kazoku no kuni (2012) | Yang, Yong-hi | 7.5 | Drama
85. Zir-e poost-e shahr (2001) | Bani-Etemad, Rakhshan | 7.5 | Drama
86. Proof (1991) | Moorhouse, Jocelyn | 7.5 | Drama
87. Ramchand Pakistani (2008) | Jabbar, Mehreen | 7.5 | Drama
88. Te doy mis ojos (2003) | Bollaín, Icíar | 7.5 | Drama, Romance
89. Nanayomachi (2008) | Kawase, Naomi | 7.5 | Drama
90. La misma luna (2007) | Riggen, Patricia | 7.5 | Drama
91. Travolti da un insolito destino nell'azzurro mare d'agosto (1974) | Wertmüller, Lina | 7.5 | Drama, Comedy, Adventure
92. Samt el qusur (1994) | Tlatli, Moufida | 7.5 | Drama
93. Et maintenant on va où? (2011) | Labaki, Nadine | 7.5 | Drama, Comedy
94. The Japanese Wife (2010) | Sen, Aparna | 7.5 | Drama, Romance
95. An Angel at My Table (1990) | Campion, Jane | 7.5 | Drama, Biography
96. Antonia (1995) | Gorris, Marleen | 7.5 | Drama, Comedy
97. Hooligans (2005) | Alexander, Lexi | 7.5 | Drama, Sport, Crime
98. Trolösa (2000) | Ullmann, Liv | 7.5 | Drama, Romance
99. A New Leaf (1971) | May, Elaine | 7.5 | Romance, Comedy
100. We Need to Talk About Kevin (2011) | Ramsay, Lynne | 7.5 | Drama, Thriller
101. Ke tu qiu hen (1990) | Hui, Ann | 7.5 | Drama
102. Mita Tova (2014) | Granit, Tal | 7.5 | Drama
103. Ratcatcher (1999) | Ramsay, Lynne | 7.5 | Drama
104. ...ing (2003) | Lee, Eon-hie | 7.5 | Romance
105. Tin shui wai dik yat yu ye (2008) | Hui, Ann | 7.5 | Drama
106. American Splendor (2003) | Berman, Shari Springer | 7.5 | Drama, Comedy, Biography
107. Tian yu (1998) | Chen, Joan | 7.5 | Drama
108. Cloud Atlas (2012) | Wachowski, Lana | 7.5 | Drama, Sci-Fi
109. Jestem (2005) | Kedzierzawska, Dorota | 7.5 | Drama
110. Korotkie vstrechi (1968) | Muratova, Kira | 7.5 | Drama, Romance
111. Dogfight (1991) | Savoca, Nancy | 7.5 | Drama, Romance, War
112. Across the Universe (2007) | Taymor, Julie | 7.5 | Drama, Fantasy, Romance, Musical
113. Sedmikrásky (1966) | Chytilová, Vera | 7.5 | Drama, Comedy

There are 113 movies in this list because IMDB ratings only have 0.1 star precision. If you're a woman and you direct a movie that gets a 7.5, congrats, you're tied for 84th place.
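The 113-entries-in-a-top-100 effect is what "standard competition" ranking (sometimes called "1224" ranking) gives you: tied ratings share the rank of the first item in the tie, and a cutoff at rank 100 keeps the whole tie group. Here's a minimal Python sketch of that arithmetic. The function names are hypothetical, and it illustrates the tie rule rather than reproducing how the list above was built.

```python
def competition_ranks(ratings):
    """Rank ratings that are already sorted best-first, giving tied
    ratings the rank of the first item in the tie ("1224" ranking)."""
    ranks = []
    for position, rating in enumerate(ratings, start=1):
        if position > 1 and rating == ratings[position - 2]:
            ranks.append(ranks[-1])  # same rating as the previous item: share its rank
        else:
            ranks.append(position)   # new rating: rank is its 1-based position
    return ranks


def top_n_with_ties(ratings, n):
    """Keep every rating whose rank is n or better, so a tie that
    straddles the cutoff is kept whole."""
    ranks = competition_ranks(ratings)
    return [rating for rating, rank in zip(ratings, ranks) if rank <= n]


# Made-up ratings shaped like the list above: 83 films rated above 7.5,
# then a 30-way tie at 7.5, then some 7.4s.
example = [8.0] * 83 + [7.5] * 30 + [7.4] * 5
print(competition_ranks(example)[83])      # first 7.5 → 84
print(len(top_n_with_ties(example, 100)))  # → 113
```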

Susanne Bier and Ann Hui each have five films on the list. Naomi Kawase has four. Some of the directors share the credit with a man, notably Lana Wachowski and Suzanne Schiffman. Barring any titles I don't recognize because they're not in English, the only films on this list I've seen are Sita Sings the Blues, Whale Rider, Frozen and A New Leaf. My personal favorites, among movies I know were directed by women, are A New Leaf and Wayne's World.

Finally, here's the base query I used to get the info I needed out of the database. I used the same database I built for Ghostbusters Past.

select distinct title.id, title.title, title.production_year,
       rating.info, votes.info, movie_info.info, kind_id,
       name.name, name.gender
from title
  join cast_info on title.id = cast_info.movie_id
  join name on cast_info.person_id = name.id
  join movie_info_idx as rating on rating.movie_id = title.id
  join movie_info_idx as votes on votes.movie_id = title.id
  join movie_info on movie_info.movie_id = title.id
where cast_info.role_id = 8          -- director
  and kind_id = 1                    -- movie (not TV, video, or episode)
  and movie_info.info_type_id = 3    -- genres
  and rating.info_type_id = 101      -- IMDB rating
  and votes.info_type_id = 100       -- number of votes
  and name.gender = 'f';
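The base query returns one row per film, per matching director, per genre (judging by the genre column in the list, movie_info.info_type_id=3 selects genres), and it doesn't apply the 150-vote cutoff. Here's a minimal Python sketch of the post-processing that turns those rows into a ranked list. It assumes each row is a tuple in the query's column order, with rating and votes still stored as strings; `build_ranking` and `MIN_VOTES` are hypothetical names, not the code I actually ran. Keying on (film, director) means a film with two women directors shows up twice, which appears to be what happened with Paris, je t'aime further down.

```python
MIN_VOTES = 150  # vote cutoff described earlier in this post


def build_ranking(rows):
    """Collapse one-row-per-genre query results into one entry per
    (film, director), drop films with too few votes, and sort by rating.

    Each row is assumed to be:
    (title_id, title, year, rating, votes, genre, kind_id, director, gender)
    """
    films = {}
    for row in rows:
        (title_id, title, year, rating, votes, genre,
         _kind_id, director, _gender) = row
        if int(votes) < MIN_VOTES:
            continue  # too few votes to trust the rating
        key = (title_id, director)
        entry = films.setdefault(key, {
            "title": title,
            "year": year,
            "director": director,
            "rating": float(rating),
            "genres": [],
        })
        if genre not in entry["genres"]:
            entry["genres"].append(genre)
    # Highest rating first; Python's sort is stable, so ties keep the
    # order in which the query returned them.
    return sorted(films.values(), key=lambda film: film["rating"], reverse=True)
```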

Update: The pedantry continues with Darius Kazemi telling me that Loveleen Tandan was the casting director on Slumdog Millionaire, not the director who yelled "cut!" and "action!" and "it's a wrap!". If IMDB says role_id=8, that's good enough for me, but YMMV.

Update #2: danima asked about English-language films. I don't think IMDB tracks the primary language of a film, just whether a language is used in the film. So I can filter on "English", but I'll still pick up films that are primarily in French or Hindi, so long as there is some English dialogue. Our story begins right after Across the Universe, where the previous list leaves off. Basically, if your film is in English you only need a 7.4 or 7.3 (still well above the median IMDB rating) to get into the top 100. I have not vetted this list for astroturf:

Rank. Title (Year) | Director | IMDB rating | Genres
57. Pismo do Amerika (2001) | Triffonova, Iglika | 7.4 | Drama
58. Bastard Out of Carolina (1996) | Huston, Anjelica | 7.4 | Drama
59. Frida (2002) | Taymor, Julie | 7.4 | Drama, Romance, Biography
60. Chance (2002) | Benson, Amber | 7.4 | Drama, Comedy
61. Kaméleon (2008) | Goda, Krisztina | 7.4 | Drama, Comedy, Thriller
62. Paris, je t'aime (2006) | Chadha, Gurinder | 7.4 | Drama, Romance, Comedy
63. Le fils de l'autre (2012) | Lévy, Lorraine | 7.4 | Drama
64. Lifted (2010) | Alexander, Lexi | 7.4 | Drama
65. Belle (2013) | Asante, Amma | 7.4 | Drama
66. Desert Flower (2009) | Hormann, Sherry | 7.4 | Drama, Biography
67. Me and You and Everyone We Know (2005) | July, Miranda | 7.4 | Drama, Comedy
68. On Dangerous Ground (1951) | Lupino, Ida | 7.4 | Drama, Romance, Thriller, Film-Noir, Crime
69. Paris, je t'aime (2006) | Coixet, Isabel | 7.4 | Drama, Romance, Comedy
70. Bound (1996) | Wachowski, Lana | 7.4 | Drama, Thriller, Crime
71. Zero Dark Thirty (2012) | Bigelow, Kathryn | 7.4 | Drama, Thriller, History
72. También la lluvia (2010) | Bollaín, Icíar | 7.4 | Drama, History
73. Monsoon Wedding (2001) | Nair, Mira | 7.4 | Drama, Romance, Comedy
74. Mimì metallurgico ferito nell'onore (1972) | Wertmüller, Lina | 7.4 | Comedy
75. Hollow Reed (1996) | Pope, Angela | 7.4 | Drama
76. The Trouble with Angels (1966) | Lupino, Ida | 7.4 | Comedy
77. The Selfish Giant (2013) | Barnard, Clio | 7.4 | Drama
78. Mikey and Nicky (1976) | May, Elaine | 7.4 | Drama
79. José Rizal (1998) | Diaz-Abaya, Marilou | 7.3 | Drama, War, Biography, History
80. Titus (1999) | Taymor, Julie | 7.3 | Drama, Thriller, History
81. Sepet (2004) | Ahmad, Yasmin | 7.3 | Drama, Romance, Comedy
82. Kung Fu Panda 2 (2011) | Yuh, Jennifer | 7.3 | Family, Drama, Animation, Adventure, Action, Comedy
83. Put oko sveta (1964) | Jovanovic, Soja | 7.3 | Comedy, Adventure, Western
84. Fish Tank (2009) | Arnold, Andrea | 7.3 | Drama
85. Infinitely Polar Bear (2014) | Forbes, Maya | 7.3 | Drama, Comedy
86. An Education (2009) | Scherfig, Lone | 7.3 | Drama
87. The Black Balloon (2008) | Down, Elissa | 7.3 | Drama, Romance
88. North Country (2005) | Caro, Niki | 7.3 | Drama
89. Thousand Pieces of Gold (1991) | Kelly, Nancy | 7.3 | Romance, Western
90. Funny Valentines (1999) | Dash, Julie | 7.3 | Drama
91. The Secret Life of Bees (2008) | Prince-Bythewood, Gina | 7.3 | Drama
92. Stander (2003) | Hughes, Bronwen | 7.3 | Action, Drama, Biography, Crime
93. Shao nu xiao yu (1995) | Chang, Sylvia | 7.3 | Drama
94. The Prize Winner of Defiance, Ohio (2005) | Anderson, Jane | 7.3 | Drama, Biography
95. Craig's Wife (1936) | Arzner, Dorothy | 7.3 | Drama
96. Firaaq (2008) | Das, Nandita | 7.3 | Drama, History
97. Blood and Sand (1922) | Arzner, Dorothy | 7.3 | Drama, Romance, Sport
98. My Brilliant Career (1979) | Armstrong, Gillian | 7.3 | Drama, Romance, Biography
99. Eve's Bayou (1997) | Lemmons, Kasi | 7.3 | Drama
100. The Name Is Rogells (Rugg-ells) (2011) | Warner, Rachel | 7.3 | Romance, Adventure
101. The Voices (2014) | Satrapi, Marjane | 7.3 | Comedy, Thriller, Crime
102. The Woodsman (2004) | Kassell, Nicole | 7.3 | Drama
103. Talaash (2012) | Kagti, Reema | 7.3 | Drama, Mystery, Thriller, Crime
104. My First Mister (2001) | Lahti, Christine | 7.3 | Drama, Romance, Comedy
105. Big (1988) | Marshall, Penny | 7.3 | Drama, Fantasy, Romance, Comedy
106. Monster (2003) | Jenkins, Patty | 7.3 | Drama, Biography, Crime
107. The Secret Garden (1993) | Holland, Agnieszka | 7.3 | Drama, Fantasy, Family
108. Little Women (1994) | Armstrong, Gillian | 7.3 | Drama, Romance
109. Fire (1996) | Mehta, Deepa | 7.3 | Drama, Romance
110. The Connection (1962) | Clarke, Shirley | 7.3 | Drama

September Film Roundup: Didn't see a lot of movies this month, so I'm going to add a new mini-feature that will run for the next few months. I'll be briefly reviewing some TV shows that I feel I can evaluate as a whole, even though I haven't seen (and may never see) every single episode. But first, our feature presentations:

And now the TV section. Obviously my technique of waiting until I can evaluate a show as a whole creates a selection bias towards good television shows. I'll sit through a bad movie and then pan it in Film Roundup, but a bad TV show is outa here, especially since I watch movies on my own but only watch TV with Sumana. But what's the problem with talking about good TV? Try this on for size:

(Before you ask, Religious Huckster Trick #1 is "God told me to tell you to give me money.")

[Comments] (1) To Stop Disturbance: I was reading to Sumana the most interesting bits from Washington Goes To War, a book by David Brinkley about the changes to Washington D.C. over the course of World War II. It's full of interesting historical tidbits, including:

But the thing Sumana wanted me to record verbatim was the policy that Washington D.C.'s Casino Royal put into place for dealing with the inevitable fistfights between soldiers and sailors. "Night after night," these inter-service resentments boiled over, and so the Casino Royal wrote down these rules and posted them "on a wall backstage under the heading TO STOP DISTURBANCE."

  1. Lower the house lights
  2. Turn the spotlight on a large American flag hanging from the ceiling
  3. Start up an electric fan aimed at the flag, causing it to flutter
  4. Have the band instantly stop playing dance music and strike up "The Star-Spangled Banner".
  5. Call in the military police and the navy's shore patrol
It always worked. The soldiers and sailors stopped swinging at each other, faced the flag and stood at attention while the band played. There was no way a uniformed military man in wartime could refuse to do this, however angry he was. Before the anthem was finished, the military police and the shore patrol were walking up the steps from Fourteenth Street.

The one that really gets me is #3. I can see how standing at attention for the anthem would be drilled into you as a reflex, but #3 makes it feel like they're trying to inspire you, remind you what you're fightin' for. And then the MPs show up.

[Comments] (4) : Recently I gave a talk called "The Enterprise Media Distribution Platform At The End Of This Book". It summarizes my first eighteen months on the Library Simplified project at NYPL Labs. The goal of Library Simplified is to make it as easy to check ebooks out from a public library as it is to buy them from Amazon.

We've just secured a multi-year grant to expand the project, and we're growing from two developers to eight, quadrupling the size of our development team.

This is a really satisfying job for me because I'm making life substantially better for people who aren't already well off. If you like that prospect, if you like what I say in the "Enterprise Media Distribution" talk, and you want to work on this project, you should apply for one of these positions by sending your resume to info@librarysimplified.com.

I'm going to link to the job listings in a minute, but first I want to make it real clear that we put up these listings largely to have entry points into the HR system. As the team lead I'm not concerned with counting how many terms on your resume match terms used in the job listing. We need two Android developers and four people to write server-side code and HTML and Javascript. I don't think we need a team made up entirely of Senior Developers. Other skills might be more important.

For instance, we need someone with devops experience. We'll be dealing with e-commerce, cryptography, and machine learning—all things I know little about. We don't care if you have a CS degree, but if you have a Library Science degree or have worked in the publishing industry, that would be useful. We have big collections in Spanish, Chinese and Russian, but nobody on our team reads those languages. Stuff like that.

With that in mind, here are the job listings:

As you can see if you click around, getting into the HR system to formally "apply" for these jobs requires filling out a really long form. (Update: and now these links don't even work anymore because the jobs got shifted around.) Instead of doing that, send your resume to info@librarysimplified.com and we'll only ask you to fill out the form if we want to bring you in for an interview.

All these positions are in New York City, in the big building on 42nd Street with the lions. This is a project funded by grants, and the salaries we offer are not competitive with Facebook or Goldman Sachs, but they are competitive with other nonprofits. The benefits are good. This is not a job that ruins your life. It's 35 hours a week and you get four weeks of vacation per year. I work from home about one day a week. Send me email or leave a comment if you have any questions about benefits.

Auditioning: Sampling a Dataset to Maximize Diversity: My latest bot is Roller Derby Names, which takes its data from a list of about 40,000 distinct names chosen by roller derby participants. 40,000 is a lot of names, and although a randomly selected name is likely to be hilarious, if you look at a bunch of them they can get kind of repetitive. My challenge was to cut it down to a maximally distinctive subset of names. I used a simple technique I call 'auditioning' (couldn't find a preexisting name for it) which I first used with Minecraft Signs:

  1. Shuffle the list.
  2. Create a counter of words seen
  3. For each string in the list:
    1. Split the string into words.
    2. Assume the string is not distinctive.
    3. For each word in the string:
      1. If this word has been seen fewer than n times, the string is distinctive.
      2. Increment the counter for this word.
    4. If the string is distinctive, output it.
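Here's a minimal Python sketch of the steps above. It follows the numbered list literally, so every word of every string gets counted, including words from strings that fail their audition. Lowercasing words before counting is my assumption (the "captain"/"beaver" example suggests case-insensitive matching, but the steps don't say), and the sample names are hypothetical.

```python
import random
from collections import Counter


def audition(strings, n):
    """Yield each string containing at least one word seen fewer than n times.

    Strings are auditioned in the order given, so shuffle them first (step 1).
    Lowercasing words before counting is an assumption of this sketch.
    """
    seen = Counter()  # step 2: how many times each word has been seen
    for s in strings:
        distinctive = False  # assume the string is not distinctive
        for word in s.lower().split():
            if seen[word] < n:
                distinctive = True
            # Counted whether or not the string ends up passing.
            seen[word] += 1
        if distinctive:
            yield s


# Hypothetical sample data, auditioned in a fixed order with n=1:
names = ["Angry Beaver", "Captain Hook", "Captain Beaver"]
print(list(audition(names, 1)))  # ['Angry Beaver', 'Captain Hook']

# In practice, shuffle a copy first; a different order gives a different subset.
corpus = names[:]
random.shuffle(corpus)
subset = list(audition(corpus, 1))
```

Because the word counts don't depend on n, auditioning the same shuffled list with a smaller n always yields a subset of what a larger n would yield. That's why lowering n shrinks the corpus.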

My mental picture of this process is that each string is auditioning before the talent agent from the classic Chuck Jones cartoon One Froggy Evening. One word at a time, the string tries to impress the talent agent, but the agent has seen it all before. In fact, the agent has seen it all n times before! But then comes that magical word that the agent has seen only n-1 times. Huzzah! The string passes its audition. But the next string is going to have a tougher time, because with each successful audition the agent becomes more jaded.

You don't have to worry about stopwords because the string only needs one rare word to pass its audition. By varying n you can get a smaller or larger output set. For Minecraft Signs I set n=5, which gave a wide variety of signs while eliminating the ones that say "White Wool". For Roller Derby Names I decided on n=1.

Here's the size of the Roller Derby Names dataset, n-auditioned for varying values of n:
n | Dataset size
∞ (original data) | 40,198

Auditioning the Roller Derby Names with n=50 excludes only the most generic-sounding names: "Crash Baby", "Bad Lady", "Queen Bitch", etc. Setting n=1 restricts the dataset to the most distinctive names, like "Battlestar Kick Asstica" and "Collideascope". But it still includes over half the dataset. There's not really a lot of difference between n=10 and n=4; it's just a question of how many names you want in the corpus.

I want to note that this is not a technique for picking out the 'good' items. It's a technique for maximizing diversity or distinctiveness. You can say that a name excluded by a lower value of n is more distinctive, but for a given value of n it can be totally random whether or not a name makes the cut. "Angry Beaver" made it into the final corpus and "Captain Beaver" didn't. As "beaver" jokes go, I'd say they're about the same quality. When the algorithm encountered "Captain Beaver", it had already seen "captain" and "beaver". If the list had been shuffled differently, the string "Captain Beaver" would have nailed its audition and "Angry Beaver" would be a has-been. That's show biz. This technique also magnifies the frequency of misspellings, as anyone who follows Minecraft Signs knows.

Also note that "Dirty Mary" is excluded by n=50. It's not the greatest name but it is a legitimate pun, so in terms of quality it should have made the corpus, but "Dirty" and "Mary" are both very common name components, so it didn't pass.

PS: Boat Name Bot (Roller Derby Names's sister bot) does not use this technique. There's no requirement that a boat name be unique, and TBH most boat-namers aren't terribly creative. Picking boat names that have only been used once (and are not names for human beings) cuts the dataset down plenty.
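A minimal Python sketch of that boat-name filter. The `human_names` set is a hypothetical stand-in for however human names actually get detected, treating names case-insensitively is an assumption, and the sample boats are made up.

```python
from collections import Counter


def one_off_boat_names(names, human_names):
    """Keep boat names that appear exactly once and aren't human names.

    Comparisons are case-insensitive (an assumption of this sketch);
    human_names should contain lowercased names.
    """
    counts = Counter(name.lower() for name in names)
    return [
        name for name in names
        if counts[name.lower()] == 1 and name.lower() not in human_names
    ]


# Hypothetical sample data:
boats = ["Serenity", "Reel Therapy", "SERENITY", "Linda", "Knot Working"]
print(one_off_boat_names(boats, {"linda"}))  # ['Reel Therapy', 'Knot Working']
```

Unlike auditioning, this doesn't depend on the order of the list: a name either occurs once in the whole dataset or it doesn't.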

Bot Techniques: The Wandering Monster Table: In preparation for the talk I'm giving Friday at Allison's unofficial Bot Summit, I'm writing little essays explaining some of the techniques I've used in bots. Today: the Wandering Monster Table!

In D&D, the Wandering Monster Table is a big situation-specific table that makes it possible for you, the Dungeon Master, to derail your carefully planned campaign on a random mishap. You roll the dice and a monster just kind of shows up and has to be dealt with. There are different tables for different scenarios and different biomes, but they're generally based on this probability distribution (from AD&D 1st Edition):

Common: 65%
Uncommon: 20%
Rare: 11%
Very rare: 4%

This doesn't mean you're going to run into Ygorl (Lord of Entropy) once every twenty-five adventures. There are a ton of Very Rare monsters, and Ygorl is just one chaos lord. He can't be everywhere. What this means is that most of the time the PCs are going to experience normal, boring wandering monsters. The tiers roughly track a normal distribution, in which 68% (~65%) of results fall within one standard deviation of the mean. Those are your common monsters.

Go out two standard deviations (95%, ~65%+20%+11%) and things might get a little hairy for the PCs. Go out three standard deviations (99.7%, ~65%+20%+11%+4%) and you're looking at something really weird that even the Dungeon Master didn't really plan for. But what, exactly? That depends on the situation, and it may require another dice roll.

The WMT is a really good abstraction for creating variety. I use it in my bots all the time. Here's a sample of the WMT for Serial Entrepreneur:

common = [
    "%(product)s",
    "%(product)s!",
    "%(product)s...\n%(variant)s...",
    "%(product)s? %(variant)s?",
    ...
]
uncommon = [
    "%(product)s... %(variant)s...? Just throwing some ideas around.",
    "%(product)s... or maybe %(variant)s...",
    "%(product)s or %(variant)s?",
    "Eureka! %(product)s!",
    ...
]
rare = [
    "I don't think I'll ever be happy with my %(product)s...",
    "Got a meeting with some VCs to pitch my %(product)s!",
    "I'm afraid that my new %(product)s is cannibalizing sales of my %(variant)s.",
    "The %(product)s flopped in my %(state)s test market... back to the drawing board.",
    ...
]
very_rare = [
    "Am I to be remembered as the inventor of the %(product)s?",
    "Sometimes I think about Edison's famous %(product)s and I wonder... can my %(product2)s compare?",
    "I haven't sold a single %(product)s...",
    "I hear %(billionaire)s is working on %(a_product)s...",
    ...
]

This creates a personality that most of the time just mutters project ideas to itself, but sometimes (uncommonly) gets a little more verbose, or (rarely) talks about where it is in the product development process, or (very rarely) compares itself to other inventors. The 'common' bucket contains nine entries which are slight variants; the 'rare' bucket contains 32 entries which are worded very differently.
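Here's a minimal Python sketch of a Wandering Monster Table, assuming the 65/20/11/4 percentages from the AD&D 1st Edition frequencies. This is not olipy's implementation; the class name, its `choice()` method, and the sample product are hypothetical. The pick takes two rolls: one for the tier, then one for an entry within that tier.

```python
import random


class WanderingMonsterTable:
    """Pick an entry from one of four tiers, weighted like AD&D 1st Edition
    frequencies: common 65%, uncommon 20%, rare 11%, very rare 4%.
    A hypothetical sketch, not olipy's API."""

    WEIGHTS = (65, 20, 11, 4)

    def __init__(self, common, uncommon, rare, very_rare, rng=None):
        self.tiers = (common, uncommon, rare, very_rare)
        self.rng = rng if rng is not None else random.Random()

    def choice(self):
        # First roll: which tier shows up.
        tier = self.rng.choices(self.tiers, weights=self.WEIGHTS)[0]
        # Second roll: which entry from that tier.
        return self.rng.choice(tier)


table = WanderingMonsterTable(
    common=["%(product)s", "%(product)s!"],
    uncommon=["Eureka! %(product)s!"],
    rare=["Got a meeting with some VCs to pitch my %(product)s!"],
    very_rare=["I haven't sold a single %(product)s..."],
)
print(table.choice() % {"product": "Smart Toaster"})
```

Because each tier is just a list, adding a new construct to the bot means appending one string to the appropriate tier.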

The WMT works the same way in Smooth Unicode and Euphemism Bot. All these bots have their standbys: common constructs they return to over and over. Then they have three more tiers of constructs where the result is aesthetically riskier, or the joke is less likely to land, or a little of that construct goes a long way.

I also use the WMT in A Dull Bot for a more subtle purpose. Each tweet contains a random number of typos, and each typo is chosen from a WMT. One of the common typos is to transpose two letters. A very rare typo is to uppercase one word while leaving the rest of the sentence alone.
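A hypothetical Python sketch of a typo WMT. Only two typos come from the description: transposing letters (common) and uppercasing one word (very rare). The uncommon and rare typos, the 65/20/11/4 weights, the cap of three typos per tweet, and the sample sentence are all my assumptions, not A Dull Bot's actual code.

```python
import random


def transpose_letters(text, rng):
    """Common typo: swap two adjacent characters."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]


def drop_letter(text, rng):
    """Hypothetical uncommon typo: delete one character."""
    if not text:
        return text
    i = rng.randrange(len(text))
    return text[:i] + text[i + 1:]


def double_letter(text, rng):
    """Hypothetical rare typo: repeat one character."""
    if not text:
        return text
    i = rng.randrange(len(text))
    return text[:i] + text[i] + text[i:]


def uppercase_one_word(text, rng):
    """Very rare typo: uppercase a single word, leaving the rest alone."""
    words = text.split(" ")
    i = rng.randrange(len(words))
    words[i] = words[i].upper()
    return " ".join(words)


# Tiers: common, uncommon, rare, very rare (weights are an assumption).
TYPO_TIERS = ([transpose_letters], [drop_letter], [double_letter], [uppercase_one_word])
WEIGHTS = (65, 20, 11, 4)


def add_typos(text, rng, max_typos=3):
    """Apply a random number of typos, each one picked from the WMT."""
    for _ in range(rng.randint(0, max_typos)):
        tier = rng.choices(TYPO_TIERS, weights=WEIGHTS)[0]
        typo = rng.choice(tier)
        text = typo(text, rng)
    return text


rng = random.Random()
print(add_typos("all work and no play makes jack a dull boy", rng))
```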

The WMT fixes one of the common aesthetic problems with bots, where every output is randomly generated but it gets dull quickly because the presentation is always the same. Since you can always dump more stuff into a WMT, it's an easy way to keep your bot's output fresh. In particular, whenever I get an idea like emoji mosaics, I can add it to Smooth Unicode's WMT instead of creating a whole new bot.

There's a Python implementation of a Wandering Monster Table in olipy.

October Film Roundup: This month starts very mainstream, with lots of gunplay and explosions, but—plot twist!—takes a right turn into the avant-garde. And then ends with some random stuff. Just the way I, and, hopefully, you, like it.

And now, the continuation of Television Roundup. We actually finished a show this month!

Roy's Postcards Return[s]!: Back in 2009 I started a project to transcribe and put online over 1000 postcards my dad bought in the 1980s. The toolchain that took things from postcards to web pages was always kind of rickety, and the project petered out altogether when my sisters sent me about 500 more postcards that Dad sent them. I decided I wouldn't start it up again until I'd transcribed all 1500 postcards and could put everything up at once.

Now it's done! The best way to experience it is through the daily @RoyPostcards bot. This is a labor of love for me, so I'm not as concerned that people follow along, but I tried to add interesting commentary whenever I could, and it's an interesting glimpse into everyday life in the 80s.

November Film Roundup: I remember this month's movies being meh-ful, but when I went back to the list there were three really good movies, and I'd just allowed my memories to be overwhelmed by the underwhelming movies, because I saw the three really good movies all in a row. No more! Let joy be unconfined!

December Film Roundup: The final Film Roundup of the year! Step onto the red carpet, and... no, wipe your feet first! Geez.

And now, the Television Spotlight focuses on a show that we watched in its entirety in December:



Unless otherwise noted, all content licensed by Leonard Richardson
under a Creative Commons License.