News You Can Bruise for 2015 October

Thu Oct 01 2015 09:05 September Film Roundup: Didn't see a lot of movies this month, so I'm going to add a new mini-feature that will run for the next few months. I'll be briefly reviewing some TV shows that, although I haven't seen (and may never see) absolutely every episode, I feel like I can evaluate the show as a whole. But first, our feature presentations:

Rififi (1955) a.k.a. "Du rififi chez les hommes", a.k.a. "Rough Stuff" (my translation). Where's the dividing line between French New Wave films inspired by American noir, and just plain French noir? I don't know. This is definitely on the 'just plain French noir' side, but everything in this movie—the misogyny, the stylishness, the despair—is just a more extreme version of what you get from Truffaut. Not recommended overall but the half-hour silent heist scene is black-and-white gold, everything that was promised.
Fantastic Mr. Fox (2009): The first Wes Anderson film I've enjoyed rather than admired. Everything is so cute and twee but with a little edge, so the style is a perfect match. If I've read the book it was in grade school, so I don't know who gets the credit for this idea, but I love how, despite being totally anthropomorphized, the stop-motion animals are animals. Really cool seeing Mr. Fox get into a hissing match with his lawyer, casually killing chickens, etc. Also love how 'cuss' is used as an all-purpose swear word.
Kumaré (2011): Saw with Sumana and mother-in-law. Not really happy with the way this ended. It's common for the creator of a documentary about pulling a con job to start to feel remorse for their marks about halfway through the documentary. And this does happen to Vikram Gandhi in Kumaré, but when it's time to come clean he doesn't show the remorse. He retcons his con job as "Yes, I misled you and lied to you, but it was all in the service of a larger spiritual goal!"
Gandhi has a degree in religious studies so he should know this is Religious Huckster Trick #2. And of course it works. He pulls it off! But he's still operating the con.
Desperately Seeking Susan (1985): This was on the list of great films by female directors (see previous post), and it was showing at the museum, so we caught the next train posthaste! (Actually we walked.) It's a fun movie that's very much a time capsule, not just because of the New York and the fashions and the yuppie coffee tables and the Madonna but because not one single element of this plot can coexist with cell phones or the Internet.
Well, one element can—hyperspecific amnesia caused by otherwise harmless head trauma—but that's just ridiculous, so I'm not counting it. No, you know what, even that can't survive cell phones. "I forgot who I am... good thing I'm still logged into Facebook."
You just know I'll like this movie because there's a very strong Celine and Julie go Boating vibe, not just in the magic show but in the way Roberta just picks up Susan's identity and tries it on for a while. Really fun.

And now the TV section. Obviously my technique of waiting until I can evaluate the show as a whole creates a selection bias towards good television shows. I'll sit through a bad movie and then pan it in Film Roundup, but a bad TV show is outta here, especially since I watch movies on my own but I only watch TV with Sumana. But what's the problem with talking about good TV? Try this on for size:

The Dick Van Dyke Show (1961-1966) - Or as I just typed into IMDB, "The Dick Van." I remember reading an essay that explained that Leave It to Beaver was a groundbreaking show because it showed post-WWII parents trying to figure out how to raise children without the corporal-punishment-centric style their parents used on them. But phooey on that, because Leave It to Beaver is not funny. The Dick Van Dyke Show shows a postwar couple trying to figure out how to be good parents and partners, and it's really really funny. It's got workplace comedy, metahumor, tastefully wacky neighbors, everything good you'd want from a sitcom. Rob and Laura will have a disagreement that turns into an argument and then a reconciliation, it will be realistic and funny, and they'll shoot it all in one long take. It's so good. Sometimes they tire of the normal fare and do a sitcom version of The Twilight Zone instead.
Best moment: Buddy, one of Rob's co-workers, is always making these awful jokes about his shrewish wife Pickles. And then in one episode all the co-workers have a night out on the town. Buddy brings Pickles along, and she's great! She's a Broadway chorus girl, she's the life of the party, she and Buddy are perfect for each other, totally in love, and you realize, of course! Why would Buddy marry someone who'd make him miserable? He's just an asshole who adopts this Borscht Belt persona at work. The show doesn't go out of its way to point out any of this; it just quietly develops the characters in ways that reward paying attention.

(Before you ask, Religious Huckster Trick #1 is "God told me to tell you to give me money.")

Sun Oct 04 2015 11:05 To Stop Disturbance: I was reading to Sumana the most interesting bits from Washington Goes To War, a book by David Brinkley about the changes to Washington D.C. over the course of World War II. It's full of interesting historical tidbits, including:

The attempt to notify essential personnel of the attack on Pearl Harbor, without notifying the other 27,000 people in the same football stadium watching the Washington Redskins game.
An entirely legal scheme by which a Washington columnist and the Spanish ambassador arranged payoffs in exchange for "the columnist [writing] about previously unknown virtues he saw in Francisco Franco."
The controversial origins of having taxes automatically deducted from your paycheck.

But the thing Sumana wanted me to record verbatim was the policy that Washington D.C.'s Casino Royal put into place for dealing with the inevitable fistfights between soldiers and sailors. "Night after night," these inter-service resentments boiled over, and so the Casino Royal wrote down these rules and posted them "on a wall backstage under the heading TO STOP DISTURBANCE."

1. Lower the house lights
2. Turn the spotlight on a large American flag hanging from the ceiling
3. Start up an electric fan aimed at the flag, causing it to flutter
4. Have the band instantly stop playing dance music and strike up "The Star-Spangled Banner".
5. Call in the military police and the navy's shore patrol

It always worked. The soldiers and sailors stopped swinging at each other, faced the flag and stood at attention while the band played. There was no way a uniformed military man in wartime could refuse to do this, however angry he was. Before the anthem was finished, the military police and the shore patrol were walking up the steps from Fourteenth Street.

The one that really gets me is #3. I can see how this behavior would be drilled into you as a reflex action, but #3 makes it feel like they're trying to inspire you, remind you what you're fightin' for. And then the MPs show up.

Tue Oct 13 2015 09:42: Recently I gave a talk called "The Enterprise Media Distribution Platform At The End Of This Book". It summarizes my first eighteen months on the Library Simplified project at NYPL Labs. The goal of Library Simplified is to make it as easy to check ebooks out from a public library as it is to buy them from Amazon.

We've just secured a multi-year grant to expand the project, and we are hiring up from two developers to eight. We are quadrupling the size of our development team.

This is a really satisfying job for me because I'm making life substantially better for people who aren't already well off. If you like that prospect, if you like what I say in the "Enterprise Media Distribution" talk, and you want to work on this project, you should apply for one of these positions by sending your resume to info@librarysimplified.com.

I'm going to link to the job listings in a minute, but first I want to make it real clear that we put up these listings largely to have entry points into the HR system. As the team lead I'm not concerned with counting how many terms on your resume match terms used in the job listing. We need two Android developers and four people to write server-side code and HTML and JavaScript. I don't think we need a team made up entirely of Senior Developers. Other skills might be more important.

For instance, we need someone with devops experience. We'll be dealing with e-commerce, cryptography, and machine learning—all things I know little about. We don't care if you have a CS degree, but if you have a Library Science degree or have worked in the publishing industry, that would be useful. We have big collections in Spanish, Chinese and Russian, but nobody on our team reads those languages. Stuff like that.

With that in mind, here are the job listings:

As you can see if you click around, getting into the HR system to formally "apply" for these jobs requires filling out a really long form. (Update: and now these links don't even work anymore because the jobs got shifted around.) Instead of doing that, send your resume to info@librarysimplified.com and we'll only ask you to fill out the form if we want to bring you in for an interview.

All these positions are in New York City, in the big building on 42nd Street with the lions. This is a project funded by grants, and the salaries we offer are not competitive with Facebook or Goldman Sachs, but they are competitive with other nonprofits. The benefits are good. This is not a job that ruins your life. It's 35 hours a week and you get four weeks of vacation per year. I work from home about one day a week. Send me email or leave a comment if you have any questions about benefits.

Fri Oct 16 2015 20:47 Auditioning: Sampling a Dataset to Maximize Diversity: My latest bot is Roller Derby Names, which takes its data from a list of about 40,000 distinct names chosen by roller derby participants. 40,000 is a lot of names, and although a randomly selected name is likely to be hilarious, if you look at a bunch of them they can get kind of repetitive. My challenge was to cut it down to a maximally distinctive subset of names. I used a simple technique I call 'auditioning' (couldn't find a preexisting name for it) which I first used with Minecraft Signs:

1. Shuffle the list.
2. Create a counter of words seen.
3. For each string in the list:
  1. Split the string into words.
  2. Assume the string is not distinctive.
  3. For each word in the string:
    1. If this word has been seen fewer than n times, the string is distinctive.
    2. Increment the counter for this word.
  4. If the string is distinctive, output it.
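
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code the bots actually run: the function name `audition`, the lowercasing, and splitting on whitespace are all assumptions made for the example.

```python
import random
from collections import Counter


def audition(strings, n, rng=None):
    """Return the strings that pass an n-audition, in shuffled order."""
    if rng is None:
        rng = random.Random()
    shuffled = list(strings)
    rng.shuffle(shuffled)            # Shuffle the list.
    seen = Counter()                 # Counter of words seen so far.
    passed = []
    for s in shuffled:
        distinctive = False          # Assume the string is not distinctive.
        # Assumption: words are compared case-insensitively.
        for word in s.lower().split():
            if seen[word] < n:       # Seen fewer than n times: it passes.
                distinctive = True
            seen[word] += 1          # Count the word whether or not it helped.
        if distinctive:
            passed.append(s)
    return passed


names = ["Crash Baby", "Bad Lady", "Bad Baby", "Collideascope"]
print(audition(names, n=1))
```

Which of the first three names survive depends on the shuffle, but "Collideascope" always passes because nothing else shares its one word.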

My mental idea of this process is that each string is auditioning before the talent agent from the classic Chuck Jones cartoon One Froggy Evening. One word at a time, the string tries to impress the talent agent, but the agent has seen it all before. In fact, the agent has seen it all n times before! But then comes that magical word that the agent has seen only n-1 times. Huzzah! The string passes its audition. But the next string is going to have a tougher time, because with each successful audition the agent becomes more jaded.

You don't have to worry about stopwords because the string only needs one rare word to pass its audition. By varying n you can get a smaller or larger output set. For Minecraft Signs I set n=5, which gave a wide variety of signs while eliminating the ones that say "White Wool". For Roller Derby Names I decided on n=1.

Here's the size of the Roller Derby Names dataset, n-auditioned for varying values of n:

n	Dataset size
∞ (original data)	40198
100	40191
50	40089
10	37860
6	36104
5	35307
4	34203
3	32751
2	30387
1	25710

Auditioning the Roller Derby Names with n=50 excludes only the most generic-sounding names: "Crash Baby", "Bad Lady", "Queen Bitch", etc. Setting n=1 restricts the dataset to the most distinctive names, like "Battlestar Kick Asstica" and "Collideascope". But it still includes over half the dataset. There's not really a lot of difference between n=10 and n=4; it just comes down to how many names you want in the corpus.

I want to note that this is not a technique for picking out the 'good' items. It's a technique for maximizing diversity or distinctiveness. You can say that a name excluded by a lower value of n is more distinctive, but for a given value of n it can be totally random whether or not a name makes the cut. "Angry Beaver" made it into the final corpus and "Captain Beaver" didn't. As "beaver" jokes go, I'd say they're about the same quality. When the algorithm encountered "Captain Beaver", it had already seen "captain" and "beaver". If the list had been shuffled differently, the string "Captain Beaver" would have nailed its audition and "Angry Beaver" would be a has-been. That's show biz. This technique also magnifies the frequency of misspellings, as anyone who follows Minecraft Signs knows.
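
To see the order dependence concretely, here is a small sketch that runs the audition with n=1 over the same five names in two different orders. The helper `audition_in_order` skips the shuffle so the order is under our control, and every name other than the two "Beaver" names from the text is made up for the illustration.

```python
from collections import Counter


def audition_in_order(strings, n):
    """Audition strings in the order given (no shuffle), for demonstration."""
    seen = Counter()
    passed = []
    for s in strings:
        words = s.lower().split()
        # The string passes if any of its words has been seen fewer than n times.
        if any(seen[w] < n for w in words):
            passed.append(s)
        seen.update(words)  # Every word is counted, pass or fail.
    return passed


# Same five names, two orderings (the non-Beaver names are hypothetical).
order_a = ["Captain Hook", "Busy Beaver", "Angry Beaver",
           "Captain Beaver", "Angry Bird"]
order_b = ["Angry Bird", "Busy Beaver", "Captain Beaver",
           "Angry Beaver", "Captain Hook"]

print(audition_in_order(order_a, n=1))  # "Captain Beaver" is the one left out
print(audition_in_order(order_b, n=1))  # "Angry Beaver" is the one left out
```

In the first ordering "captain" and "beaver" are both used up by the time "Captain Beaver" auditions; in the second, "angry" and "beaver" are used up before "Angry Beaver" gets its turn.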

Also note that "Dirty Mary" is excluded by n=50. It's not the greatest name but it is a legitimate pun, so in terms of quality it should have made the corpus, but "Dirty" and "Mary" are both very common name components, so it didn't pass.

PS: Boat Name Bot (Roller Derby Names's sister bot) does not use this technique. There's no requirement that a boat name be unique, and TBH most boat-namers aren't terribly creative. Picking boat names that have only been used once (and are not names for human beings) cuts the dataset down plenty.
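
For comparison, the "used only once" filter could look something like the sketch below. It is an assumption-laden illustration, not Boat Name Bot's real code: the function name, the sample boat names, and the idea of checking against a set of lowercase human names are all hypothetical.

```python
from collections import Counter


def unique_boat_names(names, human_names):
    """Keep names that occur exactly once and are not human names."""
    counts = Counter(names)
    return [
        name
        for name, count in counts.items()
        if count == 1 and name.lower() not in human_names
    ]


# Hypothetical sample data.
boats = ["Serenity", "Serenity", "Mary", "Knot Working"]
print(unique_boat_names(boats, human_names={"mary"}))
```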

Tue Oct 20 2015 09:03 Bot Techniques: The Wandering Monster Table: In preparation for the talk I'm giving Friday at Allison's unofficial Bot Summit, I'm writing little essays explaining some of the techniques I've used in bots. Today: the Wandering Monster Table!

In D&D, the Wandering Monster Table is a big situation-specific table that makes it possible for you, the Dungeon Master, to derail your carefully planned campaign on a random mishap. You roll the dice and a monster just kind of shows up and has to be dealt with. There are different tables for different scenarios and different biomes, but they're generally based on this probability distribution (from AD&D 1st Edition):

65% of the time you will get a Common monster, like a really big rat.
20% of the time you will get an Uncommon monster, like a hobgoblin.
11% of the time you will get a Rare monster, like a neo-otyugh.
4% of the time you will get a Very Rare monster, like Ygorl, Lord of Entropy.

This doesn't mean you're going to run into Ygorl (Lord of Entropy) once every twenty-five adventures. There are a ton of Very Rare monsters, and Ygorl is just one chaos lord. He can't be everywhere. What this means is that most of the time the PCs are going to experience normal, boring wandering monsters. Sums of dice rolls approximate a normal distribution, and 68% (~65%) of the results will fall within one standard deviation of the mean. Those are your common monsters.

Go out two standard deviations (95%, ~65%+20%+11%) and things might get a little hairy for the PCs. Go out three standard deviations (99.7%, ~65%+20%+11%+4%) and you're looking at something really weird that even the Dungeon Master didn't really plan for. But what, exactly? That depends on the situation, and it may require another dice roll.
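
The two-stage roll can be sketched with Python's standard `random` module. This is a minimal illustration and not the olipy implementation; the names `roll_tier` and `roll_monster` and the filler entries in the sample table are assumptions for the example.

```python
import random

# Tier weights from the AD&D 1st Edition distribution quoted above.
TIERS = ["common", "uncommon", "rare", "very rare"]
WEIGHTS = [65, 20, 11, 4]


def roll_tier(rng=random):
    """Pick a rarity tier with probability proportional to its weight."""
    return rng.choices(TIERS, weights=WEIGHTS, k=1)[0]


def roll_monster(table, rng=random):
    """First roll for a tier, then roll again for a monster in that tier."""
    return rng.choice(table[roll_tier(rng)])


# Sample table; the "some other ..." entries are placeholders.
table = {
    "common": ["really big rat"],
    "uncommon": ["hobgoblin"],
    "rare": ["neo-otyugh"],
    "very rare": ["Ygorl, Lord of Entropy", "some other chaos lord",
                  "some other very rare monster"],
}
print(roll_monster(table))
```

With three entries in the very rare tier, the chance of any one roll producing Ygorl is 4% divided by three, which is why he doesn't show up every twenty-five adventures.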

The WMT is a really good abstraction for creating variety. I use it in my bots all the time. Here's a sample of the WMT for Serial Entrepreneur:

common = [
    "%(product)s",
    "%(product)s!",
    "%(product)s...\n%(variant)s...",
    "%(product)s? %(variant)s?",
    ...
]

uncommon = [
    "%(product)s... %(variant)s...? Just throwing some ideas around.",
    "%(product)s... or maybe %(variant)s...",
    "%(product)s or %(variant)s?",
    "Eureka! %(product)s!",
    ...
]

rare = [
    "I don't think I'll ever be happy with my %(product)s...",
    "Got a meeting with some VCs to pitch my %(product)s!",
    "I'm afraid that my new %(product)s is cannibalizing sales of my %(variant)s.",
    "The %(product)s flopped in my %(state)s test market... back to the drawing board.",
    ...
]

very_rare = [
    "Am I to be remembered as the inventor of the %(product)s?",
    "Sometimes I think about Edison's famous %(product)s and I wonder... can my %(product2)s compare?",
    "I haven't sold a single %(product)s...",
    "I hear %(billionaire)s is working on %(a_product)s...",
    ...
]

This creates a personality that most of the time just mutters project ideas to itself, but sometimes (uncommonly) gets a little more verbose, or (rarely) talks about where it is in the product development process, or (very rarely) compares itself to other inventors. The 'common' bucket contains nine entries which are slight variants; the 'rare' bucket contains 32 entries which are worded very differently.
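
Here is a rough sketch of how buckets like these might be turned into a tweet. The templates are copied from the sample (abridged), but everything else is an assumption: the `generate` function, the reuse of the 65/20/11/4 weights, and the product names are hypothetical, and the real bot may work differently.

```python
import random

# A few templates per tier, abridged from the sample.
TABLE = {
    "common": ["%(product)s", "%(product)s!", "%(product)s? %(variant)s?"],
    "uncommon": ["%(product)s or %(variant)s?", "Eureka! %(product)s!"],
    "rare": ["Got a meeting with some VCs to pitch my %(product)s!"],
    "very_rare": ["I haven't sold a single %(product)s..."],
}
# Assumption: the bot uses the same weights as the AD&D table.
WEIGHTS = {"common": 65, "uncommon": 20, "rare": 11, "very_rare": 4}


def generate(values, rng=random):
    """Pick a tier, pick a template from it, and fill in the blanks."""
    tiers = list(TABLE)
    tier = rng.choices(tiers, weights=[WEIGHTS[t] for t in tiers], k=1)[0]
    template = rng.choice(TABLE[tier])
    # Dict-based % formatting ignores keys the template doesn't use.
    return template % values


# Hypothetical product names.
print(generate({"product": "Solar umbrella", "variant": "Wind umbrella"}))
```

Adding a new kind of output is just a matter of appending a template to one of the lists, which is what makes the table easy to keep growing.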

The WMT works the same way in Smooth Unicode and Euphemism Bot. All these bots have their standbys: common constructs they return to over and over. Then they have three more tiers of constructs where the result is aesthetically riskier, or the joke is less likely to land, or a little of that construct goes a long way.

I also use the WMT in A Dull Bot to a more subtle purpose. Each tweet contains a random number of typos, and each typo is chosen from a WMT. One of the common typos is to transpose two letters. A very rare typo is to uppercase one word while leaving the rest of the sentence alone.
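
The typo table could be sketched like this. Only the common typo (transposing two letters) and the very rare one (uppercasing a word) come from the description above; the uncommon and rare typos, the weights, the 0-3 typo count, and all function names are invented for the example.

```python
import random


def transpose(sentence, rng):
    """Common typo: swap two adjacent characters."""
    if len(sentence) < 2:
        return sentence
    i = rng.randrange(len(sentence) - 1)
    chars = list(sentence)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def drop_char(sentence, rng):
    """Hypothetical uncommon typo: drop one character."""
    if not sentence:
        return sentence
    i = rng.randrange(len(sentence))
    return sentence[:i] + sentence[i + 1:]


def double_char(sentence, rng):
    """Hypothetical rare typo: double one character."""
    if not sentence:
        return sentence
    i = rng.randrange(len(sentence))
    return sentence[:i] + sentence[i] + sentence[i:]


def shout_one_word(sentence, rng):
    """Very rare typo: uppercase one word, leave the rest alone."""
    words = sentence.split(" ")
    i = rng.randrange(len(words))
    words[i] = words[i].upper()
    return " ".join(words)


# (weight, typo function) pairs; the weights are an assumption.
TYPO_TABLE = [(65, transpose), (20, drop_char),
              (11, double_char), (4, shout_one_word)]


def add_typos(sentence, rng=None):
    """Apply a random number of typos, each drawn from the table."""
    if rng is None:
        rng = random.Random()
    count = rng.randint(0, 3)  # Assumed range for the number of typos.
    weights = [w for w, _ in TYPO_TABLE]
    typos = [f for _, f in TYPO_TABLE]
    for make_typo in rng.choices(typos, weights=weights, k=count):
        sentence = make_typo(sentence, rng)
    return sentence


print(add_typos("All work and no play makes Jack a dull boy."))
```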

The WMT fixes one of the common aesthetic problems with bots, where every output is randomly generated but it gets dull quickly because the presentation is always the same. Since you can always dump more stuff into a WMT, it's an easy way to keep your bot's output fresh. In particular, whenever I get an idea like emoji mosaics, I can add it to Smooth Unicode's WMT instead of creating a whole new bot.

There's a Python implementation of a Wandering Monster Table in olipy.