[Comments] (7) Constellation Games: I've sold my first novel, Constellation Games, to Candlemark & Gleam. (Here's the official announcement.) Starting in November, the novel will be serialized online, one chapter a week, with a collected book to follow later.

I cannot stress this enough: you should subscribe to this book. I pulled out all the stops to make Constellation Games the most fun-to-read thing I could write. It's a near-future space opera set in Austin, Texas. It's got everything I've ever wanted/loved to see in a sci-fi epic: zero-gravity sex, hive minds, terraforming, paleontology, fine art, warps in space-time, existential horror, and shipping containers.

But most of all, it's got video games. My goal was to write the most compelling fictional treatment of not only gaming culture (which already gets a lot of attention in spec-fic) but the creative act of making a game, how our society treats software and games as artifacts, and how other cultures might do things differently. Think of "Mallory", except in the context of what the Candlemark editor flatteringly calls "one of the most stunningly plausible first-contact novels out there."

I'm very happy to have Candlemark & Gleam as my publisher. I showed Constellation Games to some traditional-print-publishing people, and they really liked the story but didn't like the blog-heavy format. C&G immediately got on board with the format, and thanks to the serialization model, we can present the story in a way that conveys the feeling of living through a huge societal disruption. In case you don't get enough of that already.

Constellation Games Progress Report: Probably going to move this onto Twitter because a) I really need experience writing for Twitter, for reasons which will become clear, and b) that may make it feel less like I'm working alone. But here's the intro.

As July was Month of Kickstarter around here, so is August the Month of Constellation Games Rewrites. I've long been interested in, and somewhat disturbed by, the relationship between different drafts of a work. I think it'll be interesting to create a record of the rewrite process, and a fun challenge to do it without giving away huge spoilers or saying stuff that only makes sense once you've read the book. With that in mind, I have four main tasks for this month:

  1. Write two new scenes: one near the beginning, one near the end.
  2. Merge two of the minor characters into a single entity. They're kind of similar, so this will make the novel tighter. I'd been considering this as a nice-to-have for a while, but turns out one of those characters needs a lot more screen time near the beginning. The quickest solution is to just give him someone else's scenes.
  3. Use the additional screen time from #2 to foreshadow the appearance of a third character, who doesn't really exist until about halfway through the book.
  4. There are a lot of blog posts in the story that read more like normal prose. Kate (my editor) and I agreed to tweak the framing device a little bit and just turn those chunks into normal prose. The big problem here is that weblogs are written in a very recent past tense. I need to convert some of the blog posts to a more distant past tense, as though you were hearing a story recounted later, without causing big tense dislocations when you encounter an authentic blog post from the period.

Today I finished one of the scenes for #1. #2 causes me serious cognitive dissonance. I need to tackle that next so I can stop thinking about both sets of characters.

[Comments] (1) Late Adopter: I started posting about my writing progress on identi.ca, mirrored to Twitter. This is two years after Sumana started using identi.ca, so you know I'm behind the curve.

Anyway, today at work I merged those two characters, and although there are still some rough spots to fix, it was easier than I'd feared. I don't even miss Bruce, which is confirmation for me that he needed to go. Bruce was kind of the novel's straight man, and that's not a position that needs filling. Any given scene needs a straight man, but it doesn't need to be the same person every time.

[Comments] (1) The MST3K-IMDB Effect, Quantified: Sometimes when I rewatch an MST3K episode I go to the movie's IMDB page to learn more about it. Inevitably I'm annoyed by the comments of people who give these movies one-star reviews solely on the basis of having watched an edited version on MST3K. But even greater than my annoyance is my desire to quantify the phenomenon. Today, I have quantified it.

What does being on MST3K do to a movie's IMDB rating? My best guess is that it knocks 2.9 stars off what the rating would have been if the movie hadn't been on MST3K. But read on to see how I came up with that number, and why it depends on the director.

Disclaimer

I am not a statistician. I'm not even a data scientist. I know how to get data out of the Internet. I know the difference between mean, median, mode, and standard deviation. And that's about it.

Example

Here's an example in case you're not familiar with the MST3K-IMDB effect, which there's no reason you should since that's a name I just made up for it. Consider "Speech: The Function of Gestures", a short film directed by Arthur H. Wolf. It's got 5 votes and an IMDB rating of 5.2. Now here's another short film in the same series, "Speech: Platform Posture and Appearance". Same director, same writer, same lead actor, but this film had the misfortune to be double-billed with Red Zone Cuba on MST3K. As a result, it's got 98 votes and an IMDB rating of 1.6.

Call me skeptical, but I've watched both films and I'm not convinced there's really a three-and-a-half star difference between them. (Another film in the series, "Speech: Using Your Voice", was also featured on MST3K, but in a less memorable episode ("Earth vs. the Spider"), and it struggles along with an IMDB rating of 2.4.)

Methodology

Since the "Speech" films are part of a series, it makes sense to suppose that the difference between them is mostly due to the MST3K-IMDB effect. Of course, most films aren't part of a series. So I went by director instead. I picked up the filmography of every director who directed a film that was on MST3K. I split their films into two lists, "Normal" (not featured on MST3K) and "MST" (featured on MST3K). The "Normal" set only includes films that had enough IMDB votes to be given a rating. I included shorts and episodes of TV shows. This isn't perfect, because IMDB's plain-text data dump sometimes (but not always) gives a director's credit where their website gives a writer's credit. But it's close enough.

I took the average rating of the "Normal" list and the "MST" list. The difference between the two averages is how much it hurt that director to have one of their films featured on MST3K. As we'll see, some directors were hurt a lot, and some of them shrugged it off, both for interesting reasons.

For the sake of comparison, the mean rating for a movie on IMDB at large is 6.4 stars, the median is 6.6 stars, and the standard deviation is 1.6 stars. However, "one star" is not a consistent unit of measurement. I'm considering redoing this table with normalized percentiles, but I'm not convinced there's a big demand for that, so for now you get stars.

Data

Here's a big table with the data for every director who had at least five films in the "Normal" set and at least one in the "MST" set. Normalm and Normalstd are the mean and standard deviation for the IMDB ratings of that director's non-MST films, and Normaln is the sample size. MSTm, MSTstd, and MSTn are the same thing for the director's MST film(s).

Effect1 is what we're looking for: for a given director, how many stars does a film lose just from being on MST3K? But wait! What if the director made some good stuff and some bad stuff, and only the bad stuff ended up on MST3K? The MST3K set would have lower ratings, but it wouldn't be because of MST3K. That's where Effect2 comes in, and that's why the table is sorted by Effect2. I'll explain Effect2 after you get a look at the data.

Director Normalm Normalstd Normaln MSTm MSTstd MSTn Effect1 Effect2 MSTed Films
Bava, Mario 6.2 0.9 22 6.3 - 1 -0.1 -0.1 Diabolik (1968)
Francisci, Pietro 5.0 0.9 9 4.8 0.7 2 0.2 0.3 Ercole e la regina di Lidia (1959), Le fatiche di Ercole (1958)
Tucker, Phil (I) 3.1 0.9 6 2.8 - 1 0.3 0.3 Robot Monster (1953)
Steckler, Ray Dennis 3.1 1.3 22 2.1 - 1 1.0 0.8 The Incredibly Strange Creatures Who Stopped Living and Became Mixed-Up Zombies!!? (1964)
Rebane, Bill 2.7 0.7 8 2.1 0.6 2 0.6 0.8 Monster a-Go Go (1965), The Giant Spider Invasion (1975)
Burke, Martyn 5.8 1.5 7 4.4 - 1 1.4 0.9 The Last Chase (1981)
Sachs, William 4.3 1.1 9 3.2 - 1 1.1 0.9 The Incredible Melting Man (1977)
Warren, Jerry 2.4 0.6 8 1.8 - 1 0.6 0.9 The Wild World of Batwoman (1966)
Maetzig, Kurt 5.4 1.3 13 4.0 - 1 1.4 1.1 Der schweigende Stern (1960)
Buchanan, Larry 3.3 1.2 25 2.0 - 1 1.3 1.1 The Eye Creatures (1965) (TV)
Yuasa, Noriaki 5.0 1.6 12 3.1 - 1 1.9 1.1 Gamera tai daiakuju Giron (1969)
Gordon, Bert I. 4.2 0.8 13 3.2 0.6 8 1.0 1.2 Beginning of the End (1957), Earth vs. the Spider (1958), King Dinosaur (1955), The Amazing Colossal Man (1957), The Magic Sword (1962), Tormented (1960), Village of the Giants (1965), War of the Colossal Beast (1958)
Brannon, Fred C. 5.8 1.3 43 4.2 - 1 1.6 1.2 Radar Men from the Moon (1952)
Mikels, Ted V. 3.3 1.3 19 1.8 - 1 1.5 1.2 Girl in Gold Boots (1968)
Wood Jr., Edward D. 4.0 1.0 15 2.8 0.8 2 1.2 1.2 Bride of the Monster (1955), The Sinister Urge (1960)
Zarindast, Tony 4.2 2.0 10 1.7 - 1 2.5 1.2 Werewolf (1996) (V)
Bradley, David (I) 4.9 1.9 6 2.4 - 1 2.5 1.3 12 to the Moon (1960)
Ludwig, Edward 6.1 1.0 33 4.7 - 1 1.4 1.3 The Black Scorpion (1957)
Clark, Greydon (I) 3.5 1.2 19 1.9 0.0 2 1.6 1.3 Angels' Brigade (1979), Final Justice (1985)
Franco, Jesus 4.1 1.2 166 2.5 - 1 1.6 1.3 The Castle of Fu Manchu (1969)
Eason, B. Reeves 5.8 0.9 46 4.5 - 1 1.3 1.4 Undersea Kingdom (1936)
Pyun, Albert 4.5 1.5 43 2.5 - 1 2.0 1.4 Alien from L.A. (1988)
Sturges, John 6.5 0.7 42 5.5 - 1 1.0 1.4 Marooned (1969)
Neumann, Kurt (I) 6.2 0.9 51 4.8 - 1 1.4 1.4 Rocketship X-M (1950)
Rou, Aleksandr 6.8 1.5 14 4.5 - 1 2.3 1.5 Morozko (1965)
Zens, Will 4.4 1.8 7 1.6 - 1 2.8 1.6 The Starfighters (1964)
Corman, Roger 5.4 1.4 44 3.2 0.8 6 2.1 1.6 Gunslinger (1956), It Conquered the World (1956), Swamp Women (1956), Teenage Cave Man (1958), The Saga of the Viking Women and Their Voyage to the Waters of the Great Sea Serpent (1957), The Undead (1957)
Fukuda, Jun (I) 5.8 1.3 10 3.6 - 1 2.2 1.7 Gojira tai Megaro (1973)
Piquer Simón, Juan 3.5 1.0 11 1.8 - 1 1.7 1.7 Los nuevos extraterrestres (1983)
Crichton, Charles 6.5 1.7 58 3.6 - 1 2.9 1.7 Cosmic Princess (1982) (TV)
Portillo, Rafael (I) 4.7 1.6 10 1.9 - 1 2.8 1.7 La momia azteca contra el robot humano (1958)
Beaudine, William 6.1 1.2 128 4.1 - 1 2.0 1.7 Design for Dreaming (1956)
Peshak, Ted 3.7 0.7 14 2.5 0.1 2 1.2 1.7 Appreciating Your Parents (1950), What to Do on a Date (1951)
Conway, James L. (I) 7.2 1.2 74 5.0 - 1 2.2 1.8 Hangar 18 (1980)
Worth, David (II) 4.3 1.3 20 2.0 - 1 2.3 1.8 Warrior of the Lost World (1983)
Grefe, William 4.3 1.5 11 1.7 - 1 2.6 1.8 Wild Rebels (1967)
Shonteff, Lindsay 4.5 0.9 19 3.0 0.6 2 1.6 1.8 Devil Doll (1964), The Million Eyes of Sumuru (1967)
Kane, Joseph (I) 6.2 0.9 127 4.5 - 1 1.7 1.8 Undersea Kingdom (1936)
Yarbrough, Jean 6.5 1.7 87 3.4 - 1 3.1 1.8 The Brute Man (1946)
Hessler, Gordon 5.7 1.2 48 3.4 - 1 2.3 1.9 "The Master" (1984)
Kessler, Bruce 6.5 1.6 67 3.4 - 1 3.1 1.9 "The Master" (1984)
Lawrence, Quentin 7.1 1.4 12 4.4 - 1 2.7 1.9 The Trollenberg Terror (1958)
Fox, Wallace 6.1 1.1 31 4.0 - 1 2.1 2.0 The Corpse Vanishes (1942)
Kincaid, Tim (I) 6.5 2.2 25 2.0 - 1 4.5 2.0 Robot Holocaust (1986) (V)
Malatesta, Guido 4.3 1.4 14 1.5 - 1 2.8 2.1 Maciste contro i cacciatori di teste (1963)
Dein, Edward 6.1 1.0 7 4.1 - 1 2.0 2.1 The Leech Woman (1960)
Juran, Nathan 6.4 1.0 53 4.2 - 1 2.2 2.1 The Deadly Mantis (1957)
Baldanello, Gianfranco 4.8 1.3 9 2.0 - 1 2.8 2.1 Il raggio infernale (1967)
Beebe, Ford 6.1 0.7 50 4.6 - 1 1.5 2.1 The Phantom Creeps (1939)
Winters, David (I) 5.5 1.7 17 1.8 - 1 3.7 2.1 Space Mutiny (1988)
Harvey, Herk 5.1 1.1 22 2.7 0.4 3 2.4 2.2 Cheating (1952), What About Juvenile Delinquency? (1955), Why Study Industrial Arts? (1956)
Medak, Peter 6.7 1.4 63 3.6 - 1 3.1 2.2 Cosmic Princess (1982) (TV)
Corona Blake, Alfonso 5.6 1.3 13 2.7 - 1 2.9 2.2 Santo vs. las mujeres vampiro (1962)
D'Amato, Joe 4.7 1.3 141 1.9 - 1 2.8 2.2 Ator l'invincibile 2 (1984)
Pierce, Charles B. 4.9 1.3 10 1.9 - 1 3.0 2.2 The Barbaric Beast of Boggy Creek, Part II (1985)
Rich, David Lowell 6.6 1.2 99 3.8 - 1 2.8 2.3 SST: Death Flight (1977) (TV)
Strock, Herbert L. 6.1 1.5 23 2.6 - 1 3.5 2.3 The Crawling Hand (1963)
Katzin, Lee H. 6.6 1.3 71 3.6 - 1 3.0 2.3 The Stranger (1973) (TV)
Ulmer, Edgar G. 5.7 1.0 38 3.4 - 1 2.3 2.3 The Amazing Transparent Man (1960)
Sloane, Rick 3.4 0.7 14 1.7 - 1 1.7 2.3 Hobgoblins (1988)
Cardos, John 'Bud' 5.1 1.4 9 1.8 - 1 3.3 2.4 Outlaw of Gor (1989)
Castellari, Enzo G. 5.5 1.1 39 2.9 - 1 2.6 2.4 Fuga dal Bronx (1983)
Mahon, Barry 4.4 1.3 34 1.3 - 1 3.1 2.4 Rocket Attack U.S.A. (1961)
Vogel, Virgil W. 7.1 1.1 141 4.4 - 1 2.7 2.5 The Mole People (1956)
Giancola, David 3.9 0.7 7 2.0 - 1 1.9 2.5 Tangents (1994)
Francis, Freddie 5.7 1.1 30 3.0 - 1 2.7 2.6 The Deadly Bees (1967)
Baker, Roy Ward 7.0 1.2 89 3.7 - 1 3.3 2.7 Moon Zero Two (1969)
Nicol, Alex (I) 5.6 1.0 9 2.9 - 1 2.7 2.7 The Screaming Skull (1958)
Sears, Fred F. 6.1 1.0 45 3.3 - 1 2.8 2.7 Teen-Age Crime Wave (1955)
Rakoff, Alvin 6.5 1.2 25 3.3 - 1 3.2 2.8 City on Fire (1979)
Lieberman, Jeff (I) 5.7 0.7 9 3.9 - 1 1.8 2.8 Squirm (1976)
Cahn, Edward L. 5.5 1.0 106 2.8 - 1 2.7 2.8 The She-Creature (1956)
Arnold, Jack (I) 6.7 1.0 109 3.9 1.2 2 2.8 2.8 Revenge of the Creature (1955), The Space Children (1958)
Heyes, Douglas 7.6 1.1 42 4.4 - 1 3.2 2.8 Kitten with a Whip (1964)
Ferroni, Giorgio 5.3 1.1 16 2.0 - 1 3.3 3.0 New York chiama Superdrago (1966)
Fowler Jr., Gene 6.8 1.1 9 3.4 1.1 2 3.4 3.0 I Was a Teenage Werewolf (1957), The Rebel Set (1959)
Bava, Lamberto 5.4 1.1 30 2.1 - 1 3.3 3.0 Shark: Rosso nell'oceano (1984)
Wolf, Arthur H. 4.6 0.8 8 2.0 0.4 2 2.6 3.1 Speech: Platform Posture and Appearance (1949), Speech: Using Your Voice (1950)
Jameson, Jerry 6.1 1.2 109 2.5 0.6 2 3.6 3.1 Superdome (1978) (TV), The Bat People (1974)
Koch, Howard W. 6.3 1.3 20 2.4 - 1 3.9 3.1 Untamed Youth (1957)
Morse, Hollingsworth 6.6 1.3 105 2.5 0.6 2 4.0 3.1 Crash of Moons (1954) (TV), Manhunt in Space (1956) (TV)
Cottafavi, Vittorio 6.2 0.8 17 3.6 - 1 2.6 3.2 Ercole alla conquista di Atlantide (1961)
Moxey, John Llewellyn 6.7 1.2 110 2.9 - 1 3.8 3.2 "San Francisco International Airport" (1970) {San Francisco International (#1.0)}
McLaglen, Andrew V. 6.5 1.3 215 2.3 - 1 4.2 3.2 Mitchell (1975)
Trikonis, Gus 5.8 1.3 72 1.7 - 1 4.1 3.2 Five the Hard Way (1969)
De Martino, Alberto (I) 5.2 0.9 25 2.2 0.2 2 3.0 3.3 L'uomo puma (1980), OK Connery (1967)
Cardona, René (I) 5.6 1.0 45 2.0 - 1 3.6 3.4 Santa Claus (1959)
Miner, Allen H. 7.6 1.5 25 2.4 - 1 5.2 3.4 The Days of Our Years (1955)
Newfield, Sam (I) 5.5 0.9 145 2.4 0.5 4 3.2 3.5 I Accuse My Parents (1944), Lost Continent (1951), Radar Secret Service (1950), The Mad Monster (1942)
Sholem, Lee 6.7 1.4 44 1.9 - 1 4.8 3.5 Catalina Caper (1967)
Turner, Ken (I) 7.3 1.5 7 2.2 - 1 5.1 3.5 Revenge of the Mysterons from Mars (1981) (TV)
Haas, Charles F. 6.8 1.2 31 2.6 - 1 4.2 3.6 Girls Town (1959)
Kowalski, Bernard L. 6.9 1.1 80 2.9 0.2 2 4.0 3.7 Attack of the Giant Leeches (1959), Night of the Blood Beast (1958)
Fukasaku, Kinji 7.1 0.9 53 3.8 - 1 3.3 3.8 The Green Slime (1968)
Gentilomo, Giacomo 5.6 0.8 11 2.4 - 1 3.2 3.8 Maciste e la regina di Samar (1964)
McDougall, Don 7.2 1.4 158 1.7 - 1 5.5 3.8 Riding with Death (1976) (TV)
Webster, Nicholas 6.8 1.1 13 2.3 - 1 4.5 3.9 Santa Claus Conquers the Martians (1964)
Austin, Ray (I) 6.6 1.1 135 2.1 - 1 4.5 3.9 "The Master" (1984) {Hostages (#1.4)}
Oswald, Gerd 6.7 1.2 52 1.8 - 1 4.9 4.1 Agent for H.A.R.M. (1966)
Lynn, Robert (II) 5.5 0.8 12 2.2 - 1 3.3 4.4 Revenge of the Mysterons from Mars (1981) (TV)
Szwarc, Jeannot 7.3 1.0 167 2.5 - 1 4.8 4.6 Code Name: Diamond Head (1977) (TV)
Myerson, Alan 7.0 1.1 101 2.1 - 1 4.9 4.6 "The Master" (1984) {State of the Union (#1.3)}
Lipstadt, Aaron 7.2 1.1 73 2.1 - 1 5.1 4.7 City Limits (1984)
Rondeau, Charles R. 7.0 1.0 58 2.2 - 1 4.8 4.7 The Girl in Lovers Lane (1960)
Collins, Lewis D. 6.0 0.9 56 1.7 - 1 4.3 4.9 Jungle Goddess (1948)
Green, Alfred E. 6.4 0.7 73 2.4 - 1 4.0 5.4 Invasion USA (1952)
Levi, Alan J. 6.9 0.9 122 1.7 - 1 5.2 5.7 Riding with Death (1976) (TV)
Lane, David (I) 7.2 0.8 19 2.1 - 1 5.1 6.3 Invaders from the Deep (1981)
Greidanus, Tjardus 6.3 0.7 13 1.7 - 1 4.6 6.3 The Final Sacrifice (1990)
Saunders, Desmond (I) 6.7 0.7 14 2.1 - 1 4.6 6.5 Invaders from the Deep (1981)
Ptushko, Aleksandr 7.2 0.4 8 4.2 1.0 3 2.9 6.9 Ilya Muromets (1956), Sadko (1953), Sampo (1959)
Williams, Douglas (I) 8.5 0.7 6 2.1 - 1 6.4 9.6 Overdrawn at the Memory Bank (1983) (TV)
Morgan, William (I) 6.2 0.4 10 2.6 - 1 3.6 10.1 The Violent Years (1956)
Elliott, David (II) 6.7 0.3 9 2.1 - 1 4.6 14.1 Invaders from the Deep (1981)
Average 2.9

Analysis

Now, for the explanation of Effect2. From Normalstd we know how likely this director is to make a film that's substantially better or worse than their average. If they made one bad film that was on MST3K, and there was no MST3K-IMDB effect for that director, the rating for that film would most likely be within two standard deviations of the director's average. But if there were a strong MST3K-IMDB effect for that director, the rating for the MSTed film would be much lower than the director's other bad films. So, Effect2 is: how many standard deviations below Normalm is MSTm?
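
The two measures can be sketched in a few lines of Python. This is a minimal sketch, not the script behind the table: the function name and the ratings are made up for illustration, and it assumes the sample standard deviation (the post doesn't say which kind was used).

```python
from statistics import mean, stdev


def mst3k_effects(normal_ratings, mst_ratings):
    """Return (effect1, effect2) for one director.

    effect1: stars lost, i.e. mean(Normal) - mean(MST).
    effect2: the same gap, measured in Normal standard deviations.
    """
    normal_mean = mean(normal_ratings)
    normal_std = stdev(normal_ratings)  # assumption: sample standard deviation
    effect1 = normal_mean - mean(mst_ratings)
    effect2 = effect1 / normal_std
    return effect1, effect2


# Hypothetical ratings for one director, not taken from the IMDB data.
normal = [6.0, 5.0, 7.0, 6.0, 6.0]
mst = [2.0, 3.0]

effect1, effect2 = mst3k_effects(normal, mst)
print(round(effect1, 1), round(effect2, 1))
```

If you redo this division with the rounded numbers printed in the table, you won't always reproduce the Effect2 column exactly, presumably because the table's values were computed before rounding.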

Let's look at the extremes of the list. First, the directors with very low Effect2:

And this is the big thing I learned doing the project: you can calculate the MST3K-IMDB effect, but you must also look at the director's average movie rating to see what it means. A low Effect2 just means that being on MST3K doesn't hurt a director's ratings very much. It doesn't say anything about the movie's quality.

OTOH, a director with a high Effect2 is probably worth a second look in a non-MST3K context.

And so on. The MST3K-IMDB effect is real--ninety percent of the directors in this table have an Effect2 of more than one standard deviation, and for sixty percent of them, it's more than two standard deviations. But it doesn't affect all directors equally.

Let's close out by taking a look at some of MST3K's favorite directors.

Conclusion

I'm still annoyed by those one-star reviews, but I understand them a little better now. When you watch, say, "The Function of Gestures", you enjoy it for its camp value, you have fun with it, and you give it a relatively good rating. But when you watch "Platform Posture and Appearance" or "Using Your Voice" on MST3K, you're watching someone else making fun of it, you have fun at its expense, and you give it a bad rating as a sign of solidarity with the MST3K characters.

Finally, I'd like to thank IMDB for, in a relic of its geeky past, making plain-text dumps of its data available. It's a strange feeling to have a file open in an Emacs buffer that lists nearly every movie ever made. (There are about 2 million, if you're curious.) Now that I have the data and scripts to process it, I may run other cinematic experiments in the future. One thing I would like to see added is IMDB links for the people and movies. It's a pain to look all these things up, which is why there aren't as many links in this post as you'd think.

[Comments] (2) Loaded Dice: Last month I downloaded a bunch of data from BoardGameGeek's web service for use in an art project. I'll be announcing the art project soon, but today I'm announcing "Loaded Dice", a data-mining project using the same data.

I've been writing scripts that analyze the BGG data and produce interesting charts and tables. I'll keep adding stuff to these pages until I get bored with this data. I've put up thirteen experiments so far. Here are some highlights:

Beautiful Soup 4 Beta: Now With Python 3 Support: The main thing holding back Beautiful Soup 4 from release was that it didn't work with Python 3. Fortunately, Thomas Kluyver stepped in and wrote some code, and now I can present the first BS4 beta release.

There's still some work to do, and it'll be a while before I get to it, but the work that remains is pretty minor compared to the advantages you get from using BS4 instead of BS3. Try out the beta, and if it gets good reviews I may just make a 4.0 release and deal with the minor things afterwards.

Board Game Dadaist: I mentioned when I announced "Loaded Dice" that I got all this BoardGameGeek data for an art project. I'm now announcing the art project, Board Game Dadaist. This page uses an as-yet-unnamed algorithm to mash up game titles, descriptions, and BGG comments into new, intriguing games like "Plastic Walls Are High", "Shopping - Destroy", and "Armchair to Hell". ("First driver to complete 3 laps is the winner. Capture them, brainwash them, throw them into your dungeon or consume them for spells.")

The BGD page is updated every 5 minutes with new games, and there's a daily RSS feed. Special thanks to Beth for the logo. (My logo looked awful.)

Queneau Assembly: That's my name for the formerly-unnamed technique I used in "Board Game Dadaist". It all started in April, the night I was guest critic for Adam's ITP class. Afterwards I went out to dinner with Adam and Rob, and Adam was talking up Markov chains. Dude loves him some Markov chains. I said "Man, I'm tired of Markov chains. Markov chains are so 70s, they have little coke spoons dangling from them. I'm gonna come up with a better algorithm for creating generative text."

Big talk, but fortunately I didn't have to come up with a better algorithm, because I already had. Back in 2008 I released a project called "Spurious", which generates new Shakespearean sonnets by picking lines from the existing sonnets. It generates two sonnets at once using two different algorithms. Algorithm B (the one lower down on that page) is totally random: you could get a new sonnet made entirely of the first lines of other sonnets. But Algorithm A (the first one on that page) creates what I'm calling a Queneau assembly. The first line of a new sonnet is the first line of some existing sonnet. The second line is the second line of some other sonnet. The third line comes from the set of third lines, and so on to the end.

Oulipo founder Raymond Queneau did something very similar in his 1961 book "Hundred Thousand Billion Poems". This may be where I got the algorithm I used in "Spurious", though I don't think it was a conscious homage. In "Hundred Thousand Billion Poems" there are ten sonnets bound such that you can "turn the page" for a single line of the sonnet, changing that line while leaving the rest of the poem intact. Each generated poem feels like a sonnet because it starts with a "first line" and ends with a "last line" and every line in between is placed where it was in some manually generated sonnet.

I've named the technique in honor of Queneau because I can't find anyone who used it earlier. It's not universally better than a Markov chain, because it only works in certain cases:

That said, the Queneau assembly gives very entertaining results, and it's now my go-to dada technique, promoted over Markov chains and even unadulterated randomness.

The simple algorithm

I've come up with a number of algorithms for making Queneau assemblies. I'll talk about the simplest first, just so you'll see how this works. This is a refined version of the algorithm I used for "Mashteroids" (yes, those asteroid descriptions were me reinventing Queneau assemblies). It's not the algorithm I used for "Board Game Dadaist"; I'll talk about that later.

You've got a body of N texts, T_0, T_1, ..., T_{N-1}. Each text can be split into some number of chunks, e.g. T_{0,0}, T_{0,1}, ..., T_{0,M-1}.

Split each text into chunks and assign each chunk to one of three buckets. The first chunk from each text goes into the "first" bucket. The last chunk from each text goes into the "last" bucket. All the other chunks go into the "middle" bucket.

Also keep track of how text lengths are distributed: how likely it is that a text consists of one chunk, how likely that it consists of two chunks, and so on.

Now you're ready to assemble. Pick a length for your new text that reflects the length distribution of the existing texts. Then pick a chunk from the "first" bucket. If your target length is greater than 1, pick a chunk from the "last" bucket. If your target length is greater than 2, pick chunks from the "middle" bucket until you've got enough. Concatenate the chunks first-middle-last, and you've got a Queneau assembly!
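
Here's a minimal Python sketch of the three-bucket approach described above. The function names and sample texts are made up for illustration, and the texts are assumed to be split into chunks already. One judgment call that isn't from the description: a one-chunk text contributes only to the "first" bucket, so the same chunk can't be picked as both first and last.

```python
import random
from collections import Counter


def build_buckets(texts):
    """Sort chunks into first/middle/last buckets and count text lengths.

    `texts` is a list of texts, each already split into a list of chunks.
    """
    first, middle, last = [], [], []
    lengths = Counter()
    for chunks in texts:
        if not chunks:
            continue
        lengths[len(chunks)] += 1
        first.append(chunks[0])
        if len(chunks) > 1:  # assumption: one-chunk texts only feed "first"
            last.append(chunks[-1])
        middle.extend(chunks[1:-1])
    return first, middle, last, lengths


def assemble(first, middle, last, lengths, rng=random):
    """Build one assembly whose length follows the observed distribution."""
    sizes = list(lengths)
    weights = [lengths[size] for size in sizes]
    target = rng.choices(sizes, weights=weights)[0]
    chunks = [rng.choice(first)]
    if target > 2:
        chunks.extend(rng.choice(middle) for _ in range(target - 2))
    if target > 1:
        chunks.append(rng.choice(last))
    return chunks


# Hypothetical one-paragraph descriptions, pre-split into sentences.
SAMPLE_TEXTS = [
    ["Players race to build the longest road.",
     "Trade sheep for bricks.",
     "The first player to ten points wins."],
    ["Roll the dice and move your pawn.",
     "Collect rent from your rivals.",
     "Bankrupt everyone to win."],
    ["Draw a card.", "The highest card wins."],
    ["A solitaire game about stacking crates."],
]

first, middle, last, lengths = build_buckets(SAMPLE_TEXTS)
print(" ".join(assemble(first, middle, last, lengths)))
```

The output is random, so every run gives a different description.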

Paragraphs made from sentences

Now let's look at the scales on which you might create a Queneau assembly. Outside of poetry, the paragraph is the Queneau assembly's natural habitat. A paragraph has a flow to it, especially when you've got something like a description of a board game or an asteroid that's only one paragraph long.

You need to handle things like quotes and parentheses that open in one sentence and don't close by the end of the sentence, or that close without having opened. I wrote code for this in BGD but it doesn't catch all the cases.

Phrases made from words

In "Board Game Dadaist", the names of games are also Queneau assemblies. Here the chunks are words. I take the first word from the name of game A, the second word from the name of game B, and so on. So "Pirates! Denver" might come from "Pirates! Miniature Battles on the High Seas" and "Monopoly: Denver Broncos".

Quotes and parentheses are still problems, though it's not as bad. The big problem I ran into was repeated words, and words like "the" which are not allowed to end a game name. (The simple algorithm, with its "last" bucket, prevents "the" from showing up last unless it showed up as the last word of an existing game. In the algorithm I used for "Board Game Dadaist", I had to special-case this.)

In general, Queneau assemblies will not create coherent English sentences. Much as it pains me to admit, a Markov chain is better for that. It works for board game titles because we allow titles a lot of creative license, even up to the point of suspending the rules of grammar. "Pirates! Denver" makes no sense as a sentence, but it's a perfectly good game title.

Words made from letter-chunks

Many games have single-word titles, eg. "Carcassonne". I wanted to have single-word titles in BGD, but I didn't want to duplicate real names. So I applied the Queneau assembly algorithm on the word level.

Here, the chunk is a run of letters that's all vowels or all consonants. So "Carcassonne" would be split into the chunks ["C", "a", "rc", "a", "ss", "o", "nn", "e"]. I keep two sets of buckets, one for vowels and one for consonants. If the first chunk was a vowel chunk, the second chunk is a consonant chunk, and I alternate til I reach the end.
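
The splitting step can be sketched with `itertools.groupby`. This is an illustration, not the BGD code: the function name is made up, and it assumes that only a, e, i, o, and u count as vowels, so "y", hyphens, and everything else get lumped in with the consonants.

```python
from itertools import groupby

VOWELS = set("aeiou")  # assumption: "y" and punctuation count as consonants


def letter_chunks(word):
    """Split a word into alternating runs of vowels and non-vowels."""
    return ["".join(run)
            for _, run in groupby(word, key=lambda ch: ch.lower() in VOWELS)]


print(letter_chunks("Carcassonne"))
# ['C', 'a', 'rc', 'a', 'ss', 'o', 'nn', 'e']
```

Because `groupby` starts a new group whenever the key changes, the chunks always alternate between vowel runs and consonant runs.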

This means that single-word BGD titles are almost never English words, but they do capture the feel of those one-word titles that aren't words (examples: "Zajekan", "Fraseda", "Kongin", "Q-blardo").

The BGD algorithm

Now that you see how it works, I'll explain the algorithm I actually use for "Board Game Dadaist". Instead of three buckets, I have a lot of numbered buckets. When I split a text into T_{0,0}, T_{0,1}, ..., T_{0,M-1}, I put T_{0,0} into bucket 0, T_{0,1} into bucket 1, and so on, with T_{0,M-1} going into bucket M-1. I create an assembly by picking from bucket 0, then bucket 1, and so on until I've reached the target length.
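
A rough Python sketch of the numbered-bucket version, using game names split into words. The function names are made up, and the target length is passed in as a parameter because the description above doesn't say how it gets chosen. The sketch also caps the length at the number of buckets, which is a choice of mine.

```python
import random
from collections import defaultdict


def build_numbered_buckets(texts):
    """Put the chunk at position i of every text into bucket i."""
    buckets = defaultdict(list)
    for chunks in texts:
        for position, chunk in enumerate(chunks):
            buckets[position].append(chunk)
    return buckets


def assemble_numbered(buckets, target_length, rng=random):
    """Pick one chunk from bucket 0, then bucket 1, and so on."""
    # Assumption: never ask for more chunks than there are buckets.
    length = min(target_length, len(buckets))
    return [rng.choice(buckets[position]) for position in range(length)]


NAMES = [
    "Pirates! Miniature Battles on the High Seas".split(),
    "Monopoly: Denver Broncos".split(),
]

buckets = build_numbered_buckets(NAMES)
print(" ".join(assemble_numbered(buckets, 2)))
```

With these two names, a two-word title is one of "Pirates! Miniature", "Pirates! Denver", "Monopoly: Miniature", or "Monopoly: Denver".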

This is the algorithm that "Hundred Thousand Billion Poems" uses, and when the texts have more structure than "beginning/middle/end", this algorithm works a lot better. I don't think it matters much for BGD descriptions, but I do think it matters for game names. I would like to combine this algorithm with the "last" bucket from the simple algorithm, because right now board game descriptions sometimes end abruptly with a sentence like "Contents:".

Children Formed by Plants or Objects: I just discovered that you can search the USPTO's trademark registry by "design code", to find trademarks that use a certain graphical element. Furthermore, the design code classifications are fairly insane. Some choice quotes:

02.01.04 Religious figures, men wearing robes, shepherds, monks and priests

Excluding: Asian-Pacific men (02.01.11) and wizards (04.01.25) are not coded in this section.


02.01.32 Other men, including frogmen, men wearing space suits and men wearing monocles

02.03.25 Other women including hobos, women holding fans and women with weaponry

02.05.27 Other grotesque children including children formed by plants or objects

Mythological beings are cross-coded in the Human Category only when they are depicted as ordinary humans having no indicia of their mythological powers.

03.23.15 Micro-organisms (including sperms)

Only ladders which fold open to form a triangular profile are in 14.09.01. Other straight or extension ladders are in 14.09.02.

Clock radios are double coded in 16.01.03 and 17.01.02.

Concentric circles (26.01.17 and 26.01.18) and circles within a circle (26.01.20) are not considered three circles for purposes of coding.

[Comments] (1) "No Module Named BeautifulSoup": Since Beautiful Soup 4 is not backwards compatible with Beautiful Soup 3, I put it in a different module: bs4 instead of BeautifulSoup. In a non-ironic twist, the module rename has itself turned out to be the biggest compatibility problem between BS3 and BS4. The new module name has caused problems on several occasions where users thought BS4 worked just like BS3, or didn't even know they were using BS4 (1 2 3).

Why would you be using BS4 without knowing it? It's an unreleased beta. Well, that's happened before. When I made the BS4 alpha release, I put the tarball in /software/BeautifulSoup/download/4.x, and PyPI picked it up because it knows /software/BeautifulSoup/download/ is where I keep my tarballs. PyPI believed the 4.0 alpha to be the latest release of BeautifulSoup and started recommending it to all and sundry, which was not what I wanted. So I moved the 4.x tarballs into a different directory that PyPI doesn't know about: /software/BeautifulSoup/unreleased/4.x/, and that solved the problem.

But now it was happening again. Some installation process or other was finding my /unreleased/ directory, picking up the beta tarball, and installing it by default as the latest version of Beautiful Soup. Why?

Thanks to the bug Brian Shumate filed today I tracked the problem down. It turns out the pip package-installer program scrapes the Beautiful Soup homepage (using regexes, not Beautiful Soup, ha). It looks for tarballs and picks the one with the biggest version number. So just by linking to the beta tarball and giving it a "4.0" name I was declaring 4.0 ready for prime time.
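
To see why a bare link is enough to cause this, here's a simplified illustration of that kind of scraping. It is not pip's actual code: the regex, the function name, the tarball filenames, and the 3.x version number are all made up for the example.

```python
import re

# Matches href values ending in NAME-VERSION[suffix].tar.gz and captures
# the URL plus the dotted numeric part of the version.
TARBALL_LINK = re.compile(r'href="([^"]+-(\d+(?:\.\d+)*)\w*\.tar\.gz)"')

# Hypothetical homepage markup; the filenames are illustrative only.
HOMEPAGE = """
<a href="/software/BeautifulSoup/download/3.x/BeautifulSoup-3.2.0.tar.gz">3.2.0</a>
<a href="/software/BeautifulSoup/unreleased/4.x/BeautifulSoup-4.0b.tar.gz">4.0 beta</a>
"""


def newest_tarball(html):
    """Return the linked tarball with the biggest version number, or None."""
    candidates = []
    for url, version in TARBALL_LINK.findall(html):
        key = tuple(int(part) for part in version.split("."))
        candidates.append((key, url))
    return max(candidates)[1] if candidates else None


print(newest_tarball(HOMEPAGE))
```

A scraper like this knows nothing about "beta" or "unreleased". It sees a 4.0 and a 3.x, and 4.0 wins.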

So I got a couple problems:

You Can't Be Serious: It's time for another big HTML table! This time I'm interested in movie connections. IMDB's dataset relates movies to each other using many different predicates: "edited into", "remake of", "alternate language version of", and so on. I'm interested in two of the most common predicates, "referenced in" and "spoofed in". Specifically, I want to answer these questions:

I think my table speaks for itself, but I'll give a legend above it and a little commentary below it. The table has two columns:

  1. The most spoofed movies and TV shows (by number of "spoofed in" references)
  2. The movies and TV shows most referenced in earnest (by number of "referenced in" references)

The little numbers are the counts of "spoofed in" or "referenced in" references for that movie or TV show.

A title in bold shows up on only one list. This doesn't mean that, for instance, "The X-Files" has never been spoofed, only that it's not spoofed enough to make it onto the "most spoofed" list. A title in italics shows up on both lists (or would, if I extended the lists a little bit), but it's in a much higher position on the "spoof" list (left column) or the "non-spoof" list (right column). If a title is neither bolded nor italicized, then it's in approximately the same position on the "spoof" and "non-spoof" lists.

At this point I should probably let the table do the talking, so here it is. If you hate data, you can skip the table.
Rank | Most Spoofed | Spoofs | Most Referenced Seriously | References
1 | Star Wars | 279 | Star Wars | 1793
2 | The Wizard of Oz | 199 | The Wizard of Oz | 1397
3 | "Star Trek" | 180 | "Star Trek" | 1270
4 | The Godfather | 155 | The Godfather | 737
5 | The Matrix | 148 | Psycho | 693
6 | 2001: A Space Odyssey | 141 | Casablanca | 622
7 | Psycho | 139 | Star Wars: Episode V - The Empire Strikes Back | 587
8 | Raiders of the Lost Ark | 134 | Jaws | 585
9 | Jaws | 120 | "The Simpsons" | 573
10 | Star Wars: Episode V - The Empire Strikes Back | 119 | Gone with the Wind | 534
11 | The Exorcist | 96 | King Kong | 527
12 | King Kong | 96 | The Terminator | 485
13 | "Batman" | 95 | E.T.: The Extra-Terrestrial | 449
14 | Pulp Fiction | 93 | 2001: A Space Odyssey | 448
15 | Titanic | 93 | "Sesame Street" | 448
16 | Superman | 89 | Raiders of the Lost Ark | 440
17 | E.T.: The Extra-Terrestrial | 86 | Apocalypse Now | 422
18 | Apocalypse Now | 85 | Frankenstein | 379
19 | The Shining | 84 | The Exorcist | 374
20 | "The Twilight Zone" | 83 | "The Twilight Zone" | 366
21 | The Terminator | 81 | "Saturday Night Live" | 363
22 | Casablanca | 75 | Scarface | 361
23 | Jurassic Park | 74 | Citizen Kane | 358
24 | Frankenstein | 72 | Pulp Fiction | 350
25 | Taxi Driver | 71 | Titanic | 350
26 | Rocky | 68 | The Shining | 348
27 | Alien | 68 | "Doctor Who" | 348
28 | The Silence of the Lambs | 67 | Alien | 346
29 | The Blair Witch Project | 61 | "The Oprah Winfrey Show" | 344
30 | Il buono, il brutto, il cattivo. | 59 | Taxi Driver | 343
31 | Terminator 2: Judgment Day | 59 | Ghost Busters | 343
32 | The Graduate | 58 | "The Flintstones" | 334
33 | Ghost Busters | 57 | Rocky | 333
34 | Gone with the Wind | 56 | Star Wars: Episode VI - Return of the Jedi | 328
35 | Star Wars: Episode VI - Return of the Jedi | 56 | The Matrix | 326
36 | It's a Wonderful Life | 54 | Back to the Future | 323
37 | Forrest Gump | 52 | The Silence of the Lambs | 323
38 | Goldfinger | 52 | "Batman" | 317
39 | Back to the Future | 52 | A Clockwork Orange | 306
40 | Dr. No | 49 | Terminator 2: Judgment Day | 302
41 | Star Wars: Episode I - The Phantom Menace | 49 | "Happy Days" | 301
42 | Gojira | 48 | Snow White and the Seven Dwarfs | 298
43 | Dr. Jekyll and Mr. Hyde | 48 | The Sound of Music | 293
44 | Scarface | 47 | A Nightmare on Elm Street | 289
45 | Planet of the Apes | 47 | Superman | 285
46 | "Cops" | 47 | "Gilligan's Island" | 284
47 | "Scooby Doo, Where Are You!" | 47 | Dracula | 282
48 | Snow White and the Seven Dwarfs | 46 | "Star Trek: The Next Generation" | 276
49 | Dracula | 45 | "The X Files" | 275
50 | The Sound of Music | 44 | "The Brady Bunch" | 271
51 | Reservoir Dogs | 44 | Dr. No | 270
52 | Batman | 44 | "I Love Lucy" | 269
53 | Citizen Kane | 43 | First Blood | 268
54 | Goodfellas | 43 | Night of the Living Dead | 262
55 | Night of the Living Dead | 43 | "American Idol: The Search for a Superstar" | 261
56 | Carrie | 43 | Gojira | 259
57 | "Jeopardy!" | 43 | Jurassic Park | 258
58 | Saturday Night Fever | 42 | Dirty Harry | 257
59 | The Texas Chain Saw Massacre | 41 | The Texas Chain Saw Massacre | 254
60 | "American Idol: The Search for a Superstar" | 40 | Vertigo | 253
61 | Mary Poppins | 40 | It's a Wonderful Life | 253
62 | Full Metal Jacket | 40 | Aliens | 243
63 | Dirty Harry | 40 | Planet of the Apes | 243
64 | The Lord of the Rings: The Fellowship of the Ring | 39 | Batman | 239
65 | The Karate Kid | 38 | Il buono, il brutto, il cattivo. | 236
66 | "The Brady Bunch" | 38 | "Scooby Doo, Where Are You!" | 231
67 | Friday the 13th | 37 | The Graduate | 229
68 | RoboCop | 37 | Goldfinger | 227
69 | Risky Business | 37 | Deliverance | 226
70 | "I Love Lucy" | 36 | The Lord of the Rings: The Fellowship of the Ring | 221
71 | Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb | 35 | Blade Runner | 220
72 | "Star Trek: The Next Generation" | 35 | Die Hard | 218
73 | "The Tonight Show Starring Johnny Carson" | 34 | "Jeopardy!" | 215
74 | Close Encounters of the Third Kind | 34 | "Seinfeld" | 215
75 | "Baywatch" | 33 | Rosemary's Baby | 214
76 | Scream | 33 | Star Wars: Episode I - The Phantom Menace | 213
77 | Flashdance | 33 | The Lion King | 211
78 | Lady and the Tramp | 33 | Saturday Night Fever | 210
79 | First Blood | 33 | Mary Poppins | 206
80 | "The Oprah Winfrey Show" | 32 | Bambi | 205
81 | Willy Wonka & the Chocolate Factory | 32 | The Karate Kid | 202
82 | The Lion King | 32 | "Friends" | 201
83 | "The Flintstones" | 32 | "The Tonight Show Starring Johnny Carson" | 200
84 | Indiana Jones and the Temple of Doom | 32 | Halloween | 199
85 | "24" | 32 | Reservoir Dogs | 199
86 | "Mission: Impossible" | 32 | Top Gun | 198
87 | The Seven Year Itch | 32 | Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb | 196
88 | Halloween | 32 | "The Muppet Show" | 194
89 | Spider-Man | 31 | "Buffy the Vampire Slayer" | 193
90 | Patton | 31 | "The Andy Griffith Show" | 192
91 | Rain Man | 31 | Forrest Gump | 188
92 | Thriller | 31 | Dawn of the Dead | 188
93 | Singin' in the Rain | 30 | Friday the 13th | 188
94 | Aliens | 30 | Jerry Maguire | 187
95 | A Clockwork Orange | 30 | Close Encounters of the Third Kind | 186
96 | Monty Python and the Holy Grail | 30 | Singin' in the Rain | 185
97 | Grease | 30 | "Dancing with the Stars" | 180
98 | Deliverance | 30 | West Side Story | 177
99 | Mission: Impossible | 30 | "Baywatch" | 176
100 | "The Sopranos" | 30 | Grease | 175

I'm not terribly happy with this data. I suspect many "referenced in" references are actually spoofs, or are throwaway jokes that don't even rise to the level of "spoof". Are there really 179 non-spoof references to "The Lion King"? You know everyone's just riffing on the baby-lifting shot.

However, the reverse problem ("incorrectly regarded as spoofs") is nonexistent, so it's easy to spot things like The Blair Witch Project and "Cops" which only exist in our culture as things to make fun of; as well as things that are occasionally referenced seriously but much more frequently spoofed (The Matrix).

You Can't Be Serious: Addendum: I Should Be In That Spoof: After messing around with the IMDB movie connections for the original "You Can't Be Serious", I've decided to measure a movie's spoofability with a ratio instead of just counting the number of times it's been spoofed. Counting spoofs only measures the impact a movie has on our culture. Star Wars is the most-spoofed movie by far, but also the most-referenced movie by far. Measuring the ratio of spoofs to earnest references will find movies whose impact on culture was primarily to give us something to spoof.

(I came into this hating the word "spoof", BTW, and the more I type it the more I hate it.)

I calculated the spoof/reference ratio for all IMDB entries with more than one spoof and more than 5 references. Surprisingly, the movies with spoof/reference ratios near or above unity aren't movies; they're almost all TV shows:
Movie | Ratio | Spoofs | References
"Crocodile Hunter" | 1.62 | 13 | 8
"TMZ on TV" | 1.56 | 14 | 9
"The Undersea World of Jacques Cousteau" | 1.43 | 10 | 7
"The Twilight Zone" | 1.43 | 10 | 7
"Hardball with Chris Matthews" | 1.40 | 21 | 15
The Perils of Pauline | 1.29 | 9 | 7
"The McLaughlin Group" | 1.12 | 9 | 8
"Inside the Actors Studio" | 1.07 | 15 | 14
The Six Million Dollar Man | 1.00 | 8 | 8
Kids | 1.00 | 7 | 7
Der 90. Geburtstag oder Dinner for One | 1.00 | 7 | 7
Bigfoot | 1.00 | 6 | 6
"The Tomorrow Show" | 1.00 | 11 | 11
"Aquaman" | 1.00 | 6 | 6
Riverdance: The Show | 0.91 | 10 | 11
"Behind the Music" | 0.91 | 10 | 11
Uncle Tom's Cabin | 0.90 | 9 | 10
"Through the Keyhole" | 0.88 | 7 | 8
King's Quest: Quest for the Crown | 0.86 | 6 | 7
"The French Chef" | 0.83 | 5 | 6

I'll let you look up the ones you don't recognize, though I will say that "Der 90. Geburtstag oder Dinner for One" looks pretty great, and "Bigfoot" is exactly what you think it is: the one-minute 1967 film. (IMDB rating: 8.2!)

Calculating the average spoof/reference ratio is an iffy proposition, but for movies with a lot of references, it's around 0.15.
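
For anyone who wants to reproduce the ratio calculation, here is a minimal sketch under stated assumptions. The input format (a dict of title to a (spoofs, references) pair) and the function name are invented for illustration, and nothing here is tied to how IMDB actually ships its data. The thresholds are the ones described above: more than one spoof and more than 5 references.

```python
def spoof_ratios(counts, min_spoofs=2, min_references=6):
    """Rank titles by spoofs per earnest reference, highest ratio first.

    counts maps a title to a (spoofs, references) pair. Titles with too
    few spoofs or references are dropped, since a ratio built on one or
    two data points is mostly noise.
    """
    rows = []
    for title, (spoofs, references) in counts.items():
        if spoofs >= min_spoofs and references >= min_references:
            rows.append((title, spoofs / references, spoofs, references))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows

# Counts copied from the tables in these two posts, plus one made-up
# entry that should be filtered out.
SAMPLE_COUNTS = {
    "Star Wars": (279, 1793),
    '"Crocodile Hunter"': (13, 8),
    '"The French Chef"': (5, 6),
    "Sex, Lies, and Videotape": (2, 138),
    "Hypothetical Obscure Film": (1, 3),  # invented; fails both thresholds
}

ranked = spoof_ratios(SAMPLE_COUNTS)
for title, ratio, spoofs, references in ranked:
    print(f"{title}: {ratio:.2f} ({spoofs} spoofs / {references} references)")
```

On this sample, Star Wars lands at roughly 0.16, which is why the most-spoofed movie by raw count sits nowhere near the top of a ratio-based list.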

What movies have a very low ratio? Are there movies that are referenced, say, 100 times more often than they're spoofed? Once again, the question of what distinguishes a "reference" from a "spoof" rears its mediocre-looking head, but maybe it cancels out when we're calculating a ratio between the two. Let's find out.

Movie | Ratio | Spoofs | References
Sex, Lies, and Videotape | 0.01 | 2 | 138
Deep Throat | 0.02 | 2 | 92
"Little House on the Prairie" | 0.02 | 2 | 87
"Hogan's Heroes" | 0.02 | 2 | 83
Guess Who's Coming to Dinner | 0.03 | 2 | 77
Brazil | 0.03 | 2 | 73
The Searchers | 0.03 | 2 | 73
THX 1138 | 0.03 | 2 | 71
"Green Acres" | 0.03 | 2 | 69
"Will & Grace" | 0.03 | 2 | 64
Vertigo | 0.03 | 8 | 253
Mr. Smith Goes to Washington | 0.03 | 2 | 63
Sleeping Beauty | 0.03 | 2 | 62
"Two and a Half Men" | 0.03 | 2 | 61
A Wild Hare | 0.03 | 2 | 60
The Way We Were | 0.03 | 2 | 60
To Kill a Mockingbird | 0.03 | 2 | 60
Tootsie | 0.03 | 2 | 59
"Captain Kangaroo" | 0.03 | 2 | 58
Sophie's Choice | 0.03 | 2 | 58

I was skeptical about this list, but upon investigation it's pretty good! Certainly better results than I got on Sunday. Sex, Lies, and Videotape the movie wasn't that influential, but it has one of the most influential titles in cinema history. Similarly for Mr. Smith Goes to Washington: lots of title references and lots of conspicuously visible movie posters. A Wild Hare was the origin of the phrase "What's up, Doc?". Guess Who's Coming to Dinner scores whenever a character wryly cracks that phrase at the end of a scene. And so on.

: On Monday Sumana and I went to the Socrates Sculpture Park to see "Odysseus at Hell Gate", a production that mashed up the Odyssey and New York City history and puppets. Little did we know that it was not a dramatic production in which the puppets would interact, but a "puppetscape" in which the puppets would wander around the park for all to marvel at. I'm not complaining, as the puppets were cool. I took pictures but they didn't turn out, so enjoy pictures from someone else who was there.

My favorite puppet by far was Robert Moses as Polyphemus, with the Unisphere for a head and a traffic light for an eye, operated by puppeteers in hard hats and surrounded by a flock of sheep-cars honking their bicycle horns. But in terms of puppeteering excellence, the shades were the best.

Unless otherwise noted, all content licensed by Leonard Richardson
under a Creative Commons License.