Big talk, but fortunately I didn't have to come up with a better
algorithm, because I already had. Back in 2008 I released a project
called "Spurious",
which generates new Shakespearean sonnets by picking lines from the
existing sonnets. It generates two sonnets at once using two different algorithms. Algorithm B (the one
lower down on that page) is totally random: you could get a new sonnet
made entirely of the first lines of other sonnets. But Algorithm A
(the first one on that page) creates what I'm calling a Queneau
assembly. The first line of a new sonnet is the first line of some
existing sonnet. The second line is the second line of some other
sonnet. The third line comes from the set of third lines, and so on to
the end.
Oulipo founder Raymond Queneau did something very similar in his 1961 book "Hundred
Thousand Billion Poems". This may be where I got the algorithm
I used in "Spurious", though I don't think it was a conscious
homage. In "Hundred Thousand Billion Poems" there are ten sonnets
bound such that you can "turn the page" for a single line of the
sonnet, changing that line while leaving the rest of the poem
intact. Each generated poem feels like a sonnet because it
starts with a "first line" and ends with a "last line" and every line
in between is placed where it was in some manually generated sonnet.
I've named the technique in honor of Queneau because I can't find
anyone who used it earlier. It's not universally better than a Markov
chain, because it only works in certain cases: That said, the Queneau assembly gives very entertaining results, and it's
now my go-to dada technique, promoted over Markov chains and even
unadulterated randomness. The simple algorithm I've come up with a number of algorithms for making Queneau
assemblies. I'll talk about the simplest first, just so you'll see how
this works. This is a refined version of the algorithm I used for "Mashteroids" (yes, those asteroid
descriptions were me reinventing Queneau assemblies). It's not the
algorithm I used for "Board Game Dadaist"; I'll talk about that later.
You've got a body of N texts, T0, T1, ...,
TN-1. Each text can be split into some number of chunks,
eg. T00, T01, ...,
T0M-1.
Split each text into chunks and assign each chunk to one of three
buckets. The first chunk from each text goes into the "first"
bucket. The last chunk from each text goes into the "last"
bucket". All the other chunks go into the "middle" bucket.
Also keep track of how text lengths are distributed: how likely it
is that a text consists of one chunk, how likely that it consists of
two chunks, and so on.
Now you're ready to assemble. Pick a length for your new text that
reflects the length distribution of the existing texts. Then pick a
chunk from the "first" bucket. If your target length is greater than
1, pick a chunk from the "last" bucket". If your target length is
greater than 2, pick chunks from the "middle" bucket intil you've got
enough. Concatenate the chunks first-middle-last, and you've got a
Queneau assembly!
Paragraphs made from sentences Now let's look at the scales on which you might create a Queneau
assembly. Outside of poetry, the paragraph is the Queneau assembly's
natural habitat. A pragraph has a flow to it, especially when you've
got something like a description of a board game or an asteroid that's
only one paragraph long.
You need to handle things like quotes and parentheses that open in
one sentence and don't close by the end of the sentence, or that close
without having opened. I wrote code for this in BGD but it doesn't
catch all the cases.
Phrases made from words In "Board Game Dadaist", the names of games are also Queneau
assemblies. Here the chunks are words. I take the first word from the
name of game A, the second word from the name of game B, and so on. So "Pirates! Denver" might come from "Pirates! Miniature Battles on the High Seas" and "Monopoly: Denver Broncos".
Quotes and parentheses are still problems, though it's not as
bad. The big problem I ran into was repeated words, and words like
"the" which are not allowed to end a game name. (The simple algorithm,
with its "last" bucket, prevents "the" from showing up last unless it
showed up as the last word of an existing game. In the algorithm I
used for "Board Game Dadaist", I had to special-case this.)
In general, Queneau assemblies will not create coherent English
sentences. Much as it pains me to admit, a Markov chain is better for
that. It works for board game titles because we allow titles a lot of
creative license, even up to the point of suspending the rules of
grammar. "Pirates! Denver" makes no sense as a sentence, but it's a perfectly good game title. Words made from letter-chunks Many games have single-word titles, eg. "Carcassonne". I wanted to
have single-word titles in BGD, but I didn't want to duplicate real
names. So I applied the Queneau assembly algorithm on the word level.
Here, the chunk is a run of letters that's all vowels or all
consonants. So "Carcassonne" would be split into the chunks ["C", "a",
"rc", "a", "ss", "o", "nn", e"]. I keep two sets of buckets, one for
vowels and one for consonants. If the first chunk was a vowel chunk,
the second chunk is a consonant chunk, and I alternate til I reach the
end.
This means that single-word BGD titles are almost never English words,
but they do capture the feel of those one-word titles that aren't
words (examples: "Zajekan", "Fraseda", "Kongin", "Q-blardo").
The BGD algorithm Now that you see how it works, I'll explain the algorithm I
actually use for "Board Game Dadaist". Instead of three buckets, I
have a lot of numbered buckets. When I split a text into
T00, T01, ...,
T0M-1, I put T00 into
bucket 0, T01 into bucket 1, and so on, with
T0M-1 going into bucket M-1. I create an
assembly by picking from bucket 0, then bucket 1, and so on until I've
reached the target length.
This is the algorithm that "Hundred Thousand Billion Poems"
uses, and when the texts have more structure than "beginning/middle/end", this algorithm works a lot better. I don't think it matters much for BGD descriptions, but I do think it matters for game names. I would like to combine this algorithm with the "last" bucket from the simple algorithm, because right now board game descriptions sometimes end abruptly with a sentence like "Contents:".
Thu Aug 18 2011 10:39 Queneau Assembly:
That's my name for the formerly-unnamed technique I used in "Board Game
Dadaist". It all started in April, the night I was guest critic
for Adam's ITP class. Afterwards I went out to dinner with Adam
and Rob, and Adam was talking up Markov chains. Dude loves him some
Markov chains. I said "Man, I'm tired of Markov chains. Markov chains are so 70s, they have little coke spoons dangling from them. I'm gonna come up with a better algorithm for creating generative text."
