News You Can Bruise for 2013December4 (entry 1)


(2) Secrets of (people's responses to) @horse_ebooks revealed!

As part of my @pony_strategies project (see previous post), I grabbed the 3200 most recent @horse_ebooks tweets via the Twitter API and ran them through some simple analysis scripts to figure out how they were made and which linguistic features separated the popular ones from the unpopular ones.

This let me confirm one of my hypotheses about the secret to _ebooks-style comedy gold. I also disproved another of my hypotheses about comedy gold, and came up with an improved hypothesis that works much better. Using these as heuristics, I was able to make @pony_strategies come up with more of what humans consider the good stuff.

Timing

The gaps between @horse_ebooks posts formed a normal distribution with a mean of 3 hours and a standard deviation of 1 hour. Ads alone showed a similar pattern: a normal distribution with a mean of 15 hours and a standard deviation of 2 hours. That's impressively consistent, given that Jacob Bakkila says he was posting @horse_ebooks tweets by hand. (No wonder he wanted to stop!)
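Measuring this takes only a few lines. This isn't the original analysis script; it's a minimal sketch that assumes the tweet timestamps (with ads filtered in or out, depending on which distribution you want) have already been parsed into `datetime` objects. The sample timestamps are made up.

```python
from datetime import datetime
from statistics import mean, stdev


def gap_stats(timestamps):
    """Return (mean, sample standard deviation) of the gaps between
    consecutive posts, in hours. Needs at least three timestamps."""
    ordered = sorted(timestamps)
    gaps = [(later - earlier).total_seconds() / 3600
            for earlier, later in zip(ordered, ordered[1:])]
    return mean(gaps), stdev(gaps)


# Hypothetical post times, not real @horse_ebooks data.
posts = [datetime(2013, 9, 1, hour) for hour in (0, 2, 6, 9, 12)]
print(gap_stats(posts))
```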

My setup is much different: I wrote a cheap scheduler that approximates a normal distribution and runs every fifteen minutes to see if it's time to post something.
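Here's a minimal sketch of that kind of scheduler, assuming something like cron invokes it every fifteen minutes and the caller persists the scheduled time between runs. The function names, the one-tick minimum gap, and the 3-hour/1-hour parameters (borrowed from the @horse_ebooks numbers above) are illustrative assumptions, not the actual @pony_strategies code.

```python
import random
from datetime import datetime, timedelta

MEAN_HOURS = 3.0               # assumed mean gap between posts
STDEV_HOURS = 1.0              # assumed standard deviation of the gap
TICK = timedelta(minutes=15)   # how often the scheduler is run


def next_post_time(last_post, rng=random):
    """Draw the next gap from a normal distribution, clamped to at
    least one tick so a negative or tiny draw can't cause a burst."""
    gap = timedelta(hours=rng.gauss(MEAN_HOURS, STDEV_HOURS))
    return last_post + max(gap, TICK)


def tick(now, scheduled, rng=random):
    """Run once per tick. Returns (should_post, next_scheduled_time)."""
    if now >= scheduled:
        return True, next_post_time(now, rng)
    return False, scheduled


# Example: one tick before the scheduled time, then at it.
rng = random.Random(42)
start = datetime(2013, 12, 4, 9, 0)
scheduled = next_post_time(start, rng)
print(tick(scheduled - TICK, scheduled, rng))
print(tick(scheduled, scheduled, rng))
```

Because the check only happens on fifteen-minute ticks, real gaps get rounded up to the next tick, which is why a setup like this only approximates a normal distribution.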

Beyond this point, my analysis excludes the ads and focuses exclusively on the quotes. Nobody actually liked the ads.

Length

The median length of a @horse_ebooks quote is 50 characters. Quotes shorter than the median were significantly more popular, but very long quotes were also more popular than quotes in the middle of the distribution.
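A sketch of this comparison, assuming quotes are (text, retweet count) pairs. The 50-character cutoff comes from the median above; the 100-character threshold for "very long" is a hypothetical choice, since no cutoff is given here.

```python
from statistics import median


def retweets_by_length(quotes, short_max=50, long_min=100):
    """Median retweets for short, middle-length, and long quotes.

    `quotes` is a list of (text, retweet_count) pairs. Buckets with
    no quotes are left out of the result."""
    buckets = {"short": [], "middle": [], "long": []}
    for text, retweets in quotes:
        if len(text) < short_max:
            buckets["short"].append(retweets)
        elif len(text) >= long_min:
            buckets["long"].append(retweets)
        else:
            buckets["middle"].append(retweets)
    return {name: median(counts)
            for name, counts in buckets.items() if counts}
```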

Capitalization

I think that title case quotes (e.g. "Demand Furniture") are funnier than others. Does the public agree? For each quote, I checked whether the last word of the quote was capitalized.

43% of @horse_ebooks quotes end with a capitalized word. The median number of retweets for those quotes was 310, versus 235 for quotes with an uncapitalized last word. The public agrees with me. Title-case tweets are a little less common, but significantly more popular.
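The capitalization check and comparison can be sketched like this, again assuming (text, retweet count) pairs. Stripping surrounding punctuation before looking at the first letter is an assumption about how to handle quotes like "Demand Furniture." Note that a quote ending in "I" also counts as capitalized under this check.

```python
import string
from statistics import median


def ends_capitalized(quote):
    """True if the last word of the quote starts with an uppercase letter."""
    words = [w.strip(string.punctuation) for w in quote.split()]
    words = [w for w in words if w]
    return bool(words) and words[-1][0].isupper()


def compare_capitalization(quotes):
    """Share of quotes ending in a capitalized word, plus the median
    retweets for capitalized and uncapitalized endings."""
    caps = [rt for text, rt in quotes if ends_capitalized(text)]
    lower = [rt for text, rt in quotes if not ends_capitalized(text)]
    return {
        "share_capitalized": len(caps) / len(quotes),
        "median_capitalized": median(caps) if caps else None,
        "median_lowercase": median(lower) if lower else None,
    }
```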

The punchword

Since the last word of a joke is the most important, I decided to take a more detailed look at each quote's last word. My favorite @horse_ebooks tweets are the ones that cut off in the middle of a sentence, so I expected to see a lot of quotes that ended with boring words like "the".

I applied part-of-speech tagging to the last word of each quote and grouped the quotes by tag. Nouns were by far the most common, followed by verbs of various kinds, determiners ("the", "this", "neither"), adjectives, and adverbs.

I then sorted the parts of speech by the median number of retweets a @horse_ebooks quote got if it ended with that part of speech. Nouns and verbs were not only the most common; they were also the most popular. (Median retweets for every kind of noun were over 300; for verbs, the median ranged from 191 to 295 retweets, depending on the tense of the verb.) Adjectives underperformed relative to their frequency, except for comparative adjectives like "more", which overperformed.
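Grouping by the punchword's tag and sorting by median retweets might look like the sketch below. The tagger is left as a parameter because the one used here isn't specified. One option is NLTK, e.g. `lambda q: nltk.pos_tag(nltk.word_tokenize(q))[-1][1]`, which yields Penn Treebank tags such as `NN`, `VBD`, `DT` and `JJR` (NLTK's tokenizer and tagger models must be downloaded first).

```python
from collections import defaultdict
from statistics import median


def popularity_by_punchword_tag(quotes, tag_last_word):
    """Group quotes by the part-of-speech tag of their last word.

    `quotes` is a list of (text, retweet_count) pairs and
    `tag_last_word` is any function mapping a quote to a tag string.
    Returns (tag, count, median_retweets) rows, most popular first."""
    groups = defaultdict(list)
    for text, retweets in quotes:
        groups[tag_last_word(text)].append(retweets)
    rows = [(tag, len(counts), median(counts))
            for tag, counts in groups.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)
```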

I was right in thinking that quotes ending with a determiner or other boring word were very common, but they were also incredibly unpopular. The most popular among these were quotes that repeated gibberish over and over, e.g. "ORONGLY DGAGREE DISAGREE NO G G NO G G G G G G NO G G NEIEHER AGREE NOR DGAGREE O O O no O O no O O no O O no neither neither neither". A quote like "of events get you the" did very poorly. (By late-era @horse_ebooks standards, anyway.)

It's funny when you interrupt a noun

I pondered the mystery of the unpopular quotes and came up with a new hypothesis. People don't like interrupted sentences per se; they like interrupted noun phrases. Specifically, they like it when a noun phrase is truncated to a normal noun. Here are a few @horse_ebooks quotes that were extremely popular:

• Don t worry if you are not computer
• Don t feel stupid and doomed forever just because you failed on a science
• You constantly misplace your house
• I have completely eliminated your meal

Clearly "computer", "science", "house", and "meal" were originally modifying some other noun, but when the sentence was truncated they became standalone nouns. Therefore, humor.

How can I test my hypothesis without access to the original texts from which @horse_ebooks takes its quotes? I don't have any automatic way to distinguish a truncated noun phrase from an ordinary noun. But I can see how many of the @horse_ebooks quotes end with a complete noun phrase. Then I can compare how well a quote does if it ends with a noun phrase, versus a noun that's not part of a noun phrase.
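One way to sketch this, assuming each quote has already been POS-tagged into (word, Penn Treebank tag) pairs. The rule for what counts as a "complete noun phrase" (the last two tokens are both nouns, as in a compound like "science test") is a hypothetical heuristic, not necessarily the one behind the numbers below.

```python
from statistics import median


def is_noun(tag):
    """Penn Treebank noun tags: NN, NNS, NNP, NNPS."""
    return tag.startswith("NN")


def classify_ending(tagged):
    """Classify how a tagged quote ends (hypothetical heuristic):
    two trailing nouns -> "noun phrase", one -> "standalone noun"."""
    if not tagged or not is_noun(tagged[-1][1]):
        return "other"
    if len(tagged) >= 2 and is_noun(tagged[-2][1]):
        return "noun phrase"
    return "standalone noun"


def median_retweets_by_ending(tagged_quotes):
    """`tagged_quotes` is a list of (tagged_tokens, retweet_count) pairs."""
    groups = {}
    for tagged, retweets in tagged_quotes:
        groups.setdefault(classify_ending(tagged), []).append(retweets)
    return {ending: median(counts) for ending, counts in groups.items()}
```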

About 4.5% of the total @horse_ebooks quotes end in complete noun phrases. This is comparable to what I saw in the data I generated for @pony_strategies. I compared the popularity of quotes that ended in complete noun phrases, versus quotes that ended in standalone nouns.

Quote ends in        Median number of retweets
Standalone noun      330
Noun phrase          260
Other                216

So a quote that ends in a standalone noun does better than one that ends in a noun phrase, which in turn does better than one that ends in a non-noun. This supports my hypothesis that truncating a noun phrase makes a quote funnier when the truncated phrase is itself a noun. But a quote that ends in a complete noun phrase will still be more popular than one that ends with anything other than a noun.

Conclusion

At the time I did this research, I had about 2.5 million potential quotes taken from the Project Gutenberg DVD. I was looking for ways to rank these quotes and whittle them down to, say, the top ten percent. I used the techniques that I mentioned in my previous post for this, but I also used quote length, capitalization, and punchword part-of-speech to rank the quotes. I also looked for quotes that ended in complete noun phrases, and if truncating the noun phrase left me with a noun, most of the time I would go ahead and truncate the phrase. (For variety's sake, I didn't do this all the time.)
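The truncation step might look something like this sketch. It uses the same hypothetical noun-noun rule for spotting a compound at the end of a tagged quote; the 80% probability is an assumed stand-in for "most of the time", and the function name is illustrative.

```python
import random


def maybe_truncate(tagged, probability=0.8, rng=random):
    """If a tagged quote ends in a noun-noun compound, usually drop the
    final noun so the modifier becomes the punchword. `tagged` is a list
    of (word, Penn Treebank tag) pairs; returns the quote as a string."""
    ends_in_compound = (len(tagged) >= 2
                        and tagged[-1][1].startswith("NN")
                        and tagged[-2][1].startswith("NN"))
    # Skip the truncation some of the time, for variety.
    if ends_in_compound and rng.random() < probability:
        tagged = tagged[:-1]
    return " ".join(word for word, _ in tagged)
```

For example, an invented source sentence ending in "failed on a science test" would usually come out as "failed on a science".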

This stuff is currently not in olipy; I ran my filters and raters on the much smaller dataset I'd acquired from the DVD. There's no reason why these things couldn't go into olipy as part of the `ebooks.py` module, but it's going to be a while. I shouldn't be making bots at all; I have to finish Situation Normal.


Posted by Franz Poekler at Thu Dec 05 2013 07:30

That's very interesting about interrupting noun phrases. Just yesterday, while updating a ranking of the humorousness of words before integrating it with WordNet data, I noticed that nouns are consistently funnier than other types of words.

My system doesn't use external feedback; it's just a kludge of various guesses involving Scrabble scores, word length, frequency of appearance in this corpus or that one, etc.

You get surprisingly far just by looking for words that are spelled funny, because they're usually loanwords, which means they're usually words for things that are novel and/or specific, both of which make things funny. Bassoon, zucchini, umbrella, &c.
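As a rough illustration of this kind of kludge (no formula is given above), here is a hypothetical score that multiplies a word's average Scrabble value per letter by its rarity in some corpus. The letter values are the standard English Scrabble ones; the way they're combined and the add-one smoothing are assumptions.

```python
import math

# Standard English Scrabble letter values.
SCRABBLE = {
    **dict.fromkeys("aeilnorstu", 1),
    **dict.fromkeys("dg", 2),
    **dict.fromkeys("bcmp", 3),
    **dict.fromkeys("fhvwy", 4),
    "k": 5,
    **dict.fromkeys("jx", 8),
    **dict.fromkeys("qz", 10),
}


def scrabble_score(word):
    """Sum of letter values; non-letters score zero."""
    return sum(SCRABBLE.get(ch, 0) for ch in word.lower())


def humor_score(word, frequency, corpus_size):
    """Hypothetical score: average letter value times rarity, where
    rarity is the negative log of the word's smoothed corpus frequency."""
    letters = [ch for ch in word.lower() if ch in SCRABBLE]
    if not letters:
        return 0.0
    per_letter = scrabble_score(word) / len(letters)
    rarity = -math.log((frequency + 1) / (corpus_size + 1))
    return per_letter * rarity
```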

I have yet to rescore the words based on the average humor value of WordNet's various "bins", such as body parts, weather, abstractions, etc., but here's what I currently have for the top 500 funniest words:

Gorilla gorilla gorilla
raisin
reefer
roller
pinkie
banana
zombie
stereo
sensor
waffle
phlegm
Gorilla gorilla
weasel
pill roller
Mormon
cone pepper
cowboy boot
minion
midget
vacuum pump
lemon shark
cowboy
cherry pepper
mascot
pepperoni pizza
pepper
pretzel
kettle
lemming
quirk
rainbow
troll
dryer
cereal oat
pencil eraser
buddy-buddy
omega
panda
hippo
blotter
pancake
pencil
server
median
freezer
sleigh
oyster mushroom
Moron
haircut
vinegar
hamburger bun
ravioli
razor clam
gorilla
carrot
popcorn
Hessian boot
pillow
balloon
pickle
cannon
feedback loop
cramp
wedge
avatar
spice rack
gum boot
eyebrow pencil
chill
stirrup pump
acronym
atlas
wrench
cereal
bell pepper
worker bee
diorama
broccoli
shack
mustard
pizza
sleigh bell
pin wrench
breeze
puma
alpha
dinosaur
peppermint gum
nettle
shark
excerpt
oyster
lamb curry
snowball
vacuum
hiccup
hazard
threat
beefsteak begonia
furnace
theorem
loan shark
riddle
twist
banner
moccasin
fedora
puff
bedtime
Mormons
sticker
bouillon cube
razor
broom
tiger shark
maze
syndrome
pepper spray
pin cherry
pillow lava
barbecue
cocoa
melody
partridge pea
betel pepper
tank furnace
cocoa bean
delta
hailstone
margarita
croissant
crossword
spaceship
vacuum cleaner
hamburger
memo
grammar
pepper mill
hyacinth bean
chalk
pin oak
buttermilk pancake
ramp
cobweb
pill
sheet music
prank
Mormon Tabernacle
karma
hobo
brush kangaroo
blackjack oak
cannon cracker
jodhpur boot
mailman
abyssal zone
tack hammer
media guru
funk
fallout
umbrella magnolia
sheet
music rack
hyacinth
grocery store
boob
bump
yardstick
pseudonym
sheet pile
algorithm
oyster cracker
mechanic
arpeggio
crow
mustache
ayatollah
hallmark
sock
junk pile
muck
fudge
crescendo
beefsteak morel
mall
math
hedgehog
bluebell
weekend warrior
cone
disco biscuit
stiletto
tiger rattlesnake
lever
furbelow
stamp mill
mannequin
pepper root
pump
mandala
junk
shorts
plaintiff
boot
maths
rack
leech
lemon
mosquito net
pea bean
billow
scarecrow
teen
buttercup
pepper sauce
belt buckle
flashbulb
bio lab
kangaroo
reel
buddy
spark lever
mackerel shark
affidavit
honeycomb
stamp
nightclub
insomniac
knickknack
slack
diaphragm
nonsense
bunny
notebook
mosquito
teens
nerd
saxophone
snack
mustard sauce
tarantula
corkscrew
drumstick
boardwalk
riffraff
ditto
waistcoat
highchair
nincompoop
umbrella
economy
lollipop
mausoleum
bureaucrat
whirlpool
pine hyacinth
pepperoni
asparagus pea
sweetheart
cappuccino
putty
slacks
rapture
miasma
gangplank
barbecue sauce
asterisk
pocketbook
millennium
leprechaun
footage
utensil
girlfriend
guerrilla
sidekick
logarithm
beefsteak
limbo
zucchini
cafeteria
dye
fudge sauce
luggage rack
dome
enthusiasm
burner
withdrawal symptom
sweat sock
loop
archbishop
umbrella pine
gymnasium
peninsula
laundry detergent
hydrangea
pilot balloon
fridge
teeter-totter
birthmark
font
wink
multivitamin pill
disco music
kidney begonia
bedfellow
windowpane oyster
poinsettia
toothpick
bumblebee
chow chow
kangaroo mouse
myeloma
greenhouse
kerosene heater
bathroom cleaner
cheesecake
chestnut oak
oak chestnut
barbershop
leeway
espresso
maul oak
Hoover Dam
cherry
scratch sheet
sagebrush buttercup
Juggernaut
cinnamon bun
pizza parlor
tobacco mildew
cube root
windowsill
parody
wavelength
gestalt
tablecloth
quest
Hoover
rattlesnake root
shrimp sauce
earthquake
tattletale
bell morel
candelabra
wheelchair
curry sauce
lemon grove
italic
shenanigan
fellowship
portmanteau
mushroom poisoning
horseshoe
mechanics
peppermint
loan-blend
buttermilk biscuit
stack
worker
arrhythmia
crochet stitch
pilot burner
assassin
goo
notch
pin curl
gut
postmortem
phenomenon
backup
employee
shrimp
nanosecond
preview
wheelbarrow
ion pump
cannonball
grapefruit
Baptist
chinchilla
hummingbird
stacks
mushroom sauce
chukka boot
desktop
graffiti
waxwork
pterodactyl
mania
curry
buggy
medico
luggage carrier
ballroom music
grocery
feedback
guts
pinch
otitis media
wand
Gorilla gorilla beringei
Gorilla gorilla grauri
corncob
pixel
crackerjack
lory
dot
Mohammedan calendar
oat
alpha and omega
twilight zone
leek
kidney bean
iceberg lettuce
dungaree
Yahoo
cashew
iceberg
hieroglyph
mallard
puff paste
spark coil
cephalopod mollusk
bookmark
upkeep
microwave oven
cottonmouth moccasin
gecko
genome
cube
hessian
cakewalk
nap
aristocrat
pinwheel
tag
autumn pumpkin
logbook
flamenco
bridesmaid
bee
firework
purl stitch
llama
caterpillar
asparagus bean
eraser
flowerpot
lemon mint
rattlesnake
oak
teat
yokel
opossum shrimp
representative sample
stamp album
zigzag goldenrod
storage tank
marshmallow
pea
Galilean satellite
sump pump
lab
palooka
picket
bun
nihilism
glockenspiel
omelette pan
stopgap
rim
epitaph
neutrino
bylaw
extravaganza
spice
satellite
optometrist
cue
drum
sneak preview
buns
lowbrow
pistachio
strawberry daiquiri
vanilla bean
gum
flake
mushroom
molehill
basilisk
throw pillow
belt
pan-fry
clam
pointer
plywood
spaghetti sauce
sidewalk
vest
loop-the-loop
screw wrench
housetop
volleyball net
derriere
lamb
loop gain
pilot biscuit
mitten
bubble gum
ore
ellipsis
pockmark
melanoma
bedroom furniture
Frisbee
neighborhood
refrigerator
pinecone
ego
robot pilot
carrier
sausage balloon
frill
employee ownership
rogue
stirrup
tonic epilepsy
clams
matzo
folk music
vaquero
rhododendron
flattop
sausage pizza
jaguar
pushup
windmill
abyssal
mousetrap
doorstep
horseshoes
tear sheet
messenger
breakthrough
stem lettuce
schnook
rack of lamb
frieze
massage parlor
heater
mantilla
threshold
ram
tangent
garbanzo
eyesore
vessel
kettle hole
outpost
crow pheasant
tack
hangout
corpus delicti
crochet needle
sandbar shark
lesser panda
nicotine poisoning
bandanna
nubbin
bouillon
boom
antenna
popper
eggnog
fatty
oyster shell
burger
dudgeon
wrinkle
escape cock
aerosol
cruise
wonton
novel
latch
laptop
chrysanthemum
bedrock
handicap
whisk broom
mensch
hypochondriac
washtub
cock
raven
fleck
serif
gimcrack
sheet bend
doorknob
Argonaut
pinprick
Calypso
tabletop
aftermath
kinkajou
laughingstock
spin dryer
dart
salvo
duel
hoosegow
wool
flashback
funk hole
dogwood
Cremona
pancake batter
pine grosbeak
yearbook
tomahawk
base
pepper-and-salt
oodles
partridge
click-clack
lunchroom
bilberry
banana split
amphibian
raspberry
scrub oak
tilt angle
sub
disco
clip
reservoir
textbook
batch
hamper
Savannah
bracero
broom-weed
ala
warrior
Ohm
obesity
scenario
Baptists
woodchuck
zigzag
kipper
northern pin oak
wildebeest
bonus
halloo
heartburn
begonia
goddess
bean sprout
pundit
marijuana cigarette
gazpacho
jeans
darts
knockoff
epidemic roseola
macaroni
lemon extract
hammer
kibbutz
mustard seed
puff batter
baseball
snowstorm
rose leek
seed oyster
racetrack
muskrat
eyebrow
toy
slice
staple
workshop
aviatrix
loofah
baseball mitt
jerkin
thunderstorm
bathroom
filbert
rigatoni
tonic
Halloween
mouse nest
centavo
zone
rogue elephant
enterprise zone
titbit
furlough
laundry
daiquiri
tierce
rainbow smelt
lever scale
tarragon
dewlap
onlooker
zwieback
session
eyetooth
suburb
bandwidth
narrator
screen font
goodness
scallywag
ducat
mouse click
Norse
kilowatt
cutout
barley
schmaltz
cybersex
rose gum
kink
weekend
gospel
tankard
poppet
doughnut
cluster bean
banana peel
mocha
vendetta
maharaja
Selene
virtuoso
rack railway
peek
filler
barracuda
blowup
haw-haw
microfilm
caraway
prairie rattlesnake
laser
sample
biscuit
clam dip
aftershock
restaurant attendant
attorney
symposium
bookshelf
labyrinth
hollyhock
instinct
pin
lamb chop
lamb-chop
grommet
wisecrack
massage
paprika sauce
mildew
hedgerow
Japanese banana
calico
yeshiva
heartbeat
photostat
hoot
melodrama
gooseneck barnacle
Japanese umbrella pine
jejunum
pine mouse
heirloom
classroom
cutoff
ethernet
anaconda
hammer throw
coil
pillow slip
gnocchi
fairy shrimp
nail-tailed kangaroo
pineapple
catnap
whetstone
hellcat
robot
pitchfork
marabou
keepsake
taenia
hiccup nut
bell
hamburger roll
claptrap
flapjack
blow dryer
checkbook
attendant
urban guerrilla
abomasum
push broom
hurricane lantern
hash
facelift
gutter
fan belt
stencil
ink eraser
demand loan
feminism
shiatsu
cobble
protocol
zygote
dredge
biconvex
neocortex
fallen
snap pea
pester
alter ego
cataclysm
cola
ballerina
tourniquet
imam
cancel
ephemera
duffel
deodorant
armchair
haberdashery store
avalanche
glowworm
sheepherder
Quechua
blackjack
trend
garbage pickup
Ashkenazi
sabbat
sauce vinaigrette
mesoderm
airplane mechanics
Japanese oyster
Gospels
scheme
atlas vertebra
cataract
handoff
waterfall
hindsight
waylay
leitmotif
pickle relish
rattlesnake weed
lockbox
steppe
pillow talk
touch base
wiggle nail
cesspool
seed shrimp
vanish
estoppel
sunspot
helper
jackknife clam
golliwog
stitch
opponent
cutthroat
pumpkin
salt lick
catchphrase
stilt
ballyhoo
pummel
mainsail
anagram
crosswalk
crossbow
marker
pillow fight
baklava
cornea
maharajah
hobgoblin
lever lock
diazepam
clause
pitch pine
symptom
goalpost
mezuzah
cookbook
poplin
exercise
maple
cockatoo
mestizo
neoprene
chipmunk
stem
urticaria
calendar
doohickey
decamp
Omega Centauri
galoot
riff
mollusk
hemisphere
cottonwood
lemon peel
leotard
teeter
swivel pin
redden
bordello
kumquat
solar furnace
toothache
withdrawal
glioma
salaam
banana quit
scrub palmetto
clench
lifespan
cuckoo-bumblebee
peanut
jonquil
lard
mishmash
reboot
beriberi
ocarina
Teapot Dome
ageratum
totter
metatarsal
oligarch
window oyster
kangaroo apple
pupa
bloodbath
spaghetti
chalk talk
tessera
bacon
Canal Zone
capuchin

Posted by Leonard at Sat Dec 14 2013 08:34

For posterity's sake, I'd like to record that there was a spam comment posted here whose text was "Wizard Who Was Scared Of men."


Unless otherwise noted, all content licensed by Leonard Richardson under a Creative Commons License.