
Spam Will Eat Itself: The NewsBruiser wiki has been defaced with wiki spam for a while. Recently I decided to do something about it, so I examined the patterns and found some bizarre features unique to wiki spam.

First, the spam I was getting seemed to be manually entered. There were long (by web bot standards) and irregular pauses between hits from spammers, and slightly differing spam methodologies. Sometimes the log message for a spammy change had a solitary "d" or "df" in it, the mark of a person used to scribbling crap data into web forms. It was kind of sad to imagine the people whose job it is to manually spam wikis, like an email spammer who has to type in SMTP commands by hand. Maybe I'm missing all the bot-based attacks because I use relatively unpopular wiki software (just as the only comment spam I've ever seen on NewsBruiser was manually entered).

What is new and interesting, though, is the way the wiki spammers locate wikis to spam. To get things started, there must be some bots that can spam SubWiki, or some particularly persistent manual wiki spammers. But once there is any spam at all in a wiki, a pheromone trail has been laid down and the hordes close in.

You see, the majority of manual wiki spammers seem to be free riders who use search engines to find wikis that have already been spammed, then go in and replace the preexisting spam with their own. The internecine warfare rages without end, as spammers destroy each other's contributions to the wikispamosphere while making a mockery of the work of the original spammer, who went to all the trouble of finding that wiki in the first place, or of writing a bot that could spam SubWiki. For shame!

There are a variety of engines and sub-techniques in use, but the most common one is to search for "wiki" plus the name of a site to which wiki spam points. Then, for each hit, go into the wiki and replace the (spam) text of the page with your own spam text. This anti-wiki-spam organization has documented this behavior, but not its comical implication.

Until I can figure out a better solution, one that hopefully doesn't involve me doing a whole lot of SubWiki development or switching wiki software, I am going to do a little free riding of my own. I've implemented a couple of solutions that protect the NewsBruiser wiki only because it's not worth five seconds of a spammer's time to figure out what's going on, when there are so many other wikis they could be spamming.
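To give the flavor without giving away the store (this is just a made-up sketch, not the actual check; the field name and pass-phrase below are invented for illustration), a five-second hurdle might look like this:

    import cgi

    # Hypothetical names, invented for this sketch.
    FIELD_NAME = "proof_of_human"   # extra box shown next to the save button
    MAGIC_WORD = "wiki"             # the word the edit form asks you to type

    def edit_allowed(form):
        """Allow the edit only if the magic word was typed into the extra box."""
        answer = form.getfirst(FIELD_NAME, "").strip().lower()
        return answer == MAGIC_WORD

    if __name__ == "__main__":
        form = cgi.FieldStorage()
        if edit_allowed(form):
            print("Content-Type: text/plain\n")
            print("Saving the edit.")
        else:
            print("Status: 403 Forbidden\nContent-Type: text/plain\n")
            print("Edit rejected: please answer the question next to the save button.")

A legitimate editor loses a few seconds; a spammer working through a search-engine hit list has no reason to stop and figure out why the save button suddenly doesn't work.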


Comments:

Posted by Brian Danger Hicks at Wed Dec 15 2004 16:58

I originally read that as anti-wiki spam organization, and was disappointed to find I was mistaken.

Posted by Brendan at Wed Dec 15 2004 18:10

Man, I thought I was being clever yesterday when I dreamed up the idea of writing wiki software that would run off Subversion. I need to do my homework better.

Posted by Manni at Thu Dec 16 2004 04:11

I guess I haven't documented the comical implications because I fail to see them. Not all of the spammers on the chongqed.org wiki removed the other spam; often they would just add their own junk.

But the solution is simple: Just prevent those search-engine spiders from indexing kept revisions.
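For instance (just a sketch of the idea, assuming a wiki that serves old revisions via a hypothetical "rev" query parameter), the page-rendering code only has to add a robots meta tag when it's showing a kept revision:

    import cgi

    NOINDEX = '<meta name="robots" content="noindex,nofollow">'

    def extra_head_tags(form):
        """Extra <head> markup: hide kept revisions from search-engine spiders."""
        if form.getfirst("rev"):          # viewing an old revision, not the live page
            return NOINDEX
        return ""

    if __name__ == "__main__":
        form = cgi.FieldStorage()
        print("Content-Type: text/html\n")
        print("<html><head>%s</head><body>(page body)</body></html>"
              % extra_head_tags(form))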

The pattern that I have seen doesn't really match "spam will eat itself"; it's more like "spam will create spam". An analogy I can't really fend off would be a thermonuclear reaction. It really does seem to be a chain reaction, and it really does seem to cause catastrophic effects. Look at Lee's wiki: http://www.piclab.com/cgi-bin/wiki.pl?Recent_Changes or at the Know-how wiki: http://www2.iro.umontreal.ca/~paquetse/cgi-bin/wiki.cgi?Recent_Changes

Not so funny, is it?

Posted by Gary at Thu Dec 16 2004 06:50

For generic bot protection see this, but possibly that's not what you want. I suppose you could put a "Googlebot gets in" rule into it or something...

Posted by Leonard at Thu Dec 16 2004 09:17

Manni, I didn't mean to belittle the problem. All I've seen on my wiki is actual people fighting over who gets to put spam on two of my pages. I don't think spam will eat itself in general, but it is eating itself on my wiki. And even though it hurts my wiki a lot I do think *that* is kind of funny.

Gary, I don't want to block bots. Bots don't seem to be the problem. The problem seems to be people whose only distinguishing feature is that they come in from search engines. If I blocked bots I'd never get into the search engines at all, so that's not a great solution.
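Just to sketch that distinction (this is not something the wiki actually does; the check below is purely illustrative), an edit handler could look at the Referer header and treat people who arrive straight from a search results page with extra suspicion:

    import os

    # Domains that suggest the visitor arrived from a search results page.
    SEARCH_ENGINES = ("google.", "yahoo.", "msn.", "altavista.")

    def came_from_search_engine(environ=os.environ):
        """True if the HTTP Referer looks like a search engine."""
        referer = environ.get("HTTP_REFERER", "").lower()
        return any(engine in referer for engine in SEARCH_ENGINES)

    # An edit handler could then require an extra confirmation step, or hold
    # the change for review, whenever came_from_search_engine() is true.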

Posted by Leonard at Thu Dec 16 2004 09:21

Incidentally, SubWiki doesn't display the details of previous versions, so they can't be indexed. Unfortunately, that also means it's no easier to revert to an earlier version.



Unless otherwise noted, all content licensed by Leonard Richardson
under a Creative Commons License.