[Comments] (5) : The licensing stuff has to wait a little bit, because today I discovered that Movable Type has a data dump format! Well, you know how I love importing entries from other weblog tools. So I had to add a plugin to NewsBruiser that eats up that format and turns it into NewsBruiser entries. Thanks to Josh and Anirvan for sending me example dump files to work with. Today's new library is Transfusion, which parses the not-so-great MT data dump files into something you can use.
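For anyone who hasn't looked inside one of these dump files, here is a rough sketch of what parsing one involves. This is not Transfusion's actual code or API. It assumes the commonly documented layout of the MT export format: entries end with a line of eight hyphens, sections within an entry end with a line of five hyphens, the first section is `KEY: value` metadata, and later sections start with a label line such as `BODY:`. The names `parse_mt_dump`, `split_on`, and `SAMPLE` are made up for illustration.

```python
ENTRY_SEPARATOR = "--------"   # assumed: eight hyphens end an entry
SECTION_SEPARATOR = "-----"    # assumed: five hyphens end a section


def split_on(lines, separator):
    """Split a list of lines into chunks at lines equal to `separator`."""
    chunks, current = [], []
    for line in lines:
        if line.strip() == separator:
            chunks.append(current)
            current = []
        else:
            current.append(line)
    chunks.append(current)
    # Drop chunks that contain nothing but blank lines.
    return [chunk for chunk in chunks if any(l.strip() for l in chunk)]


def parse_mt_dump(text):
    """Return a list of {'meta': dict, 'sections': [(name, body), ...]}."""
    entries = []
    for entry_lines in split_on(text.splitlines(), ENTRY_SEPARATOR):
        blocks = split_on(entry_lines, SECTION_SEPARATOR)
        entry = {"meta": {}, "sections": []}
        # First block: one "KEY: value" pair per line.
        for line in blocks[0]:
            key, colon, value = line.partition(":")
            if colon:
                entry["meta"][key.strip()] = value.strip()
        # Remaining blocks: a label line ("BODY:") followed by free text.
        for block in blocks[1:]:
            while block and not block[0].strip():
                block = block[1:]  # skip leading blank lines
            name = block[0].strip().rstrip(":")
            body = "\n".join(block[1:]).strip()
            entry["sections"].append((name, body))
        entries.append(entry)
    return entries


# Hypothetical sample data, not taken from a real dump file.
SAMPLE = """AUTHOR: leonard
TITLE: Hello
DATE: 05/18/2004 02:28:00 AM
-----
BODY:
First post.
-----
COMMENT:
AUTHOR: josh
DATE: 05/19/2004 09:20:00 PM
Nice.
-----
--------
AUTHOR: leonard
TITLE: Second
-----
BODY:
Another.
-----
--------
"""

entries = parse_mt_dump(SAMPLE)
```

The sketch leaves each `COMMENT` section as raw text, including its own `AUTHOR:` and `DATE:` lines. A real importer would need to pick those apart too, which is part of what makes the format not-so-great.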

Incidentally, I've noticed a worrying difference between email spam and weblog spam. With email spam, the spammer can maybe approximate your spam list (after all, even spammers get spam), but they can't see your non-spam list without cracking your email account. On the other hand, with weblog spam no one sees the spam you deleted, but all the non-spam comments are right there for everyone to see and parse. It would be trivial for a comment spammer to post an exact repeat of someone else's comment but with Viagra links all over the place. In the long run, a Bayesian filter for comment spam might degenerate into an IP and URL blacklist. Is there any algorithmic defense against an attacker who has access to everything previously blessed by the algorithm?
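To make the worry concrete, here is a toy illustration. It is a deliberately tiny naive Bayes word scorer with add-one smoothing, not the filter NewsBruiser or any real tool uses, and the class name `TinyBayes`, the training comments, and the `pills.example` URL are all invented. It compares an ordinary spam comment with a "replay" that copies an approved comment word for word and tacks on links the filter has never seen.

```python
import math
from collections import Counter


def tokens(text):
    """Crude tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


class TinyBayes:
    """Toy naive Bayes scorer over word counts (illustration only)."""

    def __init__(self):
        self.counts = {"ham": Counter(), "spam": Counter()}
        self.totals = {"ham": 0, "spam": 0}

    def train(self, label, text):
        for token in tokens(text):
            self.counts[label][token] += 1
            self.totals[label] += 1

    def spam_log_odds(self, text):
        """Sum of per-token log(P(token|spam) / P(token|ham)).

        Positive means the text looks more like spam than ham.
        """
        vocab = len(set(self.counts["ham"]) | set(self.counts["spam"]))
        score = 0.0
        for token in tokens(text):
            p_spam = (self.counts["spam"][token] + 1) / (self.totals["spam"] + vocab)
            p_ham = (self.counts["ham"][token] + 1) / (self.totals["ham"] + vocab)
            score += math.log(p_spam / p_ham)
        return score


# Publicly visible, approved comments (the attacker can read these).
approved = "run a bayesian filter on the pages linked in the comments as well"
bayes = TinyBayes()
bayes.train("ham", approved)
bayes.train("ham", "thanks a lot for your help with the import plugin")
# Deleted spam (the attacker cannot see these).
bayes.train("spam", "cheap viagra online buy viagra now")
bayes.train("spam", "buy cheap pills online best prices")

plain_spam = "buy cheap viagra online"
# Replay attack: copy an approved comment and append never-seen links.
replay_spam = approved + " http://pills.example/ http://pills.example/"

plain_score = bayes.spam_log_odds(plain_spam)
replay_score = bayes.spam_log_odds(replay_spam)
```

With this training data the plain spam gets a positive score and the replay gets a negative one: every copied word is evidence for "not spam", and the new URLs are unknown tokens that carry almost no weight. The only features left that separate the replay from the original comment are the links and the poster, which is why the filter drifts toward being an IP and URL blacklist.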

Posted by Jeremy Penner at Tue May 18 2004 02:28

First thing that popped into my head -- run a Bayesian filter on the pages linked in the comments, as well? I mean, I suppose the spammer could link to well-behaved pages with no pop-up porn ads and everything spelled correctly, but I imagine that's kind of like losing, for spammers.

Posted by Leonard at Tue May 18 2004 03:14

Good idea! I remember this being talked about in conjunction with email spam filtering, but the consensus was that it would create a DDoS on the hosting site. The idea was dropped rather than hand anyone who wanted to attack an innocent website a powerful new DDoS tool. Weblog spam is orders of magnitude slower to put up than email spam, and weblog software has a much bigger ecosystem than spam filter software, so I think there could be room for a tool that does this sort of checking (especially since people are talking about weblog programs that *archive* every link ever mentioned).

The only problem I can see is the presence of an obvious countermeasure: put up an innocuous site, then five minutes after the weblog spam goes up, change it to a Viagra-fest.

Posted by josh at Wed May 19 2004 21:20

Posted by Leonard at Thu May 20 2004 01:15

How about this: *compared to me*, you are obsessed with sports. :)

Thanks a lot for your MT help, by the way.

Posted by josh at Thu May 20 2004 11:44

hah! fair enough.

congrats on all of the NB development! I think I'm going to try it out again soon.


