Comment Spammers 1, Me 0

By Eric Richardson
Published: Sunday, August 07, 2005, at 10:40PM

I don't like interrupting for technical stuff here, but I have to let you know that the comment spam I'm sure you've seen over in the recent comments is about to drive me crazy. It's been especially steady over the past few days: I've knocked off 600+ comments in the last three. I'm going to keep knocking it down as soon as I see it, and I'm going to work on some sort of technical means of fighting it, but in the meantime just bear with me.

If you care about the tech bit of things, I'll discuss that after the jump.

The spam is coming from a wide range of IP addresses, so I don't think any sort of IP filtering is going to cut it. I've thought about just doing simple wordlist censorship, but that feels like a very inelegant approach to me.

My current thought is some sort of Bayesian filter that gets injected into the comment system and scores each message to decide what status it should be saved with. Over the past few days I've just been giving comment spam a status of 0 (invisible, as opposed to the normal 1). That means I have a base set of 200 good comments and 600 spam comments to train a filter with. Since I'm not running something off the shelf like MT or WordPress, there's no just plugging in a pre-built system, but my code's fairly straightforward, so it shouldn't be that hard to hook into something and use it for scoring.
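
For the curious, here's roughly what that kind of scoring could look like. This is only a minimal sketch in Python of a naive Bayes classifier, not the code this site runs: the class name, method names, and tokenizer are all made up for illustration. The only details taken from above are the two status values (0 for invisible spam, 1 for a normal comment).

```python
import math
import re
from collections import Counter

# Status values as described in the post; everything else here is hypothetical.
STATUS_SPAM = 0     # invisible
STATUS_VISIBLE = 1  # normal comment


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class CommentFilter:
    """Tiny naive Bayes classifier over comment text (illustrative only)."""

    def __init__(self):
        self.word_counts = {STATUS_SPAM: Counter(), STATUS_VISIBLE: Counter()}
        self.doc_counts = {STATUS_SPAM: 0, STATUS_VISIBLE: 0}

    def train(self, text, status):
        """Record one already-labeled comment (status 0 or 1)."""
        self.doc_counts[status] += 1
        self.word_counts[status].update(tokenize(text))

    def _log_score(self, tokens, status):
        counts = self.word_counts[status]
        total_words = sum(counts.values())
        # Vocabulary size across both classes, plus one slot for unseen words.
        vocab = len(set(self.word_counts[STATUS_SPAM])
                    | set(self.word_counts[STATUS_VISIBLE])) + 1
        # Smoothed class prior, so an untrained class never yields log(0).
        total_docs = sum(self.doc_counts.values())
        score = math.log((self.doc_counts[status] + 1) / (total_docs + 2))
        # Add-one (Laplace) smoothed log-likelihood of each token.
        for token in tokens:
            score += math.log((counts[token] + 1) / (total_words + vocab))
        return score

    def score_status(self, text):
        """Return the status a new comment should be saved with."""
        tokens = tokenize(text)
        spam = self._log_score(tokens, STATUS_SPAM)
        good = self._log_score(tokens, STATUS_VISIBLE)
        # Ties fall through to visible, so nothing is hidden without evidence.
        return STATUS_SPAM if spam > good else STATUS_VISIBLE
```

The idea would be to call train() once for each of the existing good and spam comments, then call score_status() on every new comment before it gets saved. A real version would probably want a confidence threshold rather than a straight comparison, so that borderline comments get held for review instead of silently hidden.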

That's a task for tomorrow, though. Right now I'm off to enjoy my bed.


