Trackback Spam, meet Akismet

Just recently I have been getting bombed with more trackback spam than I'd like. SUB has some built-in protection for this: it scans the incoming excerpt for a link to the current post. The initial problem with this approach (as Al explained to me) is that other blogs such as WordPress will only submit the first X characters of a long post, meaning that a link back to my post may not be in the excerpt and the trackback will fail.

That aside, in my case nothing was ever going to work, even for ham, because I've formatted my URLs in a more friendly manner. If a trackback entry used my new URL format for the permalink, the regex match wouldn't recognise it and the trackback would fail.
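To make the two failure modes concrete, here is a rough Python sketch of an excerpt check like the one described above. It is not SUB's actual code: the function name and the example.com URLs are made up for illustration. The idea is simply that the check has to know about every form a permalink can take, and that it still fails when the sender truncates the excerpt before the link.

```python
import re


def excerpt_links_to_post(excerpt, permalink_forms):
    """Return True if the excerpt contains any known form of the post's URL.

    `permalink_forms` is a hypothetical list holding both the old-style
    and the friendly URL for the same post.
    """
    for url in permalink_forms:
        # Match the URL literally, allow an optional trailing slash, and
        # refuse to match when the URL is only a prefix of a longer one.
        pattern = re.escape(url.rstrip("/")) + r"/?(?![\w-])"
        if re.search(pattern, excerpt, re.IGNORECASE):
            return True
    return False


# Hypothetical URLs, for illustration only.
OLD_URL = "http://example.com/post.aspx?id=42"
FRIENDLY_URL = "http://example.com/2006/08/trackback-spam-meet-akismet"

excerpt = 'Nice write-up at <a href="' + FRIENDLY_URL + '">this post</a>'

# A check that only knows the old URL format rejects a legitimate link.
print(excerpt_links_to_post(excerpt, [OLD_URL]))
# Knowing both forms fixes that case.
print(excerpt_links_to_post(excerpt, [OLD_URL, FRIENDLY_URL]))
# A truncated excerpt loses the link, so the check fails regardless.
print(excerpt_links_to_post(excerpt[:20], [OLD_URL, FRIENDLY_URL]))
```

Even with both URL forms covered, the truncation problem remains, which is a fair argument for not relying on the excerpt alone.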

I've been talking to Al a bit about this and he has suggested integration with Akismet. Akismet is a spam/ham checker that helps protect the WordPress community, and it has libraries in many other languages. The library I chose to use was this one at Joel.net.

So thanks to the always-failing regex check, a spammer wouldn't have had a hope of ever successfully submitting to my site, but now I've at least been alerted to the problem. I've completely replaced the regex check and now rely solely on Akismet to validate entries. On top of this, I've also added a check to the comments section of my posts: if Akismet finds a comment guilty of spam, the feedback item will be set as a draft item (coming soon).

As for the trackback spam, I'm already sick of it, and at this stage I can't be bothered to write a better spam management system in SUB, so if Akismet says no, the trackback is ignored, end of story. I really like the idea of trackbacks, but it's one of those things with good intentions that's just too easy to exploit. If only trackbacks could have some kind of CAPTCHA verification as well.
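The resulting policy is small enough to sketch in a few lines of Python. This is only an illustration of the rules described above, not the code running in SUB: the `Feedback` type, the `Action` names and the `is_spam` callable are all hypothetical, and `is_spam` stands in for whatever Akismet client library is actually doing the check.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Action(Enum):
    PUBLISH = "publish"
    DRAFT = "draft"    # held back for review
    IGNORE = "ignore"  # dropped without being stored


@dataclass
class Feedback:
    kind: str        # "comment" or "trackback"
    author_url: str
    body: str


def decide(item: Feedback, is_spam: Callable[[Feedback], bool]) -> Action:
    """Apply the spam policy, given a spam verdict function.

    `is_spam` is an assumed wrapper around the Akismet check; it is
    injected so the policy can be exercised without a network call.
    """
    if not is_spam(item):
        return Action.PUBLISH
    # Spam comments are kept as drafts; spam trackbacks are simply ignored.
    return Action.DRAFT if item.kind == "comment" else Action.IGNORE


# Usage with a stand-in verdict function (not a real spam filter).
def looks_spammy(item: Feedback) -> bool:
    return "pills" in item.body.lower()


trackback = Feedback("trackback", "http://spam.example", "Cheap pills here")
print(decide(trackback, looks_spammy))
```

Keeping the verdict separate from the decision means a smarter check could be swapped in later without touching the policy itself.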

General .NET
Posted by: Brendan Kowitz
Last revised: 21 Sep 2013 12:15PM

Comments

8/14/2006 1:01:30 AM
Verifying an incoming trackback is an interesting idea.

What if you added some smarts to the trackback verification process? When the trackback lands, you generate a list of keywords for the destination URL. You pull down the content at the origin URL and generate a list of keywords for that too. If you get over an x% collision/match on the keywords, then you allow it through. You could also add a simple check whereby the origin URL -must- include a link to your destination URL. That last point alone would stop a lot of the crap, I think.

Al.
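
Al's suggestion could be sketched roughly like this in Python. Everything here is an assumption made for illustration: the function names, the tiny stopword list, the minimum word length and the 30% threshold are all invented (the comment only says "x%"), and fetching the origin page is left out, so the functions take text that has already been downloaded.

```python
import re

# A tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"this", "that", "with", "from", "have", "they", "your",
             "about", "there", "which", "would", "here"}


def keywords(text, min_len=4):
    """Extract a crude keyword set: lowercase words of at least min_len letters."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if len(w) >= min_len and w not in STOPWORDS}


def overlap_ratio(origin_text, destination_text):
    """Fraction of the destination's keywords that also appear in the origin."""
    dest = keywords(destination_text)
    if not dest:
        return 0.0
    return len(dest & keywords(origin_text)) / len(dest)


def verify_trackback(origin_html, destination_text, destination_url,
                     threshold=0.3):
    """Accept a trackback only if the origin links back and shares keywords."""
    # The origin page must contain a link to the post being tracked back.
    if destination_url not in origin_html:
        return False
    # The threshold is an arbitrary placeholder for Al's "x%".
    return overlap_ratio(origin_html, destination_text) >= threshold
```

Both checks require downloading the origin page, which costs a request per trackback, but the link-back test on its own is cheap once the page is in hand.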
