Thursday, April 15, 2010

Typepad Antispam

I've just set up Typepad's open-source Akismet backend, also known as "Serotype". This software internally uses Perlbal for communicating HTTP, Gearman to delegate instructions, and dspam for content-based spam filtering. Other software requirements are MySQL and memcached.

Documentation is pretty scarce; a README file is basically all you get. However, if you're familiar with Perl you should be good to go. I put the pieces together in a CentOS 5 VM. Many of the required Perl modules were already in EPEL, but I did have to get some things directly from CPAN.

Here are my initial thoughts:
  • Thank you TypePad for making this open source, and releasing it to the world!
  • Most of Typepad's software is in Perl, and they are the creators of Perlbal/Gearman, so no surprise that this software is based on that as well. Since it uses Gearman, this Serotype server should be able to scale massively.
  • Once I installed all the required Perl modules, the software essentially worked "out of the box". I did need to adjust the Gearman client timeout to fifteen seconds. I traced this delay to the yuidd daemon. I'm not sure why it can take up to ten seconds to give me a UID.
  • The handling of API keys is very loose; the web service accepts API key by default. However, only keys that are "blessed" are able to actually train the database.
  • I wish there were an easier way to "prepopulate" the database with spam.
Web forms are the spammers' new battlefields. Good thing the Akismet API even exists.