WordPress, Mollom and Me

Back in the mid-noughties, blogs came under heavy attack from comment spammers. It was a blog author’s nightmare. Before long, your well-tended blog would fill up with comments promoting enhancement medicine and whatnot. Spammers use the generated links to get a higher ranking from search engine crawlers in an effort to maximize their margins. At the time, there were only a few solutions, like primitive CAPTCHAs and their ilk. Then, Automattic introduced Akismet. This centralized service analyses content and tells you whether it’s spam or not. Soon, WordPress core shipped with an Akismet plugin.

Although Akismet did stop spam from hitting the front page of my blog, I still had to clear out the moderation queue on a daily basis. Not something I really enjoyed doing. I yearned for those very early blogging days when I didn’t have to worry about comment spam.

Then, in 2008, Dries Buytaert introduced us to Mollom. This is a service similar to Akismet, but with a twist: it combines text analysis and CAPTCHAs in an elegant solution. The basic proposition: content can’t necessarily be classified as either ham or spam, there’s a grey zone. Form input is sent to Mollom for scrutiny. If a clear distinction can be made, the system will inform the client whether the content is spam or ham. When Mollom is unsure how to deal with the submitted content, it responds with a CAPTCHA. It’s then up to the visitor (commenter,…) to prove that they are not an automated spambot. As the system receives feedback during each step of the process, it’s able to improve itself over time as more content is processed.
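To make that three-way flow concrete, here is a minimal sketch in Python. It is not Mollom's actual API: the `Verdict` enum, the `handle_submission` function and the returned action names are all made up for illustration, and rejecting a comment after a failed CAPTCHA is my own simplifying assumption.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    """Hypothetical classification results; not the real Mollom API values."""
    HAM = "ham"
    SPAM = "spam"
    UNSURE = "unsure"


def handle_submission(verdict: Verdict, captcha_solved: Optional[bool] = None) -> str:
    """Decide what the client does with a comment.

    Returns "publish", "reject" or "challenge" (show a CAPTCHA).
    captcha_solved is None until the visitor has answered a CAPTCHA.
    """
    if verdict is Verdict.HAM:
        return "publish"
    if verdict is Verdict.SPAM:
        return "reject"
    # Grey zone: the service could not decide, so the visitor has to prove
    # they are human before the comment goes through.
    if captcha_solved is None:
        return "challenge"
    # Assumption: a failed CAPTCHA simply rejects the comment.
    return "publish" if captcha_solved else "reject"
```

The point of the sketch is that only the grey zone ever costs the visitor any effort: clear ham and clear spam are handled without a CAPTCHA.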

Fed up with spam, I found this very promising! When Mollom went into public beta, there was only a basic Drupal module available. In my enthusiasm, I started reverse engineering the module and the API. I wanted the same spam-stopping power for WordPress. I didn’t know how much work it would take to build something similar, and I certainly had no experience building something on that scale. I was just a rookie hobbyist programmer with a serious itch. But oftentimes, that’s all it takes to build something great.

After a week or so of browsing code and reading the white paper again and again, I got stuck: the code and the service just weren’t documented well enough to make sense to a third-party developer. That’s when I e-mailed Dries. To my surprise (I didn’t know him back then), he was very cooperative, and with his first response he sent me a draft of the entire API specification. That kind of openness was a great motivator to continue working on the plugin. Over the next year, I built, tweaked and released new versions and prototypes in an iterative process. The best part was that as I worked on the plugin, I was able to provide feedback to Mollom. Pointing out where the documentation was off, or where I received different results than I expected, allowed them to improve their service. So development was actually a two-way process.

Then development stalled after several releases.

I moved on to other interests (Drupal!), went through several personal challenges (moving about a lot) and lost interest in blogging for a good while. The company went on to expand its architecture, enhance its APIs and improve the Drupal module. I didn’t have the time to catch up on all that. And finally, the plugin itself became an unmaintainable mess of code. Things didn’t look good.

In late 2011, I picked up development again. I implemented the API in a separate PHP class. Eventually, I decided to do a complete rewrite of the plugin. Development had become my day job by then, which meant that I could apply my newfound knowledge and experience to this project. During 2012, I quietly moved forward. I ported more functionality from the Drupal module to the WordPress plugin.

Developing for a different architecture comes with a lot of mismatch problems. One of the hardest things is that functionality in WordPress is implemented in a less abstract and componentized way than it is in Drupal. WordPress, after all, is (still) less of an application framework. First and foremost, it’s a publishing platform. One of the major problems is the lack of a centralised Form API which plugins can build on. This means that I can only go so far as to protect the comment form by implementing specific hooks in WordPress’ comment system. It also means that I had to write an HMAC-based authentication component to prevent replay attacks on the CAPTCHA form, a feature included in Drupal’s Form API.
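For the curious, the idea behind such an HMAC-based component can be sketched in a few lines of Python. This is not the plugin's code (which is PHP) nor Drupal's Form API implementation: `SECRET_KEY`, `MAX_AGE`, the in-memory `_used_tokens` set and both function names are hypothetical, and I'm assuming the token is tied to some session identifier and sent along as a hidden form field.

```python
import hashlib
import hmac
import time

# All names and values below are illustrative assumptions.
SECRET_KEY = b"replace-with-a-random-secret"
MAX_AGE = 600  # seconds a CAPTCHA form stays valid
_used_tokens = set()  # a real plugin would use persistent storage


def _sign(message: str) -> str:
    """Compute the HMAC-SHA256 signature of a message as a hex string."""
    return hmac.new(SECRET_KEY, message.encode(), hashlib.sha256).hexdigest()


def issue_token(session_id: str, now=None) -> str:
    """Create a signed token to embed in the CAPTCHA form."""
    issued = int(time.time() if now is None else now)
    return f"{session_id}:{issued}:{_sign(f'{session_id}:{issued}')}"


def verify_token(token: str, now=None) -> bool:
    """Accept a token only if it is authentic, fresh and not seen before."""
    now = time.time() if now is None else now
    try:
        session_id, issued, signature = token.rsplit(":", 2)
        issued_at = int(issued)
    except ValueError:
        return False  # malformed token
    expected = _sign(f"{session_id}:{issued}")
    # Constant-time comparison, so the check does not leak timing information.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False  # forged or tampered with
    if now - issued_at > MAX_AGE:
        return False  # expired
    if token in _used_tokens:
        return False  # replayed
    _used_tokens.add(token)
    return True
```

The signature alone only proves that the form came from the site. It is the expiry check and the single-use bookkeeping that stop someone from solving one CAPTCHA and resubmitting the same form over and over.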

As we've moved into the new year, so has development of the plugin over at GitHub. The base system is there and provides the same mechanics as the previous versions of the WordPress plugin did. My next goal is to iron out all major bugs so I can start alpha testing on my own personal blog. That's the first real test of the plugin. There's still a lot that needs to be done. A few major features are waiting to be implemented, and, of course, a lot of polishing. I'm quite optimistic though. If I can keep up the current rate of progress, I can actually see this through to the end. If only I had known what I was getting myself into back in 2008. :-)