Fighting Blog Comments Spam: Learn from Email

Even though I have been out of the ASRG for two weeks, the ever-increasing amount of spam appearing in my blog comments has gotten me thinking. After some cruising and searching on Google, I read through some of the proposed and currently used solutions. What interests me is how many of them are evolving the same way solutions for fighting email spam have, and so may well share the same faults. It is also interesting to see people attributing blog spam to the same causes as email spam.

Similarities between Email and Blog Spam

The problem of blog spam is very similar to email spam and IM spam (“spim”) in many ways. As I pointed out in my unpublished draft on spam, there are many causes for this problem: lack of trust, economic factors, social factors, etc. Many of these apply to blog spam as well. This has led some to suggest that addressing any one of those causes will stop the problem. For example, e-postage and authentication have both been suggested for email. For many, many reasons these do not work (interested parties can consult the ASRG archives, John Levine’s excellent set of papers on some approaches, and of course Vern’s FUSSP list).

Many of the same approaches that have been recommended for email are also being recommended for blog spam. We have URL filtering, reverse Turing tests (“CAPTCHAs”), blacklists, distributed reputation systems similar to DCC and Cloudmark, as well as C/R, rate limiting (such as throttling in Movable Type), and so on. There are even suggestions to stop using comments altogether or to replace them with a different system such as trackbacks.
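Rate limiting is the simplest of these to illustrate. What follows is a minimal, hypothetical per-IP sliding-window throttle, assuming each comment arrives with the poster's IP address and a timestamp in seconds. The class name `CommentThrottle` and the limit of 3 comments per 60 seconds are made up for illustration; they are not Movable Type's actual implementation or settings.

```python
from collections import defaultdict, deque


class CommentThrottle:
    """Hypothetical per-IP sliding-window throttle for blog comments."""

    def __init__(self, max_comments=3, window_seconds=60.0):
        self.max_comments = max_comments
        self.window_seconds = window_seconds
        # Maps an IP address to the timestamps of its recently accepted comments.
        self._recent = defaultdict(deque)

    def allow(self, ip, now):
        """Return True if a comment from `ip` at time `now` (seconds) is accepted."""
        times = self._recent[ip]
        # Forget accepted comments that have slid out of the window.
        while times and now - times[0] >= self.window_seconds:
            times.popleft()
        if len(times) >= self.max_comments:
            return False  # Too many comments from this IP within the window.
        times.append(now)
        return True


throttle = CommentThrottle(max_comments=3, window_seconds=60.0)
results = [throttle.allow("192.0.2.1", t) for t in (0, 10, 20, 30)]
print(results)  # [True, True, True, False]
print(throttle.allow("192.0.2.1", 61))  # True: the comment at t=0 has expired
```

Rejected attempts are not recorded, so a persistent poster is slowed down rather than locked out. As with throttling in email, a spammer spread across many IP addresses stays under any per-address limit.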

Many of these have been found to be ineffective in email. Filtering is not foolproof. Turning off comments solves the problem at the expense of significantly reduced functionality. Proposed replacements for email might not be any better. Turing tests cause problems for the disabled. It is also not clear that blog spammers are machines at all, and there are always tricks for getting around CAPTCHAs, such as having visitors to free porn sites solve them.

Authentication in Email and Blog Comments

Unfortunately, a lot of what is being proposed for blog spam repeats the mistakes of email spam solutions, mistakes that might not be obvious to people outside the email world. For example, the need for authentication, reputation and accreditation services for email has been argued many times in the email world. On authentication, the IETF’s MARID WG is currently working on some form of authentication based on SPF and its siblings, and Yahoo’s DomainKeys is in the works as well. Reputation systems in the email world, such as blacklists and whitelists, are beginning to be organized. Multiple lists exist, but most of them share a single query interface. Accreditation services for email do not exist on a wide basis (although there is BondedSender), but they have been proposed in a decentralized fashion.
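Many email blacklists share the same interface because they are published as DNSBLs. To check an IPv4 address, a client reverses its octets and looks the resulting name up under the list's DNS zone. An answer means the address is listed. A blog could query the same lists for a commenter's IP address. What follows is a minimal sketch. The zone `dnsbl.example.org` is a placeholder, not a real list, and the function names are hypothetical. Real lists also encode reasons in specific 127.0.0.x answers and TXT records, and this sketch ignores both.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name used to look up an IPv4 address in a DNS blacklist."""
    # Validates and normalizes the address; raises ValueError if it is not IPv4.
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone):
    """Return True if the zone answers for `ip`, False otherwise.

    Performs a live DNS query; any resolution failure (including NXDOMAIN
    and timeouts) is treated as 'not listed'.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
# 99.2.0.192.dnsbl.example.org
```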

By comparison, authentication in the blog world tends to be either completely decentralized, on a blog-by-blog basis, or highly centralized, as with TypeKey, which raises privacy and centralization concerns. Experience with the same kind of developments in the email world points to the need for a set of standard protocols that would allow multiple authentication, reputation and accreditation services to coexist.

The Bottom Line

While there are many similarities between the blog world and the email world, there are also differences. The key to reducing the problem in both is to study those similarities and differences, and then take the knowledge gained fighting spam in one area and apply it to the others. There are many proposals for fighting email spam, and many of them do not work. Analyzing them, and why they failed, is the key to reducing the problem. If we combine our knowledge of both blog and email spam, we can apply some of the email proposals to the blog world, and vice versa.

An example of this would be an open API for exchanging authentication, reputation and accreditation information between blogs and services, possibly as part of Atom. This corresponds directly to the standards efforts in the same area for email. There are numerous other proposals that have failed or succeeded in email. Some of them could still be used for blogs, either because blogs are different or simply because they work.
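What such an API would look like is an open question. The sketch below only illustrates the general idea: a blog consults several independent reputation services through one shared interface instead of depending on a single central provider. Everything in it is hypothetical. That includes the `ReputationService` protocol, the score range of -1 to 1, the simple averaging rule, and the two stub services. None of it comes from Atom or from any existing email standard.

```python
from typing import Optional, Protocol


class ReputationService(Protocol):
    """Hypothetical common interface any reputation provider could implement."""

    name: str

    def score(self, identity: str) -> Optional[float]:
        """Return a score in [-1.0, 1.0] (negative = spammy), or None if unknown."""
        ...


class StaticListService:
    """Stub provider backed by a fixed dictionary, standing in for a real service."""

    def __init__(self, name, scores):
        self.name = name
        self._scores = scores

    def score(self, identity):
        return self._scores.get(identity)


def combined_score(identity, services, min_opinions=1):
    """Average the opinions of every service that knows `identity`.

    Returns None when fewer than `min_opinions` services have an opinion,
    leaving the decision (moderate, accept, reject) to the individual blog.
    """
    opinions = [s.score(identity) for s in services]
    known = [o for o in opinions if o is not None]
    if len(known) < min_opinions:
        return None
    return sum(known) / len(known)


services = [
    StaticListService("list-a", {"spammer@example.com": -1.0, "alice@example.org": 0.5}),
    StaticListService("list-b", {"spammer@example.com": -0.5}),
]
print(combined_score("spammer@example.com", services))  # -0.75
print(combined_score("alice@example.org", services))    # 0.5
print(combined_score("nobody@example.net", services))   # None
```

The interface is kept separate from any one provider so that several services can coexist. How scores are weighted, and which services a blog trusts, would remain a local policy decision.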

