Last week I started looking at ways to cut down on false positives in our spam filters. I’ve only seen two in my own mailbox this year, but of course everyone gets different kinds of email. I’ve been trawling the server logs for low-scoring “spam,” looking for anything that might be legit, particularly messages the Bayes subsystem has already identified correctly but where its score isn’t enough to counteract the points assigned by other rules. (Unfortunately, it’s hard to tell when all you’ve got is the sender, subject, and list of spam rules.)

One item I noticed was a copy of the Microsoft Technet Flash newsletter. I thought this was odd, since I’d gotten a copy of the same newsletter and it hadn’t been labeled. In fact, it turned out that my copy only scored 0.3 points, and the other hit 6.4! (5 points indicates probable spam.) What could explain such a disparity?
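For anyone unfamiliar with how the scoring works: SpamAssassin's verdict is just the sum of the scores of whichever rules a message hits, compared against a threshold (5.0 here). Here's a minimal sketch of that arithmetic; the rule names BAYES_00 and BAYES_99 are real SpamAssassin rules, but the score values are illustrative, since actual scores vary by version and configuration:

```python
# Hypothetical per-rule scores (real SpamAssassin scores vary by version/config).
SCORES = {
    "BAYES_00": -1.9,      # Bayes says almost certainly ham
    "BAYES_99": 3.5,       # Bayes says almost certainly spam
    "HTML_MESSAGE": 0.1,
    "MIME_HTML_ONLY": 1.5,
    "URI_HEX": 1.3,
}
THRESHOLD = 5.0  # the "probable spam" cutoff mentioned above

def verdict(hit_rules):
    """Sum the scores of the rules a message hit; it's spam at/above threshold."""
    total = sum(SCORES[r] for r in hit_rules)
    return total, total >= THRESHOLD

# One copy of a newsletter: Bayes vouches for it, dragging the total down.
print(verdict(["BAYES_00", "HTML_MESSAGE"]))                # well under threshold
# Another copy: Bayes and a couple of other rules go the other way.
print(verdict(["BAYES_99", "MIME_HTML_ONLY", "URI_HEX"]))   # over threshold, flagged
```

This is how two copies of essentially the same message can land on opposite sides of the line: a couple of small differences flip which rules fire, and the totals diverge.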

Answer: two very small differences.