Bad Behavior and Spam Karma do a good job of fighting most of the spam that hits this site, but over the last few weeks I’ve seen a (relatively) new kind that seems to require manual intervention: pingback spam.

It took a long time for spammers to really start abusing pingbacks, because of two things: First, pingbacks require the remote site to link to your site before they can get you to link to theirs. Second, it was just so much easier to abuse trackbacks and ordinary comments. I guess those have gotten locked down enough that it’s worth the effort to target pingbacks now.
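For the curious, here is a minimal sketch of that "does the remote page link back?" check. It is an illustration, not the code WordPress or any plugin actually runs. The names `LinkCollector` and `source_links_to_target` are made up for this example, and fetching the remote page is left to the caller.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkCollector(HTMLParser):
    """Collects the absolute URL of every <a href> found in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop any #fragment.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.add(absolute.rstrip("/"))


def source_links_to_target(source_html, source_url, target_url):
    """Return True if the source page contains a real link to target_url.

    Hypothetical helper: only the comparison is shown here, and the
    trailing-slash handling is a simplification.
    """
    collector = LinkCollector(source_url)
    collector.feed(source_html)
    target, _fragment = urldefrag(target_url)
    return target.rstrip("/") in collector.links
```

A check like this only shows that the link was present at the moment the page was fetched, so a page that adds the link briefly and removes it later would still pass.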

Judging by a quartet of comments posted this evening, three of which slipped past Spam Karma, someone’s started outsourcing comment spam to India. (I’m serious: the IP addresses were assigned to Bharti Airtel and BSNL Internet, both ISPs based in New Delhi.)

They were posted quickly, as if they’d been composed in another editor and pasted into the form. More importantly, they were actually posted through the form, not sent as raw data directly to the handler. And most tellingly, the posters had gone to the effort to fill out the CAPTCHA that Spam Karma provides to allow human commenters to recover from a false positive.

The one I liked best, from a technical perspective, was posted on Tall Ships of San Diego. The spammer had followed my link to the San Diego Maritime Museum, then followed that to a page describing one of the ships, the Californian, and generated a post by stringing together sentences from that page. The whole thing linked to a student loan site.

At first glance, it looked like a garbled, on-topic comment from someone who maybe didn’t speak English as their first language. That happens, and if it’s a legit comment, I leave it. In fact, I considered leaving the comment but deleting the author URL, until I looked up the ship. (It wasn’t one of the ships we toured on our visit, and I didn’t recognize the name.) As I looked at the ship’s profile, I started recognizing text from the comment. At that point it became clear what was going on, and I started looking at the other comments posted over the last few hours.
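The comparison described above was done by eye, but it could be roughly automated. The sketch below measures how many of a comment's sentences appear word for word in the text of a page it might have been lifted from. The function names and the five-word minimum are assumptions chosen for illustration, and getting the page text is left to the caller.

```python
import re


def normalize(text):
    """Lowercase and collapse everything but letters and digits to spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def copied_fraction(comment, page_text, min_words=5):
    """Fraction of the comment's longer sentences found verbatim in page_text.

    Sentences shorter than min_words are ignored, since short phrases
    match by coincidence. Returns 0.0 if nothing is long enough to test.
    """
    page = " " + normalize(page_text) + " "
    candidates = [normalize(s) for s in split_sentences(comment)]
    candidates = [s for s in candidates if len(s.split()) >= min_words]
    if not candidates:
        return 0.0
    copied = sum(1 for s in candidates if " " + s + " " in page)
    return copied / len(candidates)
```

A high fraction would only be a hint worth a closer look, since a real commenter might quote the page too.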

I’m surprised it took so long, but trackback spammers seem to have finally figured out that they can sail past the simplest check against trackback spam (does the calling page actually link to the page being trackbacked?) by temporarily adding that link.

Or maybe they’ve been doing it for a while, and they’ve only just started getting past my other layers of defense (namely Bad Behavior and other checks by Spam Karma).

*sigh*

I’ve held off on posting funny spam subject lines lately, but I just had to comment on this pair. First up:

Mazrim Taim was one of those, raising an army and ravaging Saldaea before he was taken.

It’s a quote from Lord of Chaos, the 6th book in Robert Jordan’s fantasy series, The Wheel of Time. The next one is a bit less obvious:

If Lan was attempting jokes, however feeble and wrongheaded, he was changing.

I wasn’t sure about this one, since there must be other stories with characters named Lan, but Google Book Search found it in book 5, The Fires of Heaven.

I’ve seen lots of spam that used filler from The Wizard of Oz and other novels old enough to be in the public domain. Project Gutenberg and the like have been transcribing them into free plain-text ebooks for years, which makes it easy to snag a couple of lines of actual English text.

In theory this should be harder to identify as filler than randomly-generated text.

Since adding the MSRBL-Images signatures to our spam filters at work, I’ve occasionally dropped in to Spam or Not to help rate their submissions. It uses the “Hot or Not” concept, but instead displays an image that’s been submitted as spam, and asks viewers to rate just how spammy it is. The results feed back into developing their signatures.

Right now they’re just 10 images away from rating every single image in their database.

Total Images: 308780
Total Ratings: 314616
Rated Images: 308770 (99.99%)

Unfortunately, I seem to be mostly getting already-rated images, because that third number isn’t climbing in step with the second. And of course, when it comes to spam, you can rate all you want; they’ll make more.

I recently stumbled across an archived mailing list post of mine from the days before spammers started targeting WordPress. Someone had remarked that their spam problem had disappeared when they switched from Movable Type to WordPress, and I responded:

Oh, they hit us WordPress users too, just not as often as MT. Having it automatically moderate comments with certain keywords or more than X number of links helps cut it down, and the ability to (a) see all the latest comments and (b) mass-delete comments reduces the pain of cleanup. But they do target WP blogs from time to time.

I tend to get a pair of comments sent to the moderation queue every few weeks (presumably they figure if the first two didn’t show up, they won’t waste their time with more), but just this morning I had to delete a spam comment that came in last night and didn’t trip the moderation rules. (One of those with the generic “I like your site” messages and the author’s URL being the spamvertized site.)

That was September 2004. How things have changed! All WordPress blogs come with Akismet as an anti-spam measure, but I still prefer to use Bad Behavior, which has blocked ~2900 hits to this site in the past week alone, and Spam Karma, which has collected over 17,000 comment spams.

And with all those counter-measures in place, I get a couple of comments landing in the moderation queue each week. And just this morning I had to delete a spam comment that came in last night and didn’t trip either layer of defense (it was a generic piece targeting keywords found in a post). The filters are just barely keeping pace with the increased volume.
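For reference, the rule from that 2004 post (hold a comment when it contains certain keywords or more than X links) might look roughly like the sketch below. This is not WordPress's actual code. The function name, the default of two links, and the sample keywords are placeholders picked for the example.

```python
import re

# Counts anything that looks like the start of an http or https URL.
LINK_PATTERN = re.compile(r"https?://", re.IGNORECASE)


def needs_moderation(comment_text, keywords, max_links=2):
    """Return True if the comment should be held for manual review.

    Hypothetical rule: hold on any keyword match (case-insensitive
    substring) or when the comment has more than max_links links.
    """
    lowered = comment_text.lower()
    if any(keyword.lower() in lowered for keyword in keywords):
        return True
    return len(LINK_PATTERN.findall(comment_text)) > max_links


# Example with made-up keywords:
blocked_words = ["casino", "student loan"]
print(needs_moderation("Cheap STUDENT LOAN rates here", blocked_words))
print(needs_moderation("I like your site", blocked_words))
```

The second example shows the weak spot: a generic "I like your site" message whose only payload is the author URL has no keywords and no links in the body, so a rule like this lets it through.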

Project Honeypot recently started tracking comment spammers as well as email harvesting bots. Oddly enough, even though they have data going back to March 22, and even though Bad Behavior and Spam Karma have blocked an incredible number of spam comments on this site (Bad Behavior has blocked 3807 connections in the past week alone)…none of the honeypots I manage have trapped a single comment spam.

And no, the honeypot on this site isn’t protected by those plugins.