I finally got out to see Transformers today. Yes, I grew up with the cartoons, the toys and the comics. Yes, I even collected every comic book, from the original Marvel series through the Generation 2 series (including the prologue in G.I. Joe) and on to the first round from DreamWave. But somewhere along the line I just lost interest and ultimately sold off my entire collection. (On eBay, actually.)

But still, there’s some sort of primal thrill—at least for anyone who grew up as a boy in 1980s America—in seeing giant robots fighting each other. So I finally decided to catch it while it was still in theaters.

It was better constructed than I expected. They had a plausible reason for the Autobots and Decepticons to be on Earth, and they were very good about following up on exposition. Every gun that appeared on the wall was eventually fired, down to Sam’s eBay auctions, with one exception: I really expected them to blow up Hoover Dam.

Which brings me to the biggest gap in logic. SPOILERS follow, for anyone who, like me, has been living in a cave.

I recently discovered exactly how the Wayback Machine deals with changes to robots.txt.

First, some background. I have a weblog I’ve been running since 2002, switching from B2 to WordPress and changing the permalink structure twice (with appropriate HTTP redirects each time) as nicer structures became available. Unfortunately, some spiders kept hitting the old URLs over and over again, even though those URLs returned a 301 (permanent) redirect to the new locations. So, foolishly, I added the old links to robots.txt to get the spiders to stop.

Flash forward to earlier this week. I make a post on Slashdot that reminds me of a review I wrote of Might and Magic IX nearly four years ago. I head to my blog, pull up the post… and, to my horror, discover that it’s missing half a sentence at the beginning of a paragraph, and I can’t remember what I originally wrote!

My backups are too recent (ironic, that), so I hit the Wayback Machine. Its copies of the post only go back to 2004, and even those are missing the chunk of text. Then I remember that the link structure was different, so I pull up the oldest archived copies of the main page and find the summary with a link to the original location. I click on it… and I see:

Excluded by robots.txt (or words to that effect).

Now, this page was not blocked at the time ia_archiver spidered it; it was only blocked later. The Wayback Machine retroactively blocked access to it based on the current robots.txt content. I searched through the documentation and couldn’t determine whether the data had actually been removed or just hidden, so I decided to alter my site’s robots.txt file, fire off a request for clarification, and see what happened.

As it turns out, several days later, they unblocked the file, and I was able to restore the missing text.

In summary, the Wayback Machine will block end users from accessing anything disallowed by your current robots.txt file. If you remove the restriction from your robots.txt, it will re-enable access, but only to pages it had archived in the first place.
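For the curious, that rule can be modeled in a few lines. This is a minimal sketch that assumes the behavior is exactly as described here (a page is viewable only if it was archived at some point *and* the site's current robots.txt allows it); it is not the Wayback Machine's actual code. It uses Python's standard `urllib.robotparser`; the function name `wayback_accessible`, the example.com URLs, and checking against the `ia_archiver` user agent are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser


def wayback_accessible(url: str, current_robots_txt: str, archived_urls: set[str],
                       agent: str = "ia_archiver") -> bool:
    """Hypothetical model of the observed behavior (not the real Wayback code).

    A page is viewable only if (1) it was archived at some point and
    (2) the site's *current* robots.txt does not disallow it.
    """
    if url not in archived_urls:
        # Removing a robots.txt rule can't bring back a page that was never crawled.
        return False
    parser = RobotFileParser()
    parser.parse(current_robots_txt.splitlines())
    # The check uses today's robots.txt, not the one in effect at crawl time,
    # which is what makes the block retroactive.
    return parser.can_fetch(agent, url)


old_post = "http://example.com/archives/2002/10/might-and-magic-ix"  # hypothetical URL
archived = {old_post}

blocking = "User-agent: *\nDisallow: /archives/\n"
print(wayback_accessible(old_post, blocking, archived))  # False: blocked retroactively
print(wayback_accessible(old_post, "", archived))        # True: rule removed, access restored
print(wayback_accessible("http://example.com/never-crawled", "", archived))  # False
```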

(Originally posted as a Slashdot comment. I reposted it here several years later, and have since backdated it to the original time.)

I picked up a couple of domain names for joke websites and spamtrapping on Tuesday. I set up a placeholder page for each, and I’ve started writing and designing one of them. Aside from running one of the test pages through the W3C Validator and hooking one page into Project Honeypot, no one outside of myself, Katie, and the domain registrar even knows the sites exist.

Of course, the domain registrar has to share that info with the DNS system at large, and this morning, both sites were hit by SurveyBot/2.3 (Whois Source). As near as I can tell, they just check the home page of every registered domain once a week to grab the title and see whether the site is active.

And just eight hours later, Ask Jeeves/Teoma showed up. I assume they got the info from Whois Source, or maybe they’re plugged directly into the DNS registrar system.

It’s just amazing that the robots have arrived first—even before the content!

CNET writes about a new model of the Roomba automatic vacuum cleaner and its application of technology iRobot originally developed for mine sweeping (real mines, not the game), touching briefly on the state of the consumer robotics field. Amazingly, it includes the following sentence:

On the other end of the spectrum, the Roomba cleans up the living room and, in all likelihood, could not be used by a mad scientist to take over the Earth.

It sounds like someone’s been reading recent Sluggy Freelance comics!