When I started the category clean-up project a while back, I decided to start monitoring 404 errors on the blog to see if I missed any incoming links that needed to be redirected. I was surprised to find that the logs showed no 404 errors at all from within the blog structure. Images, sure, but no articles, no tags, no categories. This seemed a bit hard to believe.

I tested it by deliberately hitting a non-existent page, and was dismayed to find that Apache logged the hit as 200 (OK).

Crap! A WordPress update must have broken 404 handling! How long had this been going on? I’d better manually insert a header in the 404 page!

That seemed to work, as far as Chrome’s Developer Tools and curl -I were concerned. I didn’t have time to follow up on the logs right away, so I checked back later…and the logs still showed 200 OK, not 404.
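For repeat checks, something like the following does roughly what curl -I does: it sends a HEAD request and reports the status code the server actually returned to the client. It's only a minimal sketch using Python's standard library; the function name head_status and the host and path in the usage comment are placeholders of mine, not anything from my actual setup.

```python
import http.client


def head_status(host, path="/", port=80, timeout=10.0):
    """Send a HEAD request and return the status code the server sent back."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()


# Hypothetical usage (placeholder host and path):
#   head_status("www.example.com", "/no-such-page-for-testing")
```

This only tells you what the client sees. It says nothing about what ends up in the server's access log, which is exactly the gap that bit me.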


It turned out that, when served through WordPress, Apache was sending a 404 code to the browser but logging a 200.

Probably a plugin, right?

Not so. I installed a fresh copy of WordPress on a test site and discovered something interesting: 404 codes were logged correctly when using the default /?p=123 permalink structure, but if I changed it to anything readable like /yyyy/title or even /title, the problem recurred.

A little more investigation: I skipped WordPress entirely and just hit a PHP page that served up a 404. When I hit it directly, it logged correctly. But when I used WordPress’ mod_rewrite rules to send a hit to that page, it logged a 200.

So clearly, it was something about mod_rewrite. I don’t run my own Apache server these days (my department at work is mainly a Windows shop), but I was pretty sure it didn’t work that way back when I did.

So I did some testing of different configurations at home and on my webhost. Direct hits always logged the correct status, but with a rewrite rule, here’s what I found (status logged / status sent to the browser):

FastCGI & CGI on DreamHost show 200/404.
mod_php on my home box shows 404/404.
mod_php on DreamHost shows… 200/404.
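To get the "logged" half of each pair without eyeballing the access log, a small script can pull out the status recorded for a given test URL. This is a rough sketch that assumes the log uses Apache's Common or Combined Log Format; the sample line in the comment, the regex, and the name logged_statuses are illustrative, not taken from my real logs.

```python
import re

# Matches the quoted request line and the status field that follows it, e.g.
# 203.0.113.5 - - [10/Oct/2011:13:55:36 -0700] "GET /nope HTTP/1.1" 200 2326
LOG_LINE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def logged_statuses(lines, path):
    """Return the status codes logged for requests to exactly `path`."""
    statuses = []
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("path") == path:
            statuses.append(int(match.group("status")))
    return statuses
```

Request a deliberately missing URL, note the status the client received, then compare it with what this returns for the same path. If the two disagree, you're seeing the same mismatch I was.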

At this point I figured there was no point setting up a CGI or FastCGI-based PHP environment on my home box, because it was clearly something about DreamHost’s Apache configuration.

It does log correctly if you use the ErrorDocument directive to point 404s at a PHP script. But IMO that’s abusing the error handler mechanism to do something it wasn’t meant for. (Not that I haven’t done it myself, but only on older IIS servers where ISAPI Rewrite and URL Rewrite weren’t available.)

I’ve added a custom logging snippet to my WordPress 404 page. There are other ways I can capture the data, but that seemed like the least overhead for now.

When a website redirects you to a new page, there’s always a slowdown. Even on a fast network, each redirect starts a new connection, so it never gets the chance to ramp up to full speed while you’re bouncing from one intermediary to the next.

So why do so many websites redirect to index.cfm, index.asp, etc. instead of just changing the default document in the site config so that www.example.com loads that page directly? I mean, it’s not terribly difficult, plus it makes your site easier to remember. Most importantly, it won’t break people’s bookmarks and links if you change the tech you’re using (from ColdFusion to ASP, or from ASP.NET to PHP, etc.).

Consider this scenario:

  1. Build site in PHP.
  2. Make home page redirect to http://www.example.com/index.php.
  3. Get lots of people to bookmark and link to http://www.example.com/index.php.
  4. Rebuild site in ColdFusion.
  5. Redirect home page to http://www.example.com/index.cfm.
  6. Watch all those old links and bookmarks break. Gee, I hope you have a good 404 page!
  7. Of course, you can fix it by adding another redirect….
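The scenario above can be sketched as a toy model: a set of pages that actually exist, a table of redirects, and a function that follows redirects and counts the hops. Everything here (the follow function, the dictionaries, the hop limit) is made up for illustration and isn't how any real server stores its configuration.

```python
def follow(redirects, pages, path, max_hops=10):
    """Follow redirects from `path`; return (final_path, hops, found)."""
    hops = 0
    while path in redirects and hops < max_hops:
        path = redirects[path]
        hops += 1
    return path, hops, path in pages


# After step 5: the site is now ColdFusion and the home page redirects to it.
pages = {"/index.cfm"}
redirects = {"/": "/index.cfm"}

print(follow(redirects, pages, "/"))           # ('/index.cfm', 1, True)
print(follow(redirects, pages, "/index.php"))  # ('/index.php', 0, False): step 6

# Step 7: patch the old bookmarks with yet another redirect.
redirects["/index.php"] = "/index.cfm"
print(follow(redirects, pages, "/index.php"))  # ('/index.cfm', 1, True)
```

If the home page had simply been served at / from the start, nobody would have bookmarked index.php, and the rebuild would have needed no redirects at all.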

Redirects have a lot of uses: keeping old links viable, sending downloads to the right mirror, correcting obvious typos, providing aliases that you expect people to guess…the list goes on. Even URL shorteners have their place. But this one is pretty much pointless.