What’s in a User-Agent String?

Posted on June 19, 2004

Some people browse collections. I collect browsers. Mostly I just want to see what they’ll do to my web site, but I have a positively ridiculous number of web browsers installed on my Linux and Windows computers at work and at home, and I’ve installed a half-dozen extra browsers on our PowerBook.

One project I’ve worked on since my days at UCI was a script to identify a web browser. In theory this should be simple, since every browser sends its name along when it requests a page. In practice, it’s not, because there’s no standard way to describe that identity.

Actually, that’s not quite true. There is a standard (described in the specs for HTTP 1.0 and 1.1: RFC 1945 and RFC 2068), but for reasons I’ll get into later, it’s not adequate for more than the basics, and even those have been subverted. That standard says a browser (or, in the broader sense, a “user agent,” since search robots, downloaders, news readers, proxies, and other programs might access a site) should identify itself in the following format:

Name/version more-details

Additional details often include the operating system or platform the browser is running on, and sometimes the language.

Now here are some examples of what browsers call themselves: Continue reading →

Silly Censors

Posted on March 18, 2004

Kelson

A few weeks ago I was looking at the website error logs and noticed some attempts to access images with names like /flash/images/%20%20%20%20%20%20%20ans3.jpg. I got around to looking at it today, and all of them are the same name, all of them from browsers looking at my profile of the Teen Titans, which includes an image called teentitans3.jpg.

I finally realized what’s going on. Some moronic filter has broken up the name not as “teen titans” but as “teen tit ans,” decided it must be porn, and replaced the “offending” words with spaces (%20 is the code for a space in a URL).

It really makes me wonder how badly mangled the page looks to these people, especially if it turns out that every instance of the team’s name gets pointlessly erased.

Further reading: The Censorware Project, Peacefire, Electronic Frontier Foundation.

Eeeww! (EvilML)

Posted on March 10, 2004

Kelson

I just caught a reference to Arve Bersvendsen’s EvilML file. What is it? It’s an HTML document designed to make use of the fact that HTML is, technically, SGML, which has all kinds of strange shortcuts you can use. Of course, no one has ever bothered to make a web browser that actually handles all these shortcuts.

It’s hard to describe it. The code is barely readable. The first line of text looks like this: <body<h1<em>Emphasized</> in <h1></>. No browser in existence is likely to display it correctly, and yet — amazingly enough — it validates…

I already thought that moving to the more rigidly-defined XHTML was a good idea, but suddenly it makes a lot more sense!

Linkrot, Part Deux

Posted on February 4, 2004

Kelson

While looking for more ideas related to my earlier post on fighting link rot, I came across some interesting articles:

Web Sites that Heal [archive.org] considers some of the causes of linkrot, including: changing CMS systems (which I’ve dealt with here twice), poor structure (starting small and simple, but finding that as the site grows, the old design doesn’t work anymore), lack of testing, and plain apathy. More interesting are some of the reasons it becomes a problem, in particular the difficulty in setting up redirections and informing other sites that you’ve moved. That’s something else I can relate to: My site hasn’t been on the UCI Arts server in four years, yet despite a massive attempt to get people to update their links, Altavista still shows 82 pages linking to my site’s old location. Something I think the article leaves out is the number of sites – particularly people who set up a free Geocities account back in the dot-com era – that just aren’t maintained anymore. The pages are there, but they’re six years out of date – and so are the links.

The article then proceeds to suggest an automated server-to-server system that will detect incoming links to a moved page, then contact the referring site, report the new location, and instruct it to update the link with no human intervention whatsoever. A great idea, though it will require people like me to drop the edit-locally-and-upload model of development.

“Web Sites That Heal” referred to a Jakob Nielsen column on Linkrot. Nielsen’s advice is frequently useful, though not always applicable [archive.org]. Sadly, his recent columns have tended toward rehashing old ones or applying to ever more specialized niches, but sometimes his advice is spot-on. In this case, the article from six years ago still applies to today’s web: run a link validator on your site from time to time, and keep old URLs on your own site active (whether with actual content or with a redirect). The comments on this article are worth reading as well.

Lastly, I found a remark on Consequences of Linkrot [archive.org] as applied to weblogs. Most of the post is actually an excerpt from Idle Words, where the original author notes that the classic blog post – a single line linking to something of interest, or a series of the same – is particularly susceptible to linkrot. Without the original material, there’s nothing (or next to nothing) left. And it happens fast: The Web isn’t that old, and blogging is even younger, yet information is disappearing rapidly enough that you really have to wonder how much of what exists today will still be around – in any form – ten years from now. One of the key lessons DeLong takes from this article: it’s “critically important not just to link but to quote–and to quote extensively.”

The lesson is clear: The site you link to today may not be there tomorrow, and you may not have the time (or inclination) to go chasing it down. Quote it, summarize it, add context, write lots of commentary, whatever. Make sure what you post can stand on its own… just in case it has to.

Weblog Etiquette vs. Link Rot

Posted on January 26, 2004

Kelson

On an ideal Web, pages would stay put and links would never change. Of course, anyone who has been on the Internet long enough knows just how far away this ideal is. Commercial sites go out of business, personal sites move from school to school to ISP to ISP, news articles get moved into archives or deleted, and so on.

There are two sides to fighting link rot. The first is to design your own site with URLs that make sense, that you won’t find yourself changing a few months or years down the road. If you have to move something, use a redirect code so that people and spiders will automatically reach the new location.

The other side to the fight is periodically checking all the links on your site to make sure they still go where you expect.

So how do you handle online journals? Obviously they’re websites, so from that standpoint you should at least try to keep the links current. But on the blogging side, there are problems with this, in particular the school of thought that you should never revise a blog entry (also discussed in Weblog Ethics). Continue reading →

Stop flashing me

Posted on May 12, 2003

Katie

As one of the many working stiffs who can access the internet from work but has to share a connection, I would like to make a request of the corporate world at large:

STOP REQUIRING FLASH TO VIEW YOUR SITE!!!!!!

Everything I look at on the net while at work has to go through a server in northern CA, which doesn’t have Flash capability and probably never will, because it would be even slower if the 250 people using it were allowed to view bandwidth-hogging all-Flash sites. With the economy being what it is, bandwidth costs being what they are, and connection power needing to be split at most offices, I’m not sure any company should be upping the ante this far in the name of pretty pictures. And the defense that people can look at it at home isn’t too great, either, since DSL is out of reach of more working stiffs than web geeks want to admit, and Deity-of-Your-Choice only knows when it might creep into affordability.

So, please do what you used to do, and keep your non-Flash site online after the upgrade, instead of routing us to a page exhorting the wonders of Flash and attempting to bully us into downloading it. (Baaaa.) You’ll widen your audience with very little effort–and hey, aren’t non-Flash sites easier to maintain?

K-Squared Ramblings

Sci-fi, comics, humor, photos…it's all fair game.

Category: Web Design

What’s in a User-Agent String?

Silly Censors

Eeeww! (EvilML)

Linkrot, Part Deux

Weblog Etiquette vs. Link Rot

Stop flashing me