Some people browse collections. I collect browsers. Mostly I just want to see what they’ll do to my web site, but I have a positively ridiculous number of web browsers installed on my Linux and Windows computers at work and at home, and I’ve installed a half-dozen extra browsers on our PowerBook.
One project I’ve worked on since my days at UCI was a script to identify a web browser. In theory this should be simple, since every browser sends its name along when it requests a page. In practice, it’s not, because there’s no standard way to describe that identity.
Actually, that’s not quite true. There is a standard (described in the specs for HTTP 1.0 and 1.1: RFC 1945 and RFC 2068), but for reasons I’ll get into later, it’s not adequate for more than the basics, and even those have been subverted. That standard says a browser (or, in the broader sense, a “user agent,” since search robots, downloaders, news readers, proxies, and other programs might access a site) should identify itself in the following format:
- Name/version more-details
Additional details often include the operating system or platform the browser is running on, and sometimes the language.
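As a rough illustration of that format, here is a minimal Python sketch that splits a User-Agent string into product tokens and parenthesized comments. The function name `split_user_agent` is made up for this example, and the pattern is a simplification: it does not handle nested parentheses, which the spec allows inside comments.

```python
import re

# One parenthesized comment, or one product token ("Name/version", version optional).
# Simplifying assumption: comments never contain nested parentheses.
_TOKEN = re.compile(r"\(([^()]*)\)|([^\s()/]+)(?:/([^\s()]+))?")

def split_user_agent(ua):
    """Return ("product", name, version) and ("comment", text) tuples in order."""
    parts = []
    for m in _TOKEN.finditer(ua):
        if m.group(1) is not None:
            parts.append(("comment", m.group(1)))
        else:
            # version is None when the token has no "/version" part
            parts.append(("product", m.group(2), m.group(3)))
    return parts

print(split_user_agent("IBrowse/2.2 (AmigaOS V45)"))
```

For a well-behaved string the first product token is the browser. Most of what follows is about why that is rarely enough.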
Now here are some examples of what browsers call themselves. (Edit Aug. 19: rearranged list for clarity and added a few more. Edit Sep. 11: added the IE/WinXP SP2 example.)
- Netscape Variations (non-Gecko)
  - Mozilla/4.7 [en] (WinNT; U)
  - Mozilla/4.72 [en] (X11; U; Linux 2.4.18 i686)
- Internet Explorer Variations
  - Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)
  - Mozilla/4.0 (compatible; MSIE 5.23; Mac_PowerPC)
  - Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)
  - Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461; .NET CLR 1.1.4322)
  - Mozilla/4.0 (compatible; MSIE 4.01; Windows CE; PPC; 240x320)
  - Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 1.0.3705)
- Opera Variations
  - Opera/6.0 (Macintosh; PPC Mac OS X; U) [en]
  - Opera/7.11 (Linux 2.4.20-18.9 i686; U) [en]
  - Opera/7.50 (X11; Linux i686; U) [en]
  - Mozilla/4.0 (Windows NT 5.0;US) Opera 3.60 [en]
  - Mozilla/4.0 (compatible; MSIE 5.0; Windows NT 4.0) Opera 5.11 [en]
  - Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) Opera 7.50 [en]
- Gecko Browsers
  - Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3.1) Gecko/20030524
  - Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.0.2) Gecko/20030208 Netscape/7.02
  - Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.0.1) Gecko/20030306 Camino/0.7
  - Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040612 Firefox/0.8
- KHTML-Based
  - Mozilla/5.0 (compatible; Konqueror/3.1; Linux)
  - Mozilla/5.0 (compatible; Konqueror/3.2; Linux) (KHTML, like Gecko)
  - Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/125.2 (KHTML, like Gecko) Safari/125.8
- Others
  - Lynx/2.8.4rel.1 libwww-FM/2.14 SSL-MM/1.4.1 OpenSSL/0.9.6h
  - IBrowse/2.2 (AmigaOS V45)
  - Dillo/0.8.1
  - NCSA Mosaic/3.0.0 (Windows x86)
  - Mozilla/4.5 (compatible; OmniWeb/4.2.1-v435.9; Mac_PowerPC)
The first thing you’ll probably notice is that most of these claim to be Mozilla. This is a holdover from the early days of the Browser Wars. Netscape was frustrated with what could be done with HTML 2, and started building its own extensions. Web sites would check to see if the browser was Netscape (which used Mozilla as its code name) in order to decide whether to send the enhanced page or the plain one. Microsoft, hoping to get in on the Web action, wanted all of these sites to send the enhanced pages to its browser too, so it identified IE as Mozilla with a “compatible” note, and put the real name in the comments.
So now you see things like Mozilla/4.0 (compatible; MSIE 5.22; Mac_PowerPC), which is not Netscape 4.0, as the basic version states, but is actually Internet Explorer 5.22.
Moving along, we find browsers like Opera, which used the same reasoning as IE, but a different format: Mozilla/4.0 (Windows NT 5.0;US) Opera 3.60 [en]
So for Internet Explorer, you have to look inside the parentheses, right after the word “compatible.” But for Opera, you have to look after the parentheses.
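A minimal Python sketch of those two lookups, assuming only the IE and Opera formats described so far. The function name `ie_or_opera` is made up for this example. Opera is checked first because, as some of the samples in the list show, an Opera string can also contain an MSIE entry.

```python
import re

_VERSION = r"(\d+(?:\.\d+)*)"

def ie_or_opera(ua):
    """Return (name, version) for IE or Opera, or None for anything else."""
    # Opera: either a leading "Opera/x" or a trailing "Opera x" after the parentheses.
    m = re.search(r"Opera[/ ]" + _VERSION, ua)
    if m:
        return ("Opera", m.group(1))
    # Internet Explorer: inside the parentheses, right after "compatible".
    m = re.search(r"\(compatible; MSIE " + _VERSION, ua)
    if m:
        return ("Internet Explorer", m.group(1))
    return None

print(ie_or_opera("Mozilla/4.0 (Windows NT 5.0;US) Opera 3.60 [en]"))
```

Reversing the two checks would report the Opera 5.11 sample as Internet Explorer 5.0.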
At the height of the Browser Wars, Netscape decided to release the source code for version 5 under an open-source license, allowing anyone to look at the code, modify their own copy, and suggest improvements or bug fixes. They called this code Mozilla, to distinguish it from the finished Netscape browser. This was problematic, though, because putting Mozilla at the beginning would simply look like another version of Netscape. If you had Mozilla version 0.1alpha, you didn’t want it to look like it was really Netscape 0.1, because then the server would assume you couldn’t handle things like JavaScript, frames, tables, plugins, etc. Netscape and Mozilla.org went through a number of different plans, finally settling on using Mozilla/5.0 to start, putting the “real” Mozilla version at the end of the parentheses, then putting detailed build information and any “official” browser names (like Netscape, Beonex, Camino, Firefox, etc.) after the parentheses. So you end up with an ID like Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.0.2) Gecko/20030208 Netscape/7.02 for Netscape 7.
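Here is one way that layout could be picked apart in Python. This is a sketch that assumes the string follows the settled-on format exactly (rv: inside the parentheses, a Gecko build date, then an optional branded name). The function name `parse_gecko` and the dictionary keys are invented for the example.

```python
import re

_GECKO = re.compile(
    r"^Mozilla/5\.0 \((?P<details>[^)]*)\) "      # platform details, ending in rv:x
    r"Gecko/(?P<build>\d{8})"                      # build date, YYYYMMDD
    r"(?: (?P<name>[^/\s]+)/(?P<version>\S+))?"    # optional branded name/version
)

def parse_gecko(ua):
    """Return a dict describing a Gecko-style UA string, or None if it doesn't fit."""
    m = _GECKO.match(ua)
    if m is None:
        return None
    rv = re.search(r"rv:([\w.]+)", m.group("details"))
    mozilla_version = rv.group(1) if rv else None
    return {
        "mozilla_version": mozilla_version,
        "build_date": m.group("build"),
        # With no branded name at the end, treat it as plain Mozilla.
        "name": m.group("name") or "Mozilla",
        "version": m.group("version") or mozilla_version,
    }

print(parse_gecko("Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.0.2) Gecko/20030208 Netscape/7.02"))
```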
Meanwhile, Netscape was losing ground to Microsoft. Netscape 4 and IE 4 were comparable, and Microsoft was bringing the full weight of marketing to promote their free-as-in-beer alternative. More importantly, Microsoft was relying on a basic human trait to do their work for them: laziness. By tying IE to Windows and making it the default web browser, they virtually guaranteed that the vast majority of people buying a new PC would start using IE. (This is not conjecture, this is fact: Microsoft was convicted of abusing their near-monopoly on the desktop to gain a monopoly on the web.) And with Netscape 5 delayed by rebuilding their code from scratch, IE 5 easily surpassed Netscape 4 both in technology and market share.
So at the end of the decade, Internet Explorer had become the prevalent web browser. People stopped testing pages in anything but Internet Explorer. Some were amateurs who didn’t know better; others were professionals frustrated by Netscape 4’s limitations or constrained by deadlines or funding, for whom it was easier to just discount the shrinking Netscape market. Smaller projects, such as Opera (now the #3 browser), realized they needed a way to get into sites that blocked non-IE browsers. As a result, the default identification for Opera 7.5 is Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0) Opera 7.50 [en], which identifies itself simultaneously as Netscape 4, IE 6, and Opera 7. (Fortunately, Opera does allow you to set it to “do the right thing” and identify itself with the more sensible Opera/7.50 (Windows NT 5.0; U) [en].)
But wait, it gets even better!
Enter Apple. Prudently realizing they might not want to rely on Microsoft for the default browser on the Macintosh, and also noticing that there were several high-quality rendering engines available to use as the basis of a new program, they settled on KHTML, the code used by KDE’s Konqueror browser. KHTML, like Mozilla, had standards compliance as one of its goals, so they decided to leverage all the post-Mozilla 1.0 articles which recommended checking for the phrase “Gecko” to determine whether you could rely on the browser being able to handle advanced features that IE ignores. So their beta called itself Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-us) AppleWebKit/60 (like Gecko) Safari/60. People were appalled. Mozilla aficionados, already upset that Apple had chosen another program, complained that they were diluting the meaning of Gecko by using it on a browser that didn’t behave in exactly the same way. KDE fans complained that Konqueror would be further marginalized because people would start looking for “AppleWebKit” or “Safari.” They eventually compromised by changing the wording to “KHTML, like Gecko” with the KDE people adding “KHTML” to Konqueror’s ID. Strangely, even now Safari doesn’t use its actual version number. Going by what it reports, you’d think it was Safari 125.2.
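To see why the wording matters, compare a naive substring test for “Gecko” with one that looks for the Gecko/ build token and rules out the KHTML family first. This is only a sketch based on the sample strings in this post. Both function names are made up, and grouping Safari under “KHTML” is just following the list above.

```python
def naive_is_gecko(ua):
    # What the "check for the phrase Gecko" advice amounts to.
    return "Gecko" in ua

def rendering_engine(ua):
    """Guess the engine family: "KHTML", "Gecko", or None if neither is evident."""
    # Check the KHTML family first, since its strings mention Gecko too.
    if "AppleWebKit/" in ua or "KHTML" in ua or "Konqueror/" in ua:
        return "KHTML"
    # Gecko-based browsers in the samples carry a "Gecko/<build date>" token.
    if "Gecko/" in ua:
        return "Gecko"
    return None

safari = "Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/125.2 (KHTML, like Gecko) Safari/125.8"
print(naive_is_gecko(safari), rendering_engine(safari))
```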
So now there are at least four major places browsers put their real names:
- Name/version
- Mozilla/x (compatible; name/version)
- Mozilla/x (details; version)
- Mozilla/x (details) more details name/version
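One way to cope with those four locations is an ordered table of patterns where the first match wins. The sketch below is Python, and everything in it is an assumption made for illustration: the labels, the function name `locate_name`, and especially the hard-coded list of trailing names, which covers only browsers mentioned in this post.

```python
import re

_V = r"(\d[\w.\-]*)"  # a version: starts with a digit, then letters, digits, dots, hyphens

# Most specific first. The last pattern is the plain spec-style "Name/version".
PATTERNS = [
    ("after-parentheses", re.compile(
        r"^Mozilla/\S+ \(.*\).*?\b(Opera|Netscape|Camino|Firefox|Safari)[/ ]" + _V)),
    ("compatible", re.compile(r"^Mozilla/\S+ \(compatible; ([A-Za-z]+)[/ ]" + _V)),
    ("end-of-details", re.compile(r"^(Mozilla)/5\.0 \([^)]*\brv:" + _V)),
    ("leading", re.compile(r"^([^/\s]+)/" + _V)),
]

def locate_name(ua):
    """Return (where, name, version) for the first pattern that matches, else None."""
    for where, pattern in PATTERNS:
        m = pattern.search(ua)
        if m:
            return where, m.group(1), m.group(2)
    return None

print(locate_name("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) Opera 7.50 [en]"))
```

The order does the real work here: put “leading” first and nearly everything comes back as Mozilla.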
You’d think these would be enough, right? Wrong. I’ve seen browsers use all of the following:
- Mozilla/5.0 (X11; U; Linux i686; en-US; SkipStone 0.8.1) Gecko/20020417
- Mozilla/5.0 (X11; U; Galeon; en-US; 0.11.3)
- Mozilla/5.0 (X11; U; Linux i686; en-US; Galeon) Gecko/20010701
- Mozilla/4.61 [en] (X11; U; ) – BrowseX (2.0.0 Linux 2.4.9-31)
- Links (0.96; CYGWIN_NT-5.0 1.3.22(0.78/3/2) i686)
Does the real name go in the middle of the parentheses? At the end? Before? After? Is the version number put after a slash or a space? Is it put in the parentheses instead? Is it even there? What were these people smoking?
So you can see, just figuring out the real name and version number of a browser is far from easy.
Now, suppose you want to identify what operating system it’s running. (Insert maniacal laughter.) Let’s assume for the moment that you know where to look. Some of them are easy, like Windows NT 4.0 or PPC Mac OS X or Linux i686. Or are they?
Somewhere around version 4, Netscape started identifying all Unix-like OSes as X11 (since the program ran under the X windowing system), with another field to identify SunOS, BSD, Linux, etc. So for those, you need to look in two places. (And then there’s trying to pick out Solaris versions from SunOS…)
Then there’s Windows. You can spot Windows NT or Windows 98, but some of them are truly bizarre. Windows 2000 claims to be NT 5.0. Windows XP claims to be NT 5.1. And Windows Me first claims to be Windows 98, then adds an extra field identifying itself as Win 9x 4.90. Really!
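A small Python sketch of how those Windows quirks might be untangled. The lookup table contains only the mappings mentioned here, and the function name `windows_release` is invented for the example. The Windows Me check has to come first, because that string also says Windows 98.

```python
import re

# Internal NT version -> marketing name (only the ones discussed in this post).
NT_MARKETING_NAMES = {"5.0": "Windows 2000", "5.1": "Windows XP"}

def windows_release(ua):
    """Return a human-readable Windows release, or None if none is recognized."""
    # Windows Me reports "Windows 98" AND "Win 9x 4.90", so test for it first.
    if "Win 9x 4.90" in ua:
        return "Windows Me"
    m = re.search(r"Windows NT(?: (\d+\.\d+))?", ua)
    if m:
        version = m.group(1)
        if version is None:
            return "Windows NT"
        return NT_MARKETING_NAMES.get(version, "Windows NT " + version)
    m = re.search(r"Windows (\d+)", ua)  # e.g. "Windows 98"
    if m:
        return "Windows " + m.group(1)
    return None

print(windows_release("Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90)"))
```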
If you want to figure out whether the browser is running on a Mac, it might be in the form Mac_PowerPC, PPC Mac OS X, or PPC Mac OS X Mach-O (those are all from the same computer, by the way, using IE 5.2, Safari, and Camino).
If all you need to know is whether your audience is using Windows, Macs, or something Unix-like, your best bet is to just look for Win, Mac or X11. Don’t look for PPC or PowerPC by itself, because you’ll catch people running Yellow Dog Linux or MorphOS. (Edit Aug. 19: Or, it seems, Windows CE. See comments for more info.)
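That coarse three-way check is about as simple as it sounds. Here is a literal Python rendering of it. The function name `platform_family` and the “unknown” fallback are assumptions for the example.

```python
def platform_family(ua):
    """Coarse platform guess using only the substrings Win, Mac and X11."""
    # Case-sensitive on purpose: "Win" should not match "CYGWIN" or "Darwin".
    if "Win" in ua:
        return "Windows"
    if "Mac" in ua:
        return "Mac"
    if "X11" in ua:
        return "Unix-like"
    return "unknown"

print(platform_family("Mozilla/4.0 (compatible; MSIE 4.01; Windows CE; PPC; 240x320)"))
```

Taken literally, the rule misses a few of the samples from the list: Opera/7.11 (Linux 2.4.20-18.9 i686; U) [en] and the Konqueror strings name Linux without X11, so they come back as “unknown” here.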
And trying to guess capabilities based on the browser’s (supposed) name is just asking for trouble. You’re better off testing based on actual capabilities. Want to support browsers that don’t have JavaScript? Use a <noscript> block. Want to send XHTML to browsers that can handle it and HTML to browsers that can’t? Check the Accept header. Worried about what DHTML methods to use? Have your JavaScript check what’s available.
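For the Accept header case, a server-side check could look something like this Python sketch. The function names are made up, and the parsing is deliberately minimal: it only looks for an explicit application/xhtml+xml entry with a non-zero q value.

```python
def accepts_xhtml(accept_header):
    """True if the Accept header explicitly lists application/xhtml+xml with q > 0."""
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "application/xhtml+xml":
            continue
        quality = 1.0  # default when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat an unparsable q as "not acceptable"
        return quality > 0
    return False

def choose_content_type(accept_header):
    return "application/xhtml+xml" if accepts_xhtml(accept_header) else "text/html"

print(choose_content_type("text/xml,application/xhtml+xml,text/html;q=0.9,*/*;q=0.5"))
```

A bare */* is intentionally not treated as XHTML support, since a browser can send that wildcard without being able to render XHTML.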
Unfortunately, trying to read the User-Agent name is like trying to read a map with no key. You need to know what you’re looking for.
(Whew — finally posted that! I started writing it a year ago, and found it in my Drafts folder today.)
Hello,
Actually, if you look closely at the Windows “About…” screen, on XP it says “Windows NT 5.1” and on Windows 2000 it says “Windows NT 5.0”, since the two OSes are based on NT code. All Windows OSes are now based on NT code, like Windows 2003 Server (Windows NT 5.2) and the future OS codenamed Longhorn (Windows NT 6.0). And Windows 98 Third Edition was dubbed Windows Millennium, which is why Windows Millennium’s version is Win9x 4.90.
I hope I clarified the situation 🙂
Thanks for the background. Anyone wondering why there’s a disconnect between the internal and marketing names, here’s your answer! (And thanks for the info about Windows Server 2003 — I haven’t worked with it yet, so I wasn’t aware of its “real” version number.)
One more thing: to clarify the point I made about Windows Me, it doesn’t just call itself Win 9x 4.90 (at least, not in IE). Here’s the complete UA string I collected from IE6 on Windows Me:
Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90)
It actually identifies the operating system twice — first as Windows 98, then as Win 9x 4.90. Thankfully, Mozilla-based browsers just pick one.
Interesting discovery in the server log:
Mozilla/4.0 (compatible; MSIE 4.01; Windows CE; PPC; 240x320)
I don’t know whether this means there are WinCE devices based on the PowerPC architecture, or whether it’s an abbreviation for PocketPC and they’re just ignoring the previous use of the acronym. Something to look up…
One more thing: it looks like Opera on Windows Me actually identifies itself as being on Windows ME, even with UA spoofing enabled:
Opera/7.54 (Windows ME; U) [en]
Mozilla/4.0 (compatible; MSIE 6.0; Windows ME) Opera 7.54 [en]
This thing:
Mozilla/4.0 (compatible; MSIE 4.01; Windows CE; PPC; 240x320)
is from an iPAQ for sure. So they probably mean PocketPC by it.
Well, WinXP SP2 adds another item to IE: SV1, standing for “Security Version 1,” which appears right after the Windows version. The example given is:
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 1.0.3705)
Supposedly this will be added as they backport the security features to other versions of Windows. I’m not convinced of the label’s usefulness, but one of the comments refers to some problems with Acrobat and the new security features, so it may be good for something other than malicious sites deciding which exploit to try.
just need to mention that i also have difficulties. created a stats site but the damn browsers make life hard! perhaps i better show which renderer one uses instead of the browser.
bah
Speaking as a web developer, I can say I’d definitely find stats-by-renderer useful! When it comes down to it, browser stats are useful for two things: tailoring the page to display correctly for as many viewers as possible, and targeting the audience. With so many browsers reusing code or components — various browsers that just embed Internet Explorer, all the browsers based on Gecko, the new crop of Mac browsers built on Webcore — the underlying engine is the most important for us.
Of course, people trying to gauge an audience will want to know other things, so it’s good to have the market share info as well. One thing I’ve found frustrating about Webalizer is that all the options for browser statistics are either too detailed or too loose. I want to know the Win/Mac/Linux breakdown, but I don’t want separate entries for IE 6.0 on Win2k, Win98, WinXP, etc., just one number for IE 6.0. AWStats is somewhat better, since it breaks it down several ways in the same report, but if I want to piece together the number or percentage of people using Netscape 4.x, I still have to add up a dozen rows of stats.
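The kind of roll-up I mean could be sketched like this in Python. It assumes the hits have already been classified into (browser, version, platform) tuples by some other step. That input shape and the function name `summarize` are invented for the example and have nothing to do with how Webalizer or AWStats work internally.

```python
from collections import Counter

def summarize(hits):
    """Tally hits by browser major version and, separately, by platform."""
    by_browser = Counter()
    by_platform = Counter()
    for browser, version, platform in hits:
        major = version.split(".")[0]
        by_browser[f"{browser} {major}.x"] += 1   # 4.7 and 4.72 both count as 4.x
        by_platform[platform] += 1                # independent of the browser
    return by_browser, by_platform

hits = [
    ("Netscape", "4.7", "Windows"),
    ("Netscape", "4.72", "Linux"),
    ("IE", "6.0", "Windows"),
    ("IE", "6.0", "Windows"),
]
by_browser, by_platform = summarize(hits)
print(by_browser.most_common())
print(by_platform.most_common())
```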
i’ll make sure my free version will search for renderers instead of browsers. i had enough UA’s that do display the renderer (mostly gecko) but not the browsername.
tricky stuff when not all parties follow an exactly defined standard.
I found one today that looks like Longhorn:
Mozilla/5.0 (compatible; MSIE 7.0; Windows NT 6.0; SV2; .NET CLR 2.0.1078
* It calls itself Mozilla/5.0, catching up to the competition. 🙂
* It does indeed consider itself to be NT 6.0.
* It is apparently Security Version 2.
* Apparently betas of IE 7 exist somewhere.
I also found that some installations of Opera will use the public Windows version (Windows XP) instead of the technical name (Windows NT 5.1). Whenever I’ve tested it, it’s followed the same pattern as IE. Either that or there’s a UA spoofer out there that isn’t quite accurate.