KBlogger 6th entry: What standards?

Since I started writing this blog a month or so ago I've been beating the standards drum. However, I've put the cart before the horse a bit: I've been talking standards, standards, standards, and never really explaining what those standards are. Now, I'm sure many of you already know what I'm talking about, but I think it is especially important to help out those just getting started who may not know what the alphabet soup of web standards means.

But before I get into that, there was a great article in The Register today. I just have to quote a bit of it here, but you should really read the whole thing:

Deri Jones, SciVisum's CEO, warns that companies are in danger of damaging their brands by not addressing accessibility issues properly. "When webmasters design first for Internet Explorer and not standards-compliant browsers, they so often end up restricting user access to the website, which has detrimental effects for a company," he said.

He went on to describe it as surprising that companies still fail to accommodate a variety of browsers, and warns that taking a non-standard approach limits a website's audience, and risks alienating some users.

The company recommends that web designers switch to using Cascading Style Sheets, and check pages using browsers other than Internet Explorer. It also suggests that anyone planning a redesign should consider using an open source content management system, such as Plone or Mambo.

Usability firm Human Factors International also expressed surprise that businesses still have not dealt with the issue of multiple browsers. Company managing director Jerome Nadel described the question of making a site accessible to multiple browsers as "a no brainer".

That's what I've been saying. Beyond simply being the right thing to do, it is good business. As IE's market share shrinks, sites that don't work well in other browsers are basically turning away customers. Can you afford to turn away customers?

So, the standards. You will see a lot of alphabet soup tossed around in web discussions - HTTP, SSL, TLS, HTML, XML, XHTML, CSS, DOM, DHTML, XSL, XSLT, WML, SVG, GIF, JPEG, PNG, URL... OK, that's enough, for now anyway. First a quick summary, then a little detail.

HTTP is the HyperText Transfer Protocol. This is the protocol used by the browser when communicating with the web server.
SSL is the Secure Sockets Layer protocol. This protocol was developed by Netscape in the early days of the web to protect web connections with encryption, to help kick-start ecommerce. When you see a URL starting with https://, the connection is using SSL. SSL was very popular and everyone adopted it. (Anyone else remember S-HTTP? No? Didn't think so; SSL beat it in the market.)
TLS is Transport Layer Security. What's that? Well, you see, SSL isn't really a standard, just a de facto standard. Netscape created it and published it, and everyone else followed their lead. TLS is a formal IETF standard designed to supplant SSL over time - you won't notice any difference in the browser, and the URLs stay the same.
HTML is HyperText Markup Language. I'm sure all of you knew that. It provides the basic structure of nearly all web documents. HTML 4.01 is the last revision of HTML.
XML is the eXtensible Markup Language. HTML and XML are both derived from SGML, the Standard Generalized Markup Language. HTML is a very rigid, pre-defined language. XML is more a meta-language, designed to provide a structure for authors to create their own elements. Today it is mainly used behind the scenes for data, and not often for delivering content to the end user.
XHTML is what happens when XML and HTML really love each other, or maybe just have a few too many drinks at the trade show. XML is a bit more strict about formatting than HTML, and XHTML is basically a transitional language between HTML and pure XML. XHTML 1.0 is HTML 4.01 revised to adhere to the XML standard. I would recommend that people use XHTML 1.0 today. I try to write 'XHTML' in these entries, but what I say generally applies to the older HTML as well.
CSS is Cascading Style Sheets. This is a very powerful language that can be used on both XHTML and XML to control the presentation of the content. After XHTML, CSS is the most important thing to know for creating modern pages.
DOM is the Document Object Model. The DOM is a standardized way for browsers to interpret and address elements in a document. When the browser parses the XHTML or XML it builds a DOM - a model of the objects in the document - and then applies styles, etc, to that DOM.
DHTML is not a technology or a standard; it is just a marketing term. DHTML, Dynamic HTML, is simply XHTML + CSS + scripting. The scripting (nearly universally JavaScript, aka ECMAScript; see my earlier entry) is used to manipulate the DOM to achieve dynamic results in the browser. Some DHTML effects can also be done with no scripting at all, with just XHTML + CSS.
XSL and XSLT are eXtensible Stylesheet Language and XSL Transformations, respectively. XSL is a style language that uses an XML structure. Today it mainly shows up as XSLT, which is a language used to transform one type of content (say XML) into another (say XHTML or WML). In practice this is mostly done on the server side.
WML is the Wireless Markup Language. It is used on many cell phones; it is the markup used by WAP browsers.
SVG is Scalable Vector Graphics. Think of it as a standardized version of Macromedia Flash (the current Flash Player has SVG support too). The idea is that Flash-like features are increasingly popular, and SVG will provide a standardized way to do these, with support right in the browser instead of a plug-in. For now, though, SVG remains mostly a possibility for the future.
GIF is the Graphics Interchange Format, developed many moons ago by CompuServe. I'm sure you all know it is one of the most common graphics formats on the web.
JPEG stands for the Joint Photographic Experts Group. This group developed a standard for digitally encoding images, which ended up being named after the group. When most people say JPEG they mean the format, not the group. And I'm sure you all know this is the other major graphics format on the web.
PNG is Portable Network Graphics. When CompuServe developed GIF they used an algorithm patented by Unisys. In the mid-90s someone at Unisys realized GIF was being used on this exploding web thing, and sent packs of lawyers out to try to collect licensing fees from software developers who had added support for GIF to their products. This caused a general outcry, and in response a lot of clever people decided to develop a better, unencumbered format - and that format is PNG. It is supported by all the major browsers, though IE has issues with PNG's alpha transparency (unless you use the IE7 library I wrote about).
URL is Uniform Resource Locator. I know you knew that one.
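If the DOM entry above feels abstract, here is a minimal sketch of the idea. A browser does this in native code and you would script it with JavaScript, but Python's standard library ships a small DOM implementation (`xml.dom.minidom`), so this sketch uses that as a stand-in. It parses a tiny XHTML fragment into a tree of nodes, reads from that tree, then changes an attribute the way a DHTML script might. The fragment, the `intro` id, the `highlight` class, and the helper names `paragraph_texts` and `highlight` are all made up for illustration; they are not part of any browser API.

```python
from xml.dom import minidom

# A tiny, hypothetical XHTML fragment. It must be well-formed for an XML parser.
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p id="intro">Hello, standards!</p>
    <p>Second paragraph.</p>
  </body>
</html>"""


def paragraph_texts(doc):
    """Walk the DOM and return the text of every <p> element, in document order."""
    texts = []
    for p in doc.getElementsByTagName("p"):
        # Join the text nodes that are direct children of this <p>.
        texts.append("".join(node.data for node in p.childNodes
                             if node.nodeType == node.TEXT_NODE))
    return texts


def highlight(doc, element_id):
    """DHTML-style change: set class="highlight" on the <p> with the given id.

    Returns the modified element, or None if no <p> has that id.
    """
    for p in doc.getElementsByTagName("p"):
        if p.getAttribute("id") == element_id:
            p.setAttribute("class", "highlight")
            return p
    return None


# Parsing builds the model of the objects in the document.
doc = minidom.parseString(XHTML)
print(paragraph_texts(doc))

# Changing the model is what a DHTML script does; a browser would then re-render.
highlight(doc, "intro")
print(doc.getElementsByTagName("p")[0].toxml())
```

Note the manual loop in `highlight`: minidom only recognizes ID attributes declared by a DTD (or set explicitly with `setIdAttribute`), so a plain `getElementById` call isn't reliable in this sketch. In a browser's DOM you would just call `document.getElementById('intro')`.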

OK, is your brain full yet? ;-) You think this is bad? Try networking - TCP, UDP, ICMP, ESP, AH, RIP, OSPF, BGP, IS-IS, EIGRP, VLSM, CIDR, ISDN, BRI, B8ZS, AMI, RADIUS, TACACS... Geeks and acronyms, we're inseparable. With all that alphabet soup, there are really two standards I think we should focus on - XHTML and CSS.

XHTML you're probably familiar with, at least its HTML ancestor. Tim Berners-Lee created HTML in 1990, and for several years there was no real standard. HTML was enhanced and grown by different parties, with informal documentation available from a number of sources. However, by 1994 the World Wide Web was beginning to really take off and it was important that a standard be developed to codify the state of the art. This standard was HTML 2.0, published by the IETF. The IETF, or Internet Engineering Task Force, sets most of the standards for the protocols used on the net. Check out HTML 2.0; it'll give you an idea of how far we've come.

After HTML 2.0 a number of proposals for enhancements were floated - form-based file uploads, HTML tables (that's right, HTML 2.0 didn't have tables yet), client-side image maps (Anyone else ever actually use server-side image maps and ISMAP?), and others. All of these fed into the effort to create HTML 3.0.

HTML 3.0 was overly ambitious, to put it mildly. With the explosive growth of the web people wanted it to be everything to everyone. The proposals for HTML 3.0 included concepts to be able to mark a range of content in a document, cryptographic checksums for included content (like images), figures, mathematical markup (symbols, equations, etc - these concepts returned, in a better form, as MathML), new form controls (sliders, knobs, Scribble On Image)... OK, I need to jump in here. I have long held Scribble On Image up as the example of the bloat and over-reach that doomed HTML 3. This is from the draft spec:

Scribble on Image --(type=scribble)--

These fields allow the user to scribble with a pointing device (such as a mouse or pen) on top of a predefined image. The image is specified as a URI with the SRC attribute. If the user agent can't display images, or can't provide a means for users to scribble on the image, then the field should be treated as a text field. The VALUE attribute can be used to initialize the text field for these users. It is ignored when the user agent provides scribble on image support.

Keep in mind this was 1995! They wanted people to be able to input arbitrary doodles via HTML forms! I just think this is ridiculous. I'm not sure I can think of any reasonable use for this now, other than for games or something. Anyway, HTML 3.0 was also loaded down with tons of presentational markup, etc - which eventually showed up in CSS instead.

During the HTML 3.0 effort the IETF decided they really weren't the proper body to handle a document format standard. The IETF deals mainly in protocols for moving data around, not in data formats. So the W3C, which had been founded in late 1994, took over stewardship of the web standards. Since HTML 3.0 was dragging on, both Netscape and Microsoft had run off and implemented pieces of the 3.0 proposal, as well as completely new things like Frames, in their browsers. The web was diverging, and the HTML 3.0 standards work was becoming increasingly meaningless. Even if a standard were developed, the browser vendors had already 'voted with their feet' and gone off in another direction.

To try to prevent the web from dissolving into multiple, incompatible camps the W3C decided to scrap HTML 3.0, and instead develop a new standard which would codify the then state of the art, and act as a stepping stone to a later recommendation for enhancements. This new standard was HTML 3.2, published in early 1997. You can see where HTML 3.2 is an evolution of HTML 2.0. It wasn't a radical change, but it represented the common ground between the different browser vendors. It was designed to help herd the cats in the same direction.

And that direction was HTML 4 - or, strictly speaking, HTML 4.01, which is a minor revision and the final, and current, version of HTML. HTML 4.0 originally appeared in late 1997, and the 4.01 revision at the end of 1999. If you look at the changes from 3.2 to 4.0, and then the changes from 4.0 to 4.01, you can see that this was the big leap for HTML: internationalization, frames, object, table changes, form changes, and more. HTML 4.01 is basically the foundation for the web as we know it today, so things have been fairly stable on that front for over five years.

That's not to say there haven't been changes. After XML was developed, the W3C decided that the future direction for HTML would be XML-based. In early 2000 XHTML 1.0 was published, followed by a revised second edition in 2002. As the title of the recommendation itself says, it is "A Reformulation of HTML 4 in XML 1.0". There are only minor differences from HTML 4.01, and most of those changes are good ideas anyway. In HTML you could leave some elements unclosed, for example opening a paragraph with <p> but never using </p>. In XML all elements must be closed, so you need to use </p> at the end of your paragraphs. But this is just cleaner markup anyway, and if you want to use CSS or do any DHTML you really need to do this or you will get odd behavior on different platforms. This is no big deal, because you can make a document that is XHTML 1.0 compliant and HTML 4.01 compliant at the same time, simply by following the HTML Compatibility Guidelines (Appendix C of the XHTML 1.0 recommendation).
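To see why those closing tags matter, here's a minimal sketch that uses Python's built-in XML parser (`xml.etree.ElementTree`) as a stand-in for a strict XML user agent. It checks whether two made-up snippets are well-formed: one written HTML-style, with unclosed `<p>` and `<br>` tags, and one written XHTML-style, with every element closed (including the `<br />` form the compatibility guidelines recommend for empty elements). The snippets and the `is_well_formed` helper are hypothetical, and well-formedness is only part of being valid XHTML; this sketch does not check anything against the XHTML DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippets. HTML 4.01 lets you omit the </p> end tag and writes
# <br> with no end tag; XML (and therefore XHTML) allows neither.
HTML_STYLE = "<div><p>First paragraph<p>Line one<br>Line two</div>"
XHTML_STYLE = "<div><p>First paragraph</p><p>Line one<br />Line two</p></div>"


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False if it rejects it."""
    try:
        ET.fromstring(markup)  # raises ET.ParseError on malformed input
        return True
    except ET.ParseError:
        return False


for label, markup in (("HTML-style", HTML_STYLE), ("XHTML-style", XHTML_STYLE)):
    print(label, "well-formed:", is_well_formed(markup))
```

Run as-is, it should report the HTML-style snippet as not well-formed (the parser hits `</div>` while `<br>` and both `<p>` elements are still open) and the XHTML-style snippet as well-formed.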

And that's where we stand today. There is an XHTML 1.1 and an XHTML 2.0 draft, but neither really has client support at this time. XHTML 1.1 is really about breaking XHTML 1.0 into modules, to allow pieces to be used independently to produce new recommendations such as XHTML Basic for mobile devices, or to be combined with other recommendations to allow more capabilities to be added in the future. An example of this is the XHTML + MathML + SVG Profile. So, for now, stick with XHTML 1.0 on your new pages.

Right, well, I guess that's it for... What? Oh, right, CSS! One of the things people have wanted from the web is control over the appearance of documents. Early on, elements and attributes were added to HTML to provide some of this control, and the doomed HTML 3.0 proposal would've added many more. However, embedding these controls into the document is severely limiting. It also mingles structure and presentation, which causes problems on many levels. HTML is derived from SGML, and the SGML world handled this with DSSSL - Document Style Semantics and Specification Language. But DSSSL is a fairly hefty language, and quite a bit of overkill for the web.

Instead of adapting DSSSL (it was considered), a much simpler system was developed, and that system became CSS - Cascading Style Sheets. CSS1 was first published in late 1996, and it gave web developers previously unheard-of control over the appearance of their sites. CSS2 was published in mid-1998 and added many features. Currently CSS2.1 is pending as the next official revision. It updates CSS2 a bit, and paves the way for CSS3, which is under development.

Most of the current browsers have solid support for CSS2.1 at this point; of course, IE has the weakest support for CSS2/2.1. (So use the IE7 library already!) Some support for CSS3 is already appearing in the latest browsers, and it promises quite a bit more power. CSS is absolutely vital to producing good looking pages that really take advantage of today's browsers. Hey, just be glad that JavaScript Style Sheets, aka JSSS, never caught on.

OK, OK, I'm done. I bet you're glad to know that. Oh - use PNG more, it really is a nice format. Shame it doesn't get a lot of use. If you have any questions, comments, criticisms, or anything else to say, just leave me a comment. Let me know if there is something you'd like covered and I'll see what I can do. Until next time...