mark nottingham

Are Namespaces (and mU) Necessary?

Friday, 7 April 2006

HTTP APIs Web Services XML

It’s become axiomatic in some circles — especially in WS-* land, as well as in many other uses of XML — that the preferred (or only) means of offering extensibility is through URI-based namespaces, along with a flag to tell consumers when an extension needs to be understood (a.k.a. mustUnderstand).
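
For the unfamiliar, here’s roughly what that looks like in SOAP; the envelope machinery and the soap:mustUnderstand attribute are real SOAP 1.1, but the Priority header and its namespace URI are invented for illustration:

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header>
    <!-- hypothetical extension header; mustUnderstand="1" instructs
         receivers to fault rather than process what they don't grok -->
    <ext:Priority xmlns:ext="http://example.com/ns/priority"
                  soap:mustUnderstand="1">high</ext:Priority>
  </soap:Header>
  <soap:Body>
    <!-- payload elided -->
  </soap:Body>
</soap:Envelope>

The namespace URI uniquely identifies the extension; the flag tells a receiver that doesn’t recognise it to reject the whole message.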

The reasoning is that extensibility should be as easy as possible. By leveraging one registry — DNS — you can use URIs to let anyone create their own uniquely identified vocabulary, without any overhead of co-ordination.

This is often contrasted with (and deemed superior to) the approach of the IETF, which uses IANA to manage many a namespace, requiring prospective registrants to jump through a variety of hoops to get in.

I didn’t question the conventional Web wisdom for a long time. Two things make me reconsider it today; first of all, this message from Roy Fielding to the atom-syntax mailing list last year, responding to my request for a mustUnderstand flag in Atom;

One problem is that the “must understand” feature is intended to prevent dumb software from performing an action when it doesn’t know how to do it right. In reality, software tends to have bugs and developers tend to be optimistic, and thus there is no way to guarantee the software is going to do it right even if it claims to know how. In the end, we just waste a bunch of cycles on unimplemented features and failed requests.

Another problem is that the features that benefit from a must-understand bit tend to be socially reprehensible (and thus the only way they could be deployed is via artificial demand). As soon as one of those features get deployed, the hackers come out and turn off the “must understand” bit for that feature, defeating the protocol in favor of their own view of what is right on the Internet…

In fact, “must understand” has no value in a network-based application except as positive guidance for intermediaries, which is something that can still be accomplished under mustIgnore with a bit of old-fashioned advocacy.

The second is repeated exposure to the Microformats folks, especially Tantek’s argument that it’s social co-ordination that brings the value in shared data and protocols, not infinite extensibility, which tends to encourage duplication of effort.

Applying this to protocols is interesting. Henrik Frystyk Nielsen has maintained that some of HTTP’s biggest flaws are its lack of namespaces and mandatory extensions. He tried to introduce them in PEP, and when that didn’t get traction he took the idea to SOAP. Don Box picked this thread up recently;

On the topic of extensibility mechanisms and the hell they inherently allow, it’s fun to imagine the world that might have been had Paul Leach and Henrik Frystyk Nielsen been successful in getting HTTP Extension Framework adopted… we might see GET requests that look something like this:

M-GET / HTTP/1.1
Host: some.host
Opt: "http://www.xmlsoap.org/ws/reliablemessage"; ns=14
Man: "http://www.xmlsoap.org/ws/security"; ns=15
Man: "http://www.xmlsoap.org/ws/secconv"; ns=16
Man: "http://www.xmlsoap.org/ws/trust"; ns=17
Opt: "http://www.xmlsoap.org/ws/timestamp"; ns=18
14-SequenceId="uuid:12341234-1234-1234-2134-123412341234"
15-Token: "abc9ea…"
15-Signature: "hash=eaffab36ca…, referencedParts=…,"
16-Token: "aa84…"
18-Expires: "2006-12-01T08:00:00Z"

Had we arrived here instead, would we now be referring to HTTP as the Slip-n-slide to Hades?

The answer, obviously, is “yes.” Roy’s thoughts about forced extensibility being “socially reprehensible” ring very true here; it’s not the syntax or the technology that’s bad about the message above, it’s that Microsoft (in this case) is allowed to unilaterally introduce new extensions without engaging the rest of the community in good faith.

Public Good

In this view, making the points of extensibility into scarce, community-managed resources — e.g., as media types do — is a good thing. It has positive political and social effects; it forces (or at least inclines) people to co-operate, whether they’re a multi-billion dollar behemoth, or a sole engineer who wants his fifteen minutes of fame.
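
To make the scarcity concrete: a standards-tree media type like application/atom+xml had to clear community review before it was registered, while the vendor tree is the lighter-weight escape hatch (the second type below is invented for illustration):

Content-Type: application/atom+xml
Content-Type: application/vnd.example-corp.report+xml

Getting a name into the standards tree means showing up and co-operating.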

Namespaces aren’t completely evil, of course. If you want to explicitly allow anybody to walk up and add data to your format, they’re a fine way to make sure there’s no ambiguity, and they give nice leverage for versioning, and perhaps for separating different concerns. I think this will tend to make sense for formats where truly disconnected, uncoordinated data is collected, like RDF.
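
Atom is a handy illustration: a producer can drop elements from its own namespace into an entry without asking anyone, and consumers that don’t recognise the namespace simply skip them. (The rating extension and its namespace URI below are made up; the rest is ordinary RFC 4287 Atom.)

<entry xmlns="http://www.w3.org/2005/Atom">
  <title>An example entry</title>
  <id>urn:uuid:60a76c80-d399-11d9-b93c-0003939e0af6</id>
  <updated>2006-04-07T12:00:00Z</updated>
  <!-- hypothetical extension; the namespace keeps it unambiguous,
       and unknown foreign markup is ignored -->
  <ex:rating xmlns:ex="http://example.org/ns/rating">4</ex:rating>
</entry>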

However, they don’t automatically make sense for situations where you need tight co-ordination between different entities (e.g., things we tend to call “protocols”); allowing anybody to rock up and extend a protocol with no overhead is inviting interoperability problems and abuse.

I have a harder time coming up with any valid use cases for mustUnderstand. Requiring that unrecognised extensions be ignored (mustIgnore), combined with different identified languages (using namespaces if you must, or better yet, media types), should be enough.
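
HTTP itself is a decent existence proof of that combination: the media type identifies the language as a whole, and unknown headers are simply ignored. A sketch of a hypothetical exchange (the X-Example-Hints header is invented):

GET /feed HTTP/1.1
Host: example.org

HTTP/1.1 200 OK
Content-Type: application/atom+xml
X-Example-Hints: prefetch

A client that has never heard of X-Example-Hints ignores it and processes the Atom document as usual; no mustUnderstand flag is needed to stay interoperable.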


13 Comments

David Megginson said:

I’d split your argument – I agree that mustUnderstand is probably a failed experiment, but I’d argue that wide-open extensibility coupled with mustIgnore is a public good. Would the web have evolved as fast as it did if HTML hadn’t been easy to extend without going through a formal review process? Would Atom’s designers have known where the sweet spots were if it hadn’t been so easy to extend RSS 2.0 with Namespaces?

Friday, April 7 2006 at 2:47 AM

Stefan Eissing said:

Over the last few years we worked on (among other things) extensions to the WebDAV protocol, and mustIgnore together with XML namespaces made that quite a good experience.

Of course we eventually tried to fold every useful extension back into an RFC, had to change the namespace, etc. Some things did not make it back, but were still useful, and so we had proprietary extensions as well.

I think when the pressure on a spec is too high, it’s better to have some defined ways to vent the steam to keep the main engine unharmed. But that does not mean you poke holes in it…

Friday, April 7 2006 at 3:25 AM

M. David Peterson said:

Mark,

Yep. Now that you point this out, it seems pretty obvious that those who attempt to extend the core to their own advantage, without first engaging the other core members competing in this space, will find themselves dealing with a really big mess to clean up, and very few people willing to help them clean up the mess they just made.

Yaron, It seems to me that very few changes to any given core will need to take place once enough time has passed for that core to mature. Of course, as time moves on there will always be advances in technology that warrant/justify extending the core to allow a more natural “look-and-feel” interface for the rest of the community to tap into. But the core itself rarely needs overhauling, and as Mark points out, when folks go their own direction without first engaging the others involved in this space, it rarely results in anything that can be seen as good for them, much less the community at large (although they’re/we’re obviously not gaining anything from it either.) Of course, once you get past the core itself, and the protocols which interface with the core, then I believe you are correct… But at this stage it seems we’ve passed the protocol stack and reached the edge of the application layer, which I guess could be termed the interface into the protocol stack? Not sure, need to think that one through a bit more, or hear what others have to say on the matter to gain a better feel for this part.

Saturday, April 8 2006 at 1:04 AM

M. David Peterson said:

It seems to me that you have all the right pieces in place and all the right arguments to justify your point.

And as you suggested, it makes sense for the notion of the RDF/Semantic Web where the non-coordination of efforts is the exact reason/purpose for the specification in question.

The core of any given system (or the kernel, to put this in specific computing terms) doesn’t require that anyone and everyone on the planet be able to easily extend its functionality for the system as a whole to survive. So namespaces don’t really offer any advantage there, and as you pointed out, allowing everyone to extend the core of any given system doesn’t do the public a lick of good.

On the other hand, the public needs the ability to express themselves, connect (or disconnect) themselves from one another, etc… In cases such as this, where we need to differentiate ourselves from one another yet still be able to inter-operate as a community, namespaces become an obvious necessity.

The laws and rules that apply to the system core (in this example, the laws and rules of the associated government) still apply to each individual, and in fact it’s because of these laws and rules that the community is able to function as a community in the first place. If each individual could simply change the laws and rules to fit their own needs and desires by using extensions that suggest “in my case, these other rules don’t apply”, obviously the system would immediately begin to break down as soon as the first contradictory extension was implemented.

With all of this in mind it seems to me that all of your conclusions are spot on.

Saturday, April 8 2006 at 4:57 AM

Yaron said:

I think that Must Understand tends to not work well because, as Roy pointed out, all you can really do is fail. So using mU a lot just means you are making lots of things fail. I am biased of course toward must ignore functionality but that’s probably because I was the one who insisted on shoving it into WebDAV. :) With must ignore functionality you have a more ‘friendly’ attitude toward extensibility. You send out messages that you more or less expect folks to understand with some bits they might not and therefore will ignore. You can, of course, implement mU with must ignore by changing the root element to something that isn’t recognized. I think there’s probably an important point hiding in there some place. You can create mU with must ignore but not the other way around.

I do however think that requiring everyone to go through IANA every time they want to add functionality is a really bad idea. Requiring IANA for everything would both overload IANA and slow down innovation. Innovation is often ugly but that’s o.k. so long as we can keep new innovative features from disrupting existing systems. mU isn’t evil but handled badly it can trivially become disruptive. Must Ignore is far from perfect (if only because it can be trivially turned into mU) but it seems to encourage more socially positive behavior.

Saturday, April 8 2006 at 12:46 PM

David Megginson said:

Thanks for your followup, Mark (scroll way up, everyone else). The point of my examples was that extending HTML or RSS did not require access to a “scarce, community-managed resource.”

Namespaces are an attempt to keep people’s extensions from stepping on each other – sometimes Namespaces work well (RSS 2.0 and XSLT are good examples), and other times they’re probably overkill – but I would argue strongly against any extensibility mechanism that was more extreme than Namespaces, including a central approval/registry mechanism.

Sunday, April 9 2006 at 3:47 AM

David Powell said:

I’m not entirely comfortable with this notion of “The Community”. Does it scale downwards? Don’t successful technologies start life as experimental technologies, unready for standardisation? Look at MIME media types:

Originally there was a standards tree, and an experimental tree. The standards tree is the “scarce community managed resource”, and the experimental tree makes an allowance for small-scale development.

The experimental tree didn’t work, because many experiments are successful and evolve into standards, and stripping off the x- prefix, for vanity, once the MIME type is successful isn’t in the interests of interoperability.

MIME introduced the vendor and personal trees, which avoid this black-and-white distinction between experimental and standardised. This seems to be a better arrangement, but aren’t these trees more like mU+namespaces than “a scarce community managed resource”?

Forcing large vendors to think about internet-scale interoperability via difficult registration processes might be a good thing, but it is unreasonable to make the same requirements of small behind-the-firewall applications.

Sunday, April 9 2006 at 7:30 AM

Yaron said:

I think HTML versus HTTP is an interesting comparison. Lots of folks add lots of things to HTTP for various reasons, from intermediary support to experimental features (e.g. cookies, range headers, etc.). Some make it to the big time; most don’t. But the whole thing works because if you happen to run into a message with a header you don’t grok, you can ignore it.

The same is superficially true with HTML. If you see a tag you don’t understand you can ignore it. But unless the page designer was very good, it’s quite likely that the functionality you end up with when you ignore key tags will be degraded to an unacceptable level. Yes, it is possible using script libraries and other ugliness to work around this, but in practice nobody but the really big sites have the budget to deal with this nonsense. So in a very real sense HTML isn’t extensible in the same way HTTP is, because its goal is different.

Most HTTP requests have a very limited goal - pull down some data (I assume we aren’t going to argue that 99.99% of all HTTP requests are GET?) which you can then decorate without doing too much damage. But HTML is about publishing human readable information which is a much broader goal and therefore doesn’t degrade nearly as well.

BTW, when I say degrade in the case of HTML, I recognize that you can always view source and get the ‘message’, but for most sites that level of degradation is a business killer and so not acceptable.

Sunday, April 9 2006 at 12:20 PM

Stefan Eissing said:

Yaron, I think the difference you described between HTTP and HTML lies in the head of the person using it. A programmer decorating his GETs with extra headers expects, eventually at least, to talk to servers that ignore the headers.

A web designer is often quite unaware (or ignorant) of any mU or mI issues and uses every feature “available” to him. In web design you often saw the “seems to understand” approach.

Now, there are bad programmers and excellent web designers. The difference is that everyone writes HTML now and then, but only a few ever use HTTP at the header level.

So, my point is: the difference between HTML and HTTP comes from the usage scenarios. I don’t think any mU/mI/namespace/registration gizmo would have helped HTML during the wild 90s.

Monday, April 10 2006 at 12:53 PM

M. David Peterson said:

Hey Mark,

I had this conversation bookmarked hoping to come back to it sooner, but trying to stay caught up with projects has kept me from checking back on this thread before now.

re: – WRT behind-the-firewall, anything goes; most companies I’ve been at will have FooCorp-* HTTP headers, application/fooCorp-* media types, and so forth. Works a charm. –

I agree. It works equally well when developing in Java, C#, and other .NET languages, as well as on other platforms/languages that integrate namespace/classpath-type naming conventions to allow for extensibility. Implementing a central registry for namespace usage in this regard would damage the industry, while providing no justifiable benefits in return.

The same is true in the webfeed arena. For example, if you take a look at the following Atom file [http://dev.extensibleforge.net/browser/trunk/ChannelXML/src/base.atom]

you’ll notice the usage of namespaces that are a part of the AtomicXML code base I’ve been working on (note: that particular feed is only halfway converted to the new element usage for AtomicXML – the module/@id = ‘footer’ is somewhat current, the module/@id = ‘content’ is foobar, and the module/@id = ‘header’ is about half and half.) Once complete, this codebase will allow the use of modularized snippets of XML to define any particular type of reusable module, which is then woven together with the matching data set to create the final XHTML/CSS output (SVG/CSS, XAML, and XUL/SVG/CSS will be supported as well.) There will also be support for XHTML modules.

The desired result is that anyone will have the ability to define any type of module (a menu, a complex mixture of content and markup, etc…) that can be labeled an asset and have a mapping file associated with it that tells the weaving engine how to match each element, or block of elements, with the data, and then specify the desired output format. With this in mind, the ability to keep track of the original source of any given module (especially as it relates to copyright issues), while at the same time associating the proper mapping file with the data and avoiding collisions during the transformation process, is of significant importance.

But once you get past this layer where interop is the primary focus, the benefits of namespaces become fuzzy, at best!

To me it comes down to the importance of interop for any given implementation. If interop is at the core, then namespaces become mandatory. If not, then there doesn’t seem to be any real benefit to using them.

Stefan,

In web design you often saw the “seems to understand” approach.

I like that! Very well stated :)

Saturday, April 22 2006 at 2:49 AM