Jason McIntosh's XML 2000 report

This is my report of the XML 2000 Conference which Erik Ray and I attended from December 5th through the 7th. (It actually ran through the morning of the 8th, but we were both too burned out to bother by then.) It's mostly an expansion of notes I took on my palmtop during the couple dozen sessions I attended (less the ones that didn't interest me enough to encourage coherent note-taking), with other commentary thrown in as I think of it.

Since the sessions weren't very long (only 45 minutes each!), none went into a great amount of technical detail; more than anything, the con served to introduce me to a whole bunch of important XML-related topics which I'm probably destined to use in the coming years. Was sinking an entire week into the trip worth it when one could argue that I could get the same raw knowledge from reading online news articles for a few hours? I suppose the counter-question is: Very well, but would I? Whatever. In lieu of passing along any actually useful information, I'll just link to stuff where applicable.

Please alert me to any gaping conceptual errors in my reporting; much of this material was exposed to me for the first time during the con, and this report is as much me trying to make sense of all these jumbled factoids crammed into my head (and onto my Visor) as it is me sharing them with you.

General observations

Surprises

The nicest surprise was going into the wrong ballroom only to hear Tim Berners-Lee himself speak about his vision of the "semantic web", a structure abstracted by a step or two from the web as we know it, and based around automated linking of content by meaning. Edd Dumbill did a much more thorough job of writing about this than I did, but I'll share my notes later on anyway.

The anti-nicest surprise was not seeing Matt Sergeant present his AxKit software anywhere. I could have sworn that I had read in some piece of pre-convention literature that he and his nifty (well, potentially nifty -- I have only started to play with it myself) server-side XSLT package would have a presence at the con, but once I got there, I saw no trace of either in the program. Maybe I dreamt it. I do that sometimes.

I was further surprised to encounter some paranoid reaction to Berners-Lee's presentation from people who seemed to think he proposed removing all human input from the Internet's information cycle. One guy got up and said, "I see there is no room for hoo-mans in your [pause...] charts." (He actually did say 'hoo-mans', leading me to believe he may have been a Ferengi, but this is beside the point.) The next day, a presenter repeatedly asserted that Berners-Lee's vision lacked any trace of humanity, and an audience member had to stand up at the end of the talk and correct his bad manners. (Wasn't me, honest.)

Actually, there was a little bit of toe-stepping dynamic going on elsewhere in the con, with speakers within the same track acting as if they were correcting the fallacies put forth by someone who spoke before them. I found this rather unfortunate; it is possible to disagree in more subtle and elegant ways than this. Too bad.

Buzz

Among the buzziest topics were XML queries and topic maps, technologies which are in their infancy, or still gestating; XML querying, the proposed ability to select subsets of one's XML data in much the same way as one uses SQL to pull information from a database, is so young that it's only now getting its operator algebra lined up, and the core team hasn't really begun to focus on the language it'll use yet (though proposals from outside have appeared -- one of them, QUILT, had proponents among the speakers).

While this is really neat, it was topic maps that quickly got me and Erik running in excited little circles, mainly because nearly all the convention's speakers, right from the start, seemed to have at least something good to say about the idea, and many tried to tie in their own topics to the notion of topic maps. From the sounds of it, not only does this seem like a technology worth documenting, but also one worth applying to our own digital publishing ventures, with Safari being the most obvious first target; imagine a layer of semantic links connecting every related little nugget of knowledge in every book we publish! Well, I can't, because I don't know enough about topic maps yet. But I will learn! Oh yes.

If I had to choose just one buzzword to take home from the con, I'd pick "ontology". Everyone is so into building ontologies. And yet, I failed to pick up from context exactly what people meant by the word, even though I witnessed several people ask speakers about it, with varying results. Such is the nature of buzzwords, I suppose. Well, we'll see what happens.

Session Summaries

Tuesday

The sessions started with a video eulogy for Yuri Rubinsky, an SGML visionary who died (young, of a heart attack) shortly before the XML spec started gearing up. I slumped in my seat, expecting something sappy and embarrassing, but it turned out to be slightly more interesting than that: mostly it was footage of some W3C people eating Chinese food and reminiscing about SGML's early days, and that community's reaction (not entirely positive) to the rise of HTML, a reaction that would become the roots of XML. So that was fun to see, if a little sad. (The woman who made the film gave a verbal epilogue to it, and got a little choked up in so doing.)

(Joke that nobody made, forcing me to think of it independently: Uri was so important to the birth of the Web that his name was immortalized as one of its core concepts.)

After that, a guy from the W3C gave a brief verbal outline of XML work as it stands today, going through a long list of organizations and working groups which are all laboring together to produce standards for all kinds of components, from graphics and secure ecommerce helpers to support for regular expressions and XLink, which, he said, "is an attempt to bring linking forward into the 1960s", and start to really exploit the powers of hypertext as seen by its original visionaries, decades ago. I was heartened to hear all this, being new to the XML scene and unsure of what standards exist. Very soon, the answer will be: oh, lots and lots.

The guy further taught me that one can legally pronounce ISO as 'EH-so', and that DSSSL can rhyme with 'thistle'. In hindsight, I find it somewhat disappointing that he didn't try to pronounce every abbreviation on his plate ('Welcome to the Zimmle 2000 conference!'), thus rendering them all as dictionary-legal acronyms and ending the acronym-vs-abbreviation debate once and for all. Oh well.

Then a big group of people all working on XTM, a topic maps standard, made the big announcement that version 1.0 of their proposal was, as of that day, available to the public, and encouraged the masses to crowd topicmaps.org and take a gander. Cool.

In sessions after lunch, I finally learned about XML Schema, a topic which I'd been hearing more and more about (through scanning xml.com headlines and even incoming CPAN Perl modules) and which was apparently the darling of last year's XML con. It has just recently attained W3C recommendation candidacy, and many other topics at the con built upon XML Schema as it now stands (as well as RDF, which some speakers, including Berners-Lee, always seemed to mention in the same breath with XML Schema), so I definitely got the impression that schemas are something I should learn, like, now. (In a nutshell, according to my notes: XML Schema provides an abstraction layer on top of DTD; you can define your XML syntax using another XML document, rather than in DTD's specialized syntax, and you can also use all sorts of data typing (both built-in and user-definable) and pattern matching that DTD by itself lacks, giving you a much higher-resolution document definition. Schemas also try to make it much easier to import other schemas and juggle namespaces, which seems joyously Perlish to me (at least on its surface).)
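
Just to cement it in my own head, here's a minimal sketch of the sort of thing I mean -- the element names and the ISBN pattern are mine, invented purely for illustration, and I'm using the namespace URI the eventual Recommendation settled on, which may not match whatever draft was current at the con:

    <!-- A toy schema: a book element holding a string title, a
         pattern-checked ISBN, and a page count typed as a positive
         integer. All names here are invented for illustration. -->
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="book">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="title" type="xs:string"/>
            <xs:element name="isbn">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:pattern value="[0-9]{9}[0-9X]"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:element>
            <xs:element name="pages" type="xs:positiveInteger"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>

None of that data typing or pattern matching is expressible in a DTD alone, which I gather is a large part of the appeal.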

Finally, I watched the latter half of a demonstration of a Microsoft Word extension called Worx SE that makes Word act a little like Arbortext, displaying a tree of XML element labels down a frame to the left of the actual content, and providing special commands and techniques to select and modify text by element, as well as the usual Wording ways. The program apparently lets one write horribly malformed XML if one wishes, but the tree diagram complains conspicuously at every misstep. The guy running the demo made it supa-snappy by using Worx itself to display and manipulate the presentation's accompanying text, instead of the otherwise ubiquitous PowerPoint. The audience chuckled appreciatively at his concluding remarks, where he briefly went over how utterly resistant Word is to sane extension in the direction of XML, and the pain the development team had to withstand in order to make it so. Erik and I both feel that this product is likely worth O'Reilly investigation, and he may have picked up some demo material.

Wednesday

I forget what I actually intended to see during the morning session; the con had to swap its schedule around for some reason, but I ignored all the signs posted everywhere and went into the room I had already planned on, and up onto the stage bounced Mr. Berners-Lee, to my surprise -- I didn't even know he was at the conference! I may seem to gush about him in this report because his was the brightest point of the week, from my perspective: an energetic presentation delivered by an individual who appeared explosively enthusiastic about his work, stumbling over himself in his eagerness to share it. Really infectious. (To be fair, many of the other presenters were enthusiastic about their topics, but most had a veneer of corporate-sponsored sobriety about them, and none could match Berners-Lee's level of hyperactive hacker-charm.)

In the semantic web of Berners-Lee's imagining, information will be organized and interlinked by meaning, with the help of machine-processable semantic technology: a DWIM layer for the whole Internet, with enough metadata lying around to convert requests of type A into requests of type B when the two are in entirely different locations or formats but semantically equivalent. He showed an illustration of some database tables stored across several locations, with postal code columns named things like 'zip_code', 'ZIPCD' and 'where', which could nonetheless transparently combine their data in answer to a single user request, because some intermediary mechanism knows, through meta-information, that they all hold the same sort of thingy. Key phrase: "You stick it in the machine, and it does the right thing." As Perlish as you want, that, so I can only approve.
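
To make the column-mapping idea concrete for myself, I imagine the scrap of metadata looking something like the RDF below. This is my own sketch, not anything Berners-Lee showed, and the map:sameFieldAs property and every URI in it are hypothetical -- presumably the real semantic web would lean on shared, published vocabularies for this sort of assertion:

    <!-- Hypothetical RDF asserting that two differently named database
         columns hold the same kind of data. All names and URIs invented. -->
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:map="http://example.org/column-mappings#">
      <rdf:Description rdf:about="http://warehouse-a.example.com/schema#zip_code">
        <map:sameFieldAs rdf:resource="http://warehouse-b.example.com/schema#ZIPCD"/>
      </rdf:Description>
    </rdf:RDF>

Given a pile of assertions like that, an intermediary could join the two tables on their postal-code columns without either database ever having heard of the other.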

The semantic web will also address something called "monotonic logic" (or maybe, says Jason after poking around the Web, it was "non-monotonic logic"), the application of which I only sketchily caught; I think it has to do with setting up semantic rules for document delivery, though it may be broader than that. Berners-Lee gave the example of a site with a rule of "anyone who attended yesterday's meeting may view this content" stuck to it. When a user swung by and offered their credentials, the server would then use the resources at its disposal to determine if said user passed through the filter.

One of his slides had the word "Ontology" again followed by the acronyms "DL, OIL, SHOE". I think I started giggling, but nobody seemed to notice.

It does remind me that all the talk of machine-processable semantics during the con kept sending my mind back to a Douglas Hofstadter book I read a couple of years ago, which spends quite a bit of its length ruminating on the connections between semantic extraction and artificial intelligence, as well as to the sketchy details I know about various AI projects that have been in the works for many years with the intent of getting computer programs to wake up and start figuring things out for themselves. Yet, amidst all the talk of semantic models, I scarcely heard any references to work of this flavor. The only one who dared mention AI -- and he used it as if it were a dirty word -- was a grizzled LISP hacker who told Berners-Lee that everything he had just heard sounded suspiciously like the promises his teachers had made about semantic networks 20 years prior, and who challenged him (to appreciative murmuring from the audience) to explain why the semantic web wasn't the same old pipe dream. TB-L responded that now is not then because, hey, look, we already have this World Wide Web thing to act as a suitable substrate. Unlike AI theories that focus on getting one computer to turn its eyes inward and start understanding, the Web is an incalculably large interlinking of computer-based locations with uncountable people attached to it, and it can therefore take advantage of the decentralized logic systems that lie at the heart of his vision.

I thought it interesting how, in his diagram of all the s-web's layers, Berners-Lee had digital signatures crawling up the side. He spoke at length about his feelings that improved automated trust models -- ones that use sophisticated networks of logic, and not the brainless black-and-white solutions offered by VeriSign and its ilk -- will be key to making this all work, to the point where, he said, he'd happily retire as soon as he saw a logical trust system in place and working for the world.

Rushing back from lunch, I caught a panel discussion of O'Reilly authors and editors that ended up mainly focusing on favorite XML-friendly text editors, and concerns over conflicting standards; the most interesting (and reassuring) point I got from this was that ideas that appeared to be fighting for the same standard-space, such as XML Schema and RELAX (also a high-level document typer, based heavily on regular expressions), are best thought of as separate programming languages, rather than as the makings of another nightmarish browser-dependent scenario. One can use one method or the other, as appropriate, behind the scenes, but the content delivered in the end can be the same.

Over to the one panel on XML querying I attended, where I saw Phil Wadler, the same LISP hacker who played skeptic to Berners-Lee earlier that day, provide a nicely illustrated summary of the set-selection algebra he and the rest of his W3C working group have so far developed; it did in fact remind me of the logic that SQL uses, with the added functionality that the output adheres to an XML schema, and every piece of output comes in two parts: one with the actual return value, and another with that value's schema-compliant datatype. The whole thing seems quite straightforward, and Wadler noted that the working draft on the W3C website contains a tutorial. Nice! It made me really look forward to playing with the stuff myself.

Feeling suffused with practical knowledge, I at this point left the crowded technology and publishing tracks and meandered into the sparsely populated room of the "technology and society" track, attracted by the title of one afternoon session, "The Transparent Society", named after a recent David Brin book positing an information infrastructure where the battle for privacy is reversed so that everyone knows everything about everyone else (well, sort of), and the watched watch the watchers' watching with consistent clarity. Alas, I again became a victim of that day's juggled schedule, as I first sat through a talk by a woman from Xerox about privacy threats in general, which, having little to do with XML, felt rather off-topic, and certainly nothing I hadn't heard before, but maybe that's just from reading too much Slashdot.

After this, one David Newby, a library sciences researcher, gave me what I had come to see, which didn't meet my expectations (given the talk's title) but was nonetheless interesting in a thought-feeding sort of way. His point, basically, was that XML, coupled with all the technologies and techniques discussed that week (particularly anything involving semantic markup), would make web-based information retrieval -- the thing you use search engines for: searching for and pulling arbitrary information from content whose authors never necessarily intended such searching -- suck less, and maybe even start to work reasonably well. As documents get better at describing themselves, they gain a richer sense of 'aboutness', and websearches in turn return a narrower (and more relevant) range of hits. This will be great for a whole range of people, with library sciences researchers on one extreme, people like you and me in the middle, and shady folk such as information brokers, who will gladly sell everything the world knows about you for a fee, sitting on the other extreme. Recommended reading: both "Transparent Society" and "Database Nation". Gotta do both, me. Newby felt that a Brin-flavored future is the more reasonable one, given that, by his predictions, keeping personal information private is only going to get harder. I think that before I read either book, I'll read the academic paper he wrote, included on the convention's CD-ROM.

Though I wanted to stay, I had promised Erik, locked into the publishing track, to take good notes about the afternoon's topic map talks, so back to the knowledge technologies track did I jog, where I listened to employees of empolis.com finally describe this idea to me, and then follow up with a lecture on XLinks. Holger Rath, topic map guru, really seemed to see TM as core to the web's evolution, going over a host of examples of web activities that could stand to benefit from adopting topic maps. I took particular note of his ideas for publishing enterprises, which can use the TM buzzphrase of "Optimized Interactive Access" to give different views of the same data, depending upon the viewer's preferences; he likened a topic map used in this way to a transparent overlay tossed over your content, displaying links that are, in actuality, decoupled from the information itself, and easy to swap when one desires a different viewpoint. A person in the audience commented that his publishing company was experimenting with topic maps to transform the 'How to use this book' section of their online books into a recommended path of travel through the chapters, depending upon what kind of reader one is.
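
For Erik's benefit (and mine), here's a rough reconstruction of what a tiny topic map might look like in XTM syntax. I'm assembling this from memory of the freshly announced 1.0 proposal, so treat the exact element names as approximate, and the topics and URLs as my own invention:

    <!-- Two topics plus an association tying them together; the occurrence
         points out into ordinary web content. Illustrative names only. -->
    <topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
              xmlns:xlink="http://www.w3.org/1999/xlink">
      <topic id="xml-schema">
        <baseName><baseNameString>XML Schema</baseNameString></baseName>
        <occurrence>
          <resourceRef xlink:href="http://example.com/books/learning-xml/ch04.html"/>
        </occurrence>
      </topic>
      <topic id="dtd">
        <baseName><baseNameString>Document Type Definition</baseNameString></baseName>
      </topic>
      <association>
        <member><topicRef xlink:href="#xml-schema"/></member>
        <member><topicRef xlink:href="#dtd"/></member>
      </association>
    </topicMap>

The point of the transparent-overlay metaphor is that none of this lives inside the documents it describes; swap in a different topic map and the same content sprouts a different set of links.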

XLink isn't really all that new or buzzy compared to some of this other stuff, but it fit in well amidst all the topic maps frenzy. Empolis' Anthony Duhig spoke about the magic of out-of-line linking, where links live in a separate location from content (as opposed to living inside the content as a bunch of '<a href...>' tags or whatnot), and one's browser smooshes the two together to produce hypertext. Advantages of this method over traditional HTML markup include the fact that one needn't have write access to the content in order to create links from it (feel like writing smarmy annotations on the competition's e-books? Now's your chance!), and that it can allow for better scalability with both maintenance and generation -- the former because redefining a single link in your centralized link document can update its appearance throughout your site (without your having to paw through every HTML file to fix it), and the latter because one can use software to dynamically generate XLinks; Duhig briefly hawked just such a product that Empolis produces.
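
As I understand it, an out-of-line link is just an XML element carrying XLink attributes, living in a document of its own -- something like the sketch below, where the element names and URLs are mine and only the xlink: attributes come from the spec:

    <!-- An extended link stored apart from the two resources it connects. -->
    <links xmlns:xlink="http://www.w3.org/1999/xlink">
      <linkset xlink:type="extended">
        <loc xlink:type="locator" xlink:label="chapter"
             xlink:href="http://example.com/book/ch03.xml"/>
        <loc xlink:type="locator" xlink:label="annotation"
             xlink:href="http://example.com/notes/ch03-smarm.xml"/>
        <go xlink:type="arc" xlink:from="chapter" xlink:to="annotation"/>
      </linkset>
    </links>

Redefine the arc in this one file and every rendering of the chapter picks up the change, which is the maintenance win Duhig was describing.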

Thursday

The big morning presentation was Bill Burkett's "The Question of Semantics", a lengthy talk which basically boiled down to the idea that only humans will ever be capable of generating semantics, and that's that. This was the one I mentioned earlier that, I felt, stepped a little too hard on Berners-Lee's Semantic Web speech, which set off my bogosity alarm and caused me to write 'blargh' on my palmtop. Keep in mind, though, that this was the con's last day, and after two full days of XML head-crammage, I was probably a little cranky.

During the Q&A, a young fellow said he and his team were attempting to turn some kind-or-other of an existing ontology into a topic map. He called it something like "Psy Core" at one point and "Lennat" at another. The speaker seemed impressed, but maybe he also had no idea what the kid was talking about and was faking it. I may have done the same thing, under pressure. Anyway, cursory web searches are showing me nothing, either way.

Then I attended an as-technical-as-you-can-get-in-45-minutes discussion of SOAP, which caused my brain to melt, but I did catch the parting comment that SOAP per se has ceased to evolve, and will leave the W3C's XP protocol to pick up where it left off. Surprising!

Fortunately, my head had a chance to cool during Microsoft guy Jim King's easy-on-the-eyes session about Digital Dashboard, a Microsoft do-it-yourself portal engine which allows users of an appropriately extended IIS web server to build their own 'dashboard' pages by picking and choosing 'web parts' from a catalog stored on the server. Each web part is actually an XML schema which, when instantiated, might become a stock ticker, or a news bulletin, or a product catalog. It's basically a slickly self-contained syndication application, with channels^H^H^H^H^H^H^H^H Knowledge Portals(tm) served up off a central source.

King's PowerPoint show ended with the only non-ironic, non-parodying use of the "Where do you want to go today?" slogan I saw that week. (Though he did mark it up a little, highlighting the word 'you'.)

Finally, I learned about XIL, an XPath-using markup one can stick into XSL documents to provide hints to indexing software (indexing in the sense of database optimizing, not book indexes) about which flavors of elements should be indexed with high priority, such as titles and section headers, and which contain noise not worth indexing at all, as one might do with filenames and metadata unrelated to the document's content. I didn't grok this until the very end, because the speaker was mumbly and all the code samples were in 3-point text, causing people around me to continuously whisper snidely at one another, making the guy's mumbling hard to hear.

Then I rode the train home, mostly.

Miscellany

The con was a total loss as far as catering went. Perhaps the O'Reilly (Open Source|Perl) Conference has spoiled me, but at no time did I turn a corner to see a basket of cookies or a pyramid of brownies greet me. At exactly one point in the morning they served soda, and the rest of the time one had to make do with yukky, boiled coffee. To be fair, I've yet to have con coffee that was not yukky and boiled, and the hotel did have a built-in Starbucks which happily sold small cups for $2.50. (I think I bought three, all told. And one $1.50 donut. Sigh.) The lunches were OK, though.

Distributing the conference proceedings on CD-ROM is not convenient if the web pages contained thereon are unnavigable on any non-Microsoft operating system, because all the files' names are CORRUP~1.TED. There was less than 30 MB of material; I don't imagine it would have been hard to account for other types of filesystems on the same disc. Alternatively, I could just copy the whole thing onto my hard drive and run a Perl script over it to corrupt all the links so they match the filenames, but my heart's not in it.

Inconsequentia

I read most of "Dune" during the train ride to DC, having picked the paperback up after Thanksgiving (on National Buy Nothing Day) to stave off boredom while visiting my family; I hadn't even considered the miniseries, and once I saw ads for it on MBTA buses, I felt vaguely embarrassed about reading the book in public, but I did anyway. I will note here that I enjoyed the book immensely, and will write about it once my personal website stops blowing up. (I also rather liked the 1984 David Lynch movie version, but now see why all my Dune-fan friends despise it; it really is a nearly completely different story.)

Looking out the window as the train idled at New Rochelle, New York, I saw the following written on a sign, which had no markup or other suggestion that it was an advertisement:

"Please, no running in the station. (Although we applaud your boundless energy and zest for living.)"

Notes taken during the ride home:

Are the people sitting kittycorner to Erik & me famous? I can't say. The old fellow sounds a lot like Walter Cronkite, and he and his two companions (a man and a woman, attractive and definitely camera-friendly people) are talking about the media, as well as projects of some sort they've all worked on. Later, their conversation, led by the old guy, wandered from their favorite TV show scripts, to some cameramen he once met who were just back from Vietnam, to anxiety attacks, to b2b commerce, to the computer language LOGO. So, I really don't know.

They got off in New York and were replaced by two young, slick-looking businessmen, one of whom is very chatty with a NY accent. He loves poker, and has been sharing his poker-playing tips with his companion in truly cinematic dialogue punctuated by ruffling shuffles and chattering chips. I have to smile.

I didn't take very many pictures.

Updates

Jon Orwant, upon reading this report, offered some clarification of a couple of points that confused me, so I quote him now:

It was definitely non-monotonic logic that Tim Berners-Lee was talking about for the semantic web. Basically, a system employs non-monotonic logic if it can assert something as true, but reserves the right to change its mind as new evidence arrives. Mathematical logic is monotonic, human reasoning is not.

"Psy Core" and "Lennat" was probably "CYCORP" and "Lenat", or maybe even "CYC or Lenat". Doug Lenat is an (in)?famous AI researcher who set to record All Of Human Common Sense into a huge ontology called CYC in a ten-year project...

