Overview of XML-related standards
Steven J. DeRose, Ph.D.
Brown University
Scholarly Technology Group

Steven_DeRose@brown.edu
http://www.stg.brown.edu/~sjd

XML and related specs
XML: The basic syntax
Plus Namespaces, Schemas, InfoSet
DOM: API to the Information Set
XML Linking
XPath: Expressions to find XML nodes
XPointer: XPath++ for addressing
XLink: hypermedia connections
Stylesheet Attachment
XSL: stylesheets and transforms

XML specification
A “Recommendation” since 2/1998
The highest level for a W3C specification
Defines the syntax/grammar
Not any particular processing/semantics
Schemas or DTDs define applications (poem, manual, eCommerce,...)
All these can be parsed by generic XML parsers, just as new words can be readily fitted into existing sentence structures
Schemas are political as well as technical

XML Namespaces
Disambiguate element type names
<head><html:title>On cataloging</html:title>…
<biblio><entry id='DeRo98'>
   <loc:title>Navigation, Access, Control…
Declaring prefixes
<sec xmlns:loc="http://foo.com/mynamesp"
xmlns:html='http://www.w3.org/1999/xhtml' 
xmlns="http://…"> <loc:title>…
Declaration without prefix sets default
Attributes can have namespaces
No renaming (x:foo to y:bar)
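A minimal sketch of how a namespace-aware parser tells the two title elements apart, using Python's standard xml.etree.ElementTree. The "loc" namespace URI comes from the slide above; the document content and the query prefix "x" are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: namespace URIs as on the slide, content invented.
DOC = """\
<sec xmlns:loc="http://foo.com/mynamesp"
     xmlns:html="http://www.w3.org/1999/xhtml">
  <html:title>On cataloging</html:title>
  <loc:title>Navigation, Access, Control</loc:title>
</sec>"""

root = ET.fromstring(DOC)

# ElementTree reports each element as "{namespace-URI}local-name",
# so the two <title> elements are no longer ambiguous.
tags = [child.tag for child in root]
print(tags)

# Lookup is by namespace URI: the prefix used in the query ("x")
# need not match the prefix used in the document ("loc").
found = root.find("x:title", {"x": "http://foo.com/mynamesp"})
print(found.text)
```

The prefix is only a local abbreviation; what applications compare is the URI plus the local name.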

XML Schemas
Let you define a document type
What elements/attributes are defined?
Where can they occur?
What content is allowed?
What datatypes are represented?
Required for validation
Similar to DTDs, but
More powerful (esp. for datatyping)
Use XML syntax

XML Information Set
What data in XML document “counts”?
Elements, attributes, content
Order and hierarchy of nodes
Required for interoperability
Applications must count nodes consistently
Not whitespace inside tags
Not which kind of quotes around attributes
Candidate recommendation 2001-05-14
http://www.w3.org/TR/xml-infoset
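As a rough illustration of what "counts", the sketch below parses two serializations that differ only in whitespace inside tags and in quote style, then compares what the parser reports. The sample strings and the summary() helper are assumptions for this example; ElementTree exposes only part of the Information Set.

```python
import xml.etree.ElementTree as ET

# Same information, different surface syntax (both strings are invented).
A = "<div n='1'>Hello</div>"
B = '<div   n = "1"  >Hello</div >'

def summary(xml_text):
    """Return the parts of the element that an application would see."""
    el = ET.fromstring(xml_text)
    return (el.tag, dict(el.attrib), el.text)

same = summary(A) == summary(B)
print(same)
```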

7 types of Infoset Nodes
Root: Above the document

<?foo ?>  <doc>…</doc>   <!-- hi -->
Element: Main structure
<div n='1'>…</div>
Text: Spans of unbroken text
Attribute: Properties of elements
Namespace: Prefixes/URIs
Processing Instr: <?…?>
Comment: <!-- … -->

Example

More Infoset details

DOM
"Document Object Model"
An API for accessing the Infoset
Many tools use this
Level 1 complete
http://www.w3.org/TR/REC-DOM-Level-1
 Level 2 core complete
http://www.w3.org/TR/DOM-Level-2-Core
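A small sketch of walking a document through a DOM API, here Python's xml.dom.minidom, which implements much of DOM Level 1 and part of Level 2. The sample document is adapted from the node-type slide; the KIND labels and the walk() helper are names invented for this example.

```python
from xml.dom import minidom, Node

# Adapted from the node-type slide: a PI, the document element, a comment.
DOC = "<?foo bar?><doc><div n='1'>text</div></doc><!-- hi -->"
dom = minidom.parseString(DOC)

# Hypothetical short labels for the DOM node types used here.
KIND = {
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.PROCESSING_INSTRUCTION_NODE: "pi",
    Node.COMMENT_NODE: "comment",
}

def walk(node, depth=0):
    """List (depth, kind, nodeName) for every descendant, in document order."""
    rows = []
    for child in node.childNodes:
        rows.append((depth, KIND.get(child.nodeType, "other"), child.nodeName))
        rows.extend(walk(child, depth + 1))
    return rows

rows = walk(dom)
for depth, kind, name in rows:
    print("  " * depth + kind, name)

# Attributes are reached through their owning element, not as children.
div = dom.getElementsByTagName("div")[0]
n_value = div.getAttribute("n")
```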

XML Base
Similar to the HTML <base> element
Useful for keeping URIs simpler and uniform.
Applies to relative URLs
<html>
<head>
<base href="http://www.example.com/">
…</head>
<body>… <a href="fig/mosquito.png">
The hrefs combine to make whole URI:
http://www.example.com/fig/mosquito.png

XML Base
XML Base provides similar feature
By a reserved attribute
<?xml version="1.0"?>
<doc xml:base="http://eg.org/today/">

See <link xlink:type="simple" xlink:href="new.xml">the news</link>
Applies to attributes & descendants
Can be overridden on descendants
Final REC as of 2001-06-27
http://www.w3.org/TR/xmlbase/
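The inheritance and override behavior can be sketched as follows, using urllib.parse.urljoin for the URI arithmetic. The outer element and its link come from the slide; the nested <archive> element, its xml:base value, and the resolve_links() helper are assumptions added to show an override.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

BASE_ATTR = "{http://www.w3.org/XML/1998/namespace}base"
HREF_ATTR = "{http://www.w3.org/1999/xlink}href"

# <archive> and its contents are hypothetical, added to show an override.
DOC = """\
<doc xml:base="http://eg.org/today/"
     xmlns:xlink="http://www.w3.org/1999/xlink">
  See <link xlink:type="simple" xlink:href="new.xml">the news</link>
  <archive xml:base="../2001/">
    <link xlink:type="simple" xlink:href="old.xml">older news</link>
  </archive>
</doc>"""

def resolve_links(elem, inherited_base=""):
    """Resolve every xlink:href against the nearest enclosing xml:base."""
    # An xml:base on this element is itself resolved against the inherited one.
    base = urljoin(inherited_base, elem.get(BASE_ATTR, ""))
    found = []
    href = elem.get(HREF_ATTR)
    if href is not None:
        found.append(urljoin(base, href))
    for child in elem:
        found.extend(resolve_links(child, base))
    return found

resolved = resolve_links(ET.fromstring(DOC))
print(resolved)
```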

Stylesheet attachment
Lets documents point to stylesheets
Based on HTML <link rel='stylesheet'>
Multiple, anywhere in XML prolog
May point to CSS, XSL, etc.
Example:
<?xml-stylesheet alternate="yes" href= "mystyle.css" title="Medium" type="text/css"?>
Equivalent of HTML:
<LINK href="mystyle.css" title="Medium" rel="alternate stylesheet" type="text/css">
REC: http://www.w3.org/TR/xml-stylesheet
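A sketch of how an application might read the attachment. The processing instruction's content is plain text to an XML parser, so the "pseudo-attributes" have to be pulled out separately; the regular expression and the stylesheet_links() helper below are assumptions for illustration and do not handle character references.

```python
import re
from xml.dom import minidom, Node

# The PI is the slide's example; the <doc/> element is a placeholder.
DOC = ('<?xml-stylesheet alternate="yes" href= "mystyle.css" '
       'title="Medium" type="text/css"?><doc/>')

# name = "value" or name = 'value', with optional whitespace around "=".
PSEUDO_ATTR = re.compile(r"""([\w-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")

def stylesheet_links(document):
    """Return one dict of pseudo-attributes per xml-stylesheet PI in the prolog."""
    links = []
    for node in document.childNodes:
        if (node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            links.append({name: dq or sq
                          for name, dq, sq in PSEUDO_ATTR.findall(node.data)})
    return links

links = stylesheet_links(minidom.parseString(DOC))
print(links)
```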

XSL specification
Stylesheet language
Based on ISO DSSSL and W3C CSS
2 major pieces:
XSLT: document transformation
Builds on XPath (more later)
Match elements, then construct output
XSL-FO: Formatting objects
To actually render blocks, fonts, tables, etc.
Hypermedia support unfinished (=CSS)
http://www.w3.org/TR/xsl/

Current XML organization
XML Plenary coordinates several WGs

XML-Linking specifications
XPath: expressions on infoset nodes
REC: http://www.w3.org/TR/xpath
XPointer: XPath + ranges, in URIs
CR: http://www.w3.org/TR/WD-xptr
XLink: gather locations to make links
REC: http://www.w3.org/TR/xlink/
(XML Base)

XML-Linking goals: end user
Links from un-writable documents
Which is most of the Web, for any person
Perhaps the most important single feature
->Bidirectional and multi-ended links
->Annotations and annotation sharing
Dynamic updates, patches, highlighting
Precise link attachment in any media
Large sets/databases of managed links
An entirely new market for links per se
Anyone can publish/sell their commentary

Pointing vs. linking
In HTML, many things are combined:
<a href="eg.org/foo">wow</a>
Technically:
"eg.org/foo" is a pointer (namely a URI)
The abstract connection itself is the link
The <a> element is a link representation
"wow" is the local anchor
Anchors are also called link-ends
Data at eg.org is the remote anchor
HTML specifies the link behavior

XPointer: locators

XLink: connections
Describes a relationship of referenced location(s),
To each other
To descriptions
XLink provides some key ones

XPointer…
Locates parts of XML resources
Even things without IDs
Even things that aren't whole nodes
XPointer adds (beyond XPath):
Way to refer to point and range selections
Way to use inside URI fragment identifiers
TEI “extended pointer” notation plus XPath logical expressions
Typically, a browser might load a document and scroll to/highlight the part

Anatomy of a URI reference

Fragment identifiers
Part of URIs after "#"
Says where in document is actual target
Separate form for each media type
Identifiers for graphics ≠ for text
IETF MIME definition specifies form
HTML
To scroll to <a name="coyote">
http://example.com/hello.html#coyote
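For illustration, the split between the resource and the fragment identifier can be done with Python's urllib.parse.urldefrag. Only the part before "#" is sent to the server; the fragment is interpreted by the client according to the media type.

```python
from urllib.parse import urldefrag

# URL from the slide above.
url, fragment = urldefrag("http://example.com/hello.html#coyote")
print(url)
print(fragment)
```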

The 3 XPointer/XPath forms
Bare names
An XML "name"* finds element with that ID
For (X)HTML compatibility
HTML uses "NAME", not ID
Child sequences
Stepwise down through elements: /1/4/27/2
May start with an ID: intro/4/3/2
Full XPointers
scheme1(args) scheme2(args)…
For now, the only "scheme" is "xpointer"
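A rough sketch of resolving bare names and child sequences, assuming that IDs live in an attribute literally named "id" (a real processor would consult the DTD or schema) and that only element children are counted. The sample document and the resolve_child_sequence() helper are invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document for illustration.
DOC = ("<doc><sec id='intro'><p>a</p><p>b</p></sec>"
       "<sec id='body'><p>c</p></sec></doc>")

def resolve_child_sequence(root, pointer):
    """Resolve 'name', '/1/2/...' or 'name/2/...'; return None if no match."""
    head, *steps = pointer.split("/")
    numbers = [int(step) for step in steps]
    if head:
        # Bare name: start at the element carrying that ID (assumed "id" attr).
        current = next((el for el in root.iter() if el.get("id") == head), None)
    else:
        # A leading "/1" selects the document element itself.
        if not numbers or numbers[0] != 1:
            return None
        current, numbers = root, numbers[1:]
    for n in numbers:
        if current is None:
            return None
        children = list(current)          # element children, 1-based steps
        if not 1 <= n <= len(children):
            return None
        current = children[n - 1]
    return current

root = ET.fromstring(DOC)
by_sequence = resolve_child_sequence(root, "/1/1/2")
by_id_then_sequence = resolve_child_sequence(root, "intro/2")
bare_name = resolve_child_sequence(root, "body")
```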

XPointer's 2 parts
Provide 'scheme' mechanism
Identify media-specific pointer types
Allow multiple ones to co-exist
Pointing methods for XML
Point to ranges, sets, id's, coords…
Point descriptively

XPointer schemes
Each media type needs pointer type
pngRect(0,10 100,200)
vrml(camera=1,2,3 light=4,50,500)
map(W0°10’/ N51°30’)
xml(…)
Schemes label fragment identifier types
#scheme1(args) scheme2(args)…
Escape any extra ( ) -- tlg('^(apax')
xpointer() is the first scheme

Multiple schemes in a URL?
When a server responds to a URI, it
Checks what media the client can handle
Picks one of those to send
“content negotiation”
If a visually-impaired user clicks
<a href="http://www.example.com/foo.gif#gif(0,0 1,1) xpointer(id(chap1))">
The server may fall back to an XML file
The client tries fragment identifiers left-to-right, and uses the first one that works
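The left-to-right fallback can be sketched as below. This is an illustrative parser, not the normative grammar: it assumes well-formed input, treats "^" as escaping the next character (as in the tlg example earlier), and the function names are invented here. The gif scheme is the slide's hypothetical one.

```python
def parse_pointer_parts(fragment):
    """Split 'scheme1(args) scheme2(args)' into (scheme, args) pairs."""
    parts = []
    i = 0
    while i < len(fragment):
        if fragment[i].isspace():
            i += 1
            continue
        open_paren = fragment.index("(", i)
        scheme = fragment[i:open_paren]
        depth, j, args = 1, open_paren + 1, []
        while depth:
            ch = fragment[j]
            if ch == "^":                 # circumflex escapes the next char
                j += 1
                args.append(fragment[j])
            elif ch == "(":
                depth += 1
                args.append(ch)
            elif ch == ")":
                depth -= 1
                if depth:                 # keep inner ")" but not the closer
                    args.append(ch)
            else:
                args.append(ch)
            j += 1
        parts.append((scheme, "".join(args)))
        i = j
    return parts

def first_usable(fragment, supported):
    """Return the first (scheme, args) whose scheme the client understands."""
    for scheme, args in parse_pointer_parts(fragment):
        if scheme in supported:
            return scheme, args
    return None

parts = parse_pointer_parts("gif(0,0 1,1) xpointer(id(chap1))")
chosen = first_usable("gif(0,0 1,1) xpointer(id(chap1))", {"xpointer"})
escaped = parse_pointer_parts("tlg('^(apax')")
```

A client that received XML instead of a GIF skips the gif() part and applies the xpointer() part.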

Anatomy of a location step

Summary: axes and functions
root( ), id( )
parent, self, child
ancestor, ancestor-or-self
descendant, descendant-or-self
preceding-, following-sibling
preceding, following
attribute, namespace
here( ), origin( )
string-range(), range-to()
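A few of these axes can be tried with Python's xml.etree.ElementTree, which supports only a limited XPath subset (child, descendant, parent, attribute and position predicates). The XPointer additions such as here(), origin(), string-range() and range-to() are not available there. The sample document is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical document for illustration.
DOC = ('<doc><div n="1"><p>one</p><p>two</p></div>'
       '<div n="2"><p>three</p></div></doc>')
root = ET.fromstring(DOC)

# child axis: direct <div> children of the context node
children = [d.get("n") for d in root.findall("div")]

# descendant axis: every <p> at any depth
descendants = [p.text for p in root.findall(".//p")]

# attribute test and position test as predicates
by_attribute = root.find("div[@n='2']/p").text
by_position = root.find("div[1]/p[2]").text

# parent axis: the elements that contain a <p>
parents = [d.get("n") for d in root.findall(".//p/..")]
```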

Counting locations

Points and Ranges
Point
What you get by click-selection
Gap before/after node or char
Range
What you get by drag-selection
From a start point to an end point
Not generally a WF XML subtree
May partially contain some elements:
<p>Hello, world.</p><p>Hi, back</p>
Crucial for creating hypertext links
How often do you click/drag exactly one entire element?
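A loose illustration of why a range is generally not a well-formed subtree. It works on character offsets in the serialized text, which is a simplification: XPointer defines points and ranges over the parsed structure, not over the raw string. The wrapper element and the is_well_formed() helper are assumptions for this sketch.

```python
import xml.etree.ElementTree as ET

# The two paragraphs are from the slide; <doc> is added to make one document.
TEXT = "<doc><p>Hello, world.</p><p>Hi, back</p></doc>"

# A "drag selection" from the word "world" through the word "Hi".
start = TEXT.index("world")
end = TEXT.index("Hi") + len("Hi")
selection = TEXT[start:end]

def is_well_formed(fragment):
    """True if the fragment parses when wrapped in a single element."""
    try:
        ET.fromstring(f"<wrapper>{fragment}</wrapper>")
        return True
    except ET.ParseError:
        return False

print(selection)
print(is_well_formed(selection))
print(is_well_formed("<p>Hi, back</p>"))
```

The selection partially contains both paragraphs, so it cannot be addressed as a node; it needs a start point and an end point.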

XLink is a language that...
Lets you invent your own linking elements and their meanings
In keeping with XML approach overall
Lets you create link databases
Links become first-class objects in the model
Provides some basic traversal behavior
E.g., “Open the target in a new window”
The rest is left to a style mechanism such as XSL

XLink terminology
Linking element
Identifies, connects, and describes anchors
Locator
Locates the data of some link end (anchor)
Link end or anchor
A data portion reachable as part of a link
Arc
Explicit connection between two link ends
Resource
Anything you can point at on the Web
 Using an arc is called Traversal

What links do with link-ends
A link identifies where its ends are
Using some kind of locators
URI#XPointer will be the locator for XML
URI#scheme()scheme() in general
A link attaches metadata to each end
Its formal role in relation to the other ends
A title by which to refer to it (say, in menus)
Some traversal behaviors
Arcs to say which traversals happen
Link itself can also have type, other info

Inline links
Linking element itself (better, the origin() end) is one of the link’s ends

Out-of-line links
Linking element itself isn't automatically made into one of its own resources

 Anatomy of an XML link

Arcs
Arcs specify traversal rules
Multi-ended links may restrict travel among their endpoints
Restrictions generic or app-specific
Arcs enable the description of both
An arc is a pair of roles, plus metadata
Enables traversal between ends with the given roles
May be multiple locators per role (useful for document assembly, multiple-choice travel)
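One way to picture "a pair of roles": the sketch below expands each arc into the concrete traversals it permits, given the locators that carry each role. The role names, URIs and the traversals() helper are all hypothetical; arc metadata such as behavior is omitted.

```python
from itertools import product

# Hypothetical extended link: two roles, with two locators sharing one role.
locators = {
    "essay": ["essay.xml#intro"],
    "comment": ["notes.xml#c1", "notes.xml#c2"],
}

# Each arc is (from-role, to-role); no arc is given for comment -> essay,
# so travel in that direction is not enabled.
arcs = [("essay", "comment")]

def traversals(locators, arcs):
    """Expand role-level arcs into (start, end) locator pairs."""
    pairs = []
    for from_role, to_role in arcs:
        pairs.extend(product(locators[from_role], locators[to_role]))
    return pairs

allowed = traversals(locators, arcs)
print(allowed)
```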

Arc example: fuel-type annotations

How to detect links
Could have any name and content at all
<footnote>, <criticism>, …
xlink:type attribute marks linking elements for applications to find:
  <!ELEMENT footnote EMPTY>
<!ATTLIST footnote
  xlink:type CDATA #FIXED "simple"
  xlink:href CDATA #REQUIRED>
For example: ...has studied the issue.<footnote xlink:href="http://www.doctools.com" />
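A sketch of how an application could find linking elements regardless of their names, by looking for the xlink:type attribute. Because ElementTree does not read attribute defaults from a DTD, the example writes xlink:type explicitly rather than relying on the #FIXED declaration; the <para> wrapper is invented.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
TYPE_ATTR = f"{{{XLINK}}}type"
HREF_ATTR = f"{{{XLINK}}}href"

# Hypothetical fragment; xlink:type is spelled out (no DTD defaulting here).
DOC = f"""\
<para xmlns:xlink="{XLINK}">...has studied the issue.<footnote
  xlink:type="simple" xlink:href="http://www.doctools.com"/></para>"""

root = ET.fromstring(DOC)

# Any element carrying xlink:type is a linking element, whatever its name.
links = [(el.tag, el.get(TYPE_ATTR), el.get(HREF_ATTR))
         for el in root.iter() if el.get(TYPE_ATTR) is not None]
print(links)
```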

Arcs and Traversals
Traversal is split into:
Behavior
Author's intention for behavior of a link.
Input to style mechanism
Not a presentation command
Actuation
Defines the event that triggers a link
Events are very generic, intentionally

Two kinds of behavior policies
show attribute
new to traverse and provide new “context”
replace to display in existing “context”
embed to display in the body of the initiating resource
Some semantic details are left unspecified: combining multiple ends, style inheritance, etc.
actuate attribute
onRequest to require external request
onLoad to traverse when link processed

Link databases let you…
Attach descriptive information from afar
Annotate other people's stuff
Maintain links more easily
When a destination changes, you don’t have to touch documents with links to it
Engage in online commerce in links
Express, package, and sell point-of-view
Collect out of line links as databases

External Linksets
Users will have persistent linkdbs
Subscriptions, interest groups, private,...
Document can specify relevant link dbs
Linked by special type of extended link
Included within regular documents too
LinkDBs enable link management
Needed to author using external links
Example: Public annotations on….

An external Linkset Instance
<xls>
<linkbase xlink:href="linkset1.xml" />
<linkbase xlink:href="linkset2.xml" />
<linkbase xlink:href="linkset3.xml" />
</xls>
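A sketch of how a client might collect the link databases named by such an instance. Two assumptions are added: the xlink prefix is declared on the root (a namespace-aware parser needs this), and the relative hrefs are resolved against a made-up document URL.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XLINK = "http://www.w3.org/1999/xlink"

# Slide example with an xmlns:xlink declaration added (assumption).
DOC = f"""\
<xls xmlns:xlink="{XLINK}">
<linkbase xlink:href="linkset1.xml" />
<linkbase xlink:href="linkset2.xml" />
<linkbase xlink:href="linkset3.xml" />
</xls>"""

# Hypothetical location of the linkset instance itself.
DOCUMENT_URL = "http://www.example.com/links/index.xml"

root = ET.fromstring(DOC)
linkbases = [urljoin(DOCUMENT_URL, lb.get(f"{{{XLINK}}}href"))
             for lb in root.findall("linkbase")]
print(linkbases)
```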