Getting from Word to XML
OUCS, July 2001

The easy way

Word 98

Word 98 sample

Word 2001

Word 2001 sample header

Word 2001 sample body

Word 2001 sample body

Other options

What's the real problem?

And this is bad because...

What should a solution do?

Sources of information

General approach

Implementation

Part 1: Stylify

Stylify: Finding exceptions

Part 3: dirSweep

Part 2: XMLify

The easy bits

A clever bit

The tedious bits

Headings

Lists

Tables

Special Word constructs

Pictures

Whitespace

Other tricks

Figuring out what CSS to write

Not handled yet

Summary