Computer Science 380:

Principles of Database Systems

Chapter 23

Gregory M. Kapfhammer


Creative Commons licensed (BY-NC-SA) Flickr photo shared by danmachold

Motivation for XML's Creation

Data Formats

Structured

Semi-structured

Unstructured

Where does XML fit?

Markup and Content

XML and HTML

Trade-Offs

XML is for data exchange

HTML is for document display

Markup makes the content easier to understand ...

But, markup adds to the space overhead of the data!

Essentially, markup is like a repeated schema!


Creative Commons licensed (BY-NC-SA) Flickr photo shared by DryHeatPanzer

Benefits of XML

Self-documenting

Non-rigid document format

Support for nested structures

Widely accepted in many domains

See Figures 23.1 through 23.3 for examples!

Document Type Definition

State the type ...

... and then perform type checking!
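
A minimal Java sketch of "state the type, then type check" with a DTD, assuming only the XML parser that ships with the JDK. The course, title, and credits element names and the class name DtdValidationSketch are hypothetical and chosen only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class DtdValidationSketch {
    // Hypothetical internal DTD: a course holds one title followed by one credits.
    static final String DTD =
        "<!DOCTYPE course [\n"
      + "  <!ELEMENT course (title, credits)>\n"
      + "  <!ELEMENT title (#PCDATA)>\n"
      + "  <!ELEMENT credits (#PCDATA)>\n"
      + "]>\n";

    // Matches the declared structure.
    static final String GOOD =
        DTD + "<course><title>Databases</title><credits>4</credits></course>";

    // Violates the declared structure: the credits element is missing.
    static final String BAD =
        DTD + "<course><title>Databases</title></course>";

    static boolean isValid(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // ask the parser to check the document against its DTD
        DocumentBuilder builder = factory.newDocumentBuilder();
        final boolean[] valid = {true};
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            // Validity violations are reported as recoverable errors.
            public void error(SAXParseException e) { valid[0] = false; }
            // Well-formedness problems are fatal, so rethrow them.
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return valid[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println("good document valid? " + isValid(GOOD));
        System.out.println("bad document valid?  " + isValid(BAD));
    }
}
```

In this sketch the parser reports a validity error for the second document because the declared content model (title, credits) is not satisfied.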

DTD Problems

Cannot type check text elements and attributes

Cannot easily specify unordered subelements that each appear only once

Cannot state the type of element that an IDREF must refer to

What should we do?

XML Schema

Handle all of the deficiencies of the DTD

Give options for minOccurs and maxOccurs

Define data that exists as a sequence of values

Define a namespace to avoid conflicts

Create complex types much like you can in Java!

You can even define primary and foreign keys!
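
A hedged Java sketch of two of the points above: typed text content and minOccurs/maxOccurs bounds. It assumes the javax.xml.validation API bundled with the JDK. The schema, the element names, and the class name XmlSchemaSketch are hypothetical, not taken from the chapter.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class XmlSchemaSketch {
    // Hypothetical schema: credits must be a positive integer,
    // and a course has one or two instructor elements.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='course'>"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name='title' type='xs:string'/>"
      + "<xs:element name='credits' type='xs:positiveInteger'/>"
      + "<xs:element name='instructor' type='xs:string' minOccurs='1' maxOccurs='2'/>"
      + "</xs:sequence></xs:complexType>"
      + "</xs:element>"
      + "</xs:schema>";

    static final String GOOD =
        "<course><title>Databases</title><credits>4</credits>"
      + "<instructor>A</instructor></course>";

    // The text of credits is not a number, which a DTD could not detect.
    static final String BAD_TYPE =
        "<course><title>Databases</title><credits>four</credits>"
      + "<instructor>A</instructor></course>";

    // Three instructors exceed maxOccurs='2'.
    static final String BAD_COUNT =
        "<course><title>Databases</title><credits>4</credits>"
      + "<instructor>A</instructor><instructor>B</instructor>"
      + "<instructor>C</instructor></course>";

    static boolean isValid(String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            // With no custom error handler, a validity error is thrown as a SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("good:      " + isValid(GOOD));
        System.out.println("bad type:  " + isValid(BAD_TYPE));
        System.out.println("bad count: " + isValid(BAD_COUNT));
    }
}
```

Only the first document should pass. The other two fail on exactly the kinds of constraints that the DTD could not express.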

Querying XML

XPath: path expressions are building blocks

XQuery: standard language for XML querying

XSLT: supports transformation and formatting

We can even write our own programs in Java!
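
As one possible illustration of writing our own Java program, here is a small sketch that evaluates XPath path expressions with the javax.xml.xpath API from the JDK. The university document, the course data, and the class name XPathSketch are made up for this example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathSketch {
    // Hypothetical data, loosely in the spirit of a university schema.
    static final String XML =
        "<university>"
      + "<course id='CS380'><title>Database Systems</title><credits>4</credits></course>"
      + "<course id='CS111'><title>Intro to CS</title><credits>4</credits></course>"
      + "<course id='CS290'><title>Seminar</title><credits>2</credits></course>"
      + "</university>";

    // Evaluate an expression and return its string value.
    static String evaluateString(String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
    }

    // Evaluate an expression and count the nodes that it selects.
    static int countNodes(String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // A path with a predicate on an attribute.
        System.out.println(evaluateString("/university/course[@id='CS380']/title"));
        // A path with a predicate on a subelement value.
        System.out.println(countNodes("/university/course[credits=4]"));
    }
}
```

Each step of a path (university, course, title) selects child elements, and a predicate in square brackets filters the nodes selected so far.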

FLWOR Expressions

XML Parsing

DOM: Document Object Model

SAX: Simple API for XML

The most significant ways to parse XML in Java!

DOM supports parsing with a tree model

SAX triggers events during parsing
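
The contrast between the two models can be sketched with the parsers included in the JDK. Both hypothetical methods below count elements with a given tag name: the DOM version first builds the whole tree, while the SAX version only reacts to start-of-element events. The document and the class name DomVersusSaxSketch are illustrative assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class DomVersusSaxSketch {
    // Hypothetical document used by both parsers.
    static final String XML =
        "<university>"
      + "<course><title>Database Systems</title></course>"
      + "<course><title>Intro to CS</title></course>"
      + "</university>";

    // DOM: parse the entire document into an in-memory tree, then query the tree.
    static int countWithDom(String tag) throws Exception {
        Document tree = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
        return tree.getElementsByTagName(tag).getLength();
    }

    // SAX: no tree is built; the parser calls startElement for each opening tag.
    static int countWithSax(String tag) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                if (qName.equals(tag)) {
                    count[0]++;
                }
            }
        };
        SAXParserFactory.newInstance()
            .newSAXParser()
            .parse(new InputSource(new StringReader(XML)), handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println("DOM count: " + countWithDom("course"));
        System.out.println("SAX count: " + countWithSax("course"));
    }
}
```

DOM is convenient when a program needs to revisit or modify parts of the document. SAX keeps memory use low because it never holds the whole document at once.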

XML Storage

File system

Relational database

XML-aware database

What are the trade-offs?

XML Compression

What tools are available?

What are the empirical trade-offs?

Unusual characteristics of XML compression

You can try this using open source tools!
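
One way to try this is a small Java experiment. The sketch below uses the general-purpose gzip compressor from java.util.zip as a stand-in for whichever tools are studied in class, and it compares a synthetic XML document with the same records written as comma-separated text. The record layout and the class name XmlCompressionSketch are assumptions made for illustration, and the sizes it prints are not results from any study.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

class XmlCompressionSketch {
    // Synthetic XML: every record repeats the same markup around its values.
    static byte[] buildXml(int records) {
        StringBuilder sb = new StringBuilder("<university>");
        for (int i = 0; i < records; i++) {
            sb.append("<course><id>").append(i)
              .append("</id><credits>").append(i % 5)
              .append("</credits></course>");
        }
        sb.append("</university>");
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // The same values with no markup, one record per line.
    static byte[] buildCsv(int records) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < records; i++) {
            sb.append(i).append(',').append(i % 5).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Size in bytes of the data after gzip compression.
    static int gzipSize(byte[] data) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(data);
        }
        return out.size();
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = buildXml(1000);
        byte[] csv = buildCsv(1000);
        System.out.println("XML raw bytes:     " + xml.length);
        System.out.println("XML gzipped bytes: " + gzipSize(xml));
        System.out.println("CSV raw bytes:     " + csv.length);
        System.out.println("CSV gzipped bytes: " + gzipSize(csv));
    }
}
```

Because the markup acts like a schema repeated in every record, the raw XML is much larger than the plain values, yet that same repetition is what a compressor can exploit.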

XML Applications

Storing data with complex structure

Standardizing data exchange formats

Web services

Data mediation

Debugging and testing

Reviewing XML Trade-offs