XMark — An XML Benchmark Project

xmlgen - The Benchmark Data Generator


xmlgen produces XML documents modeling an auction website, a typical e-commerce application. The highlights of the data generation are:
  • Generation of well-formed, valid, and meaningful XML data.
  • Efficient, scalable generation of XML documents several gigabytes in size.
  • Observance of referential constraints on ID/IDREF pairs.
  • Low, constant memory requirements, independent of the size of the generated document.
The number and type of elements are chosen according to a template and parameterized by probability distributions. The words for text paragraphs are taken from Shakespeare's plays.
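
The combination of template-driven generation, valid ID/IDREF pairs, and constant memory can be illustrated with a minimal Python sketch. It is not xmlgen's actual algorithm: the element names are a simplified toy schema, and the word list, the uniform distributions, and the function names `write_auction_doc` and `dangling_refs` are assumptions made for illustration only.

```python
import io
import random
import xml.etree.ElementTree as ET

# Hypothetical word list; xmlgen draws its words from Shakespeare's plays.
WORDS = ["to", "be", "or", "not", "that", "is", "the", "question"]


def write_auction_doc(out, n_people, n_auctions, seed=0):
    """Stream a toy auction document to `out` one element at a time."""
    rng = random.Random(seed)  # fixed seed, so repeated runs give the same text
    out.write("<site>\n<people>\n")
    for i in range(n_people):
        out.write(f'<person id="person{i}"/>\n')
    out.write("</people>\n<auctions>\n")
    for j in range(n_auctions):
        # IDs are a pure function of the index, so a valid reference can be
        # drawn without remembering which persons were already written.
        seller = rng.randrange(n_people)
        # Text length is drawn from a distribution (uniform here, an assumption).
        text = " ".join(rng.choice(WORDS) for _ in range(rng.randint(3, 8)))
        out.write(f'<auction id="auction{j}"><seller person="person{seller}"/>'
                  f"<description>{text}</description></auction>\n")
    out.write("</auctions>\n</site>\n")


def dangling_refs(xml_text):
    """Return every seller reference that does not match a person ID."""
    root = ET.fromstring(xml_text)
    ids = {p.get("id") for p in root.iter("person")}
    return [s.get("person") for s in root.iter("seller")
            if s.get("person") not in ids]


buf = io.StringIO()
write_auction_doc(buf, n_people=5, n_auctions=3)
print(buf.getvalue())
print("dangling references:", dangling_refs(buf.getvalue()))
```

The in-memory buffer is used only to keep the example self-contained. Writing to an open file handle instead keeps the generator's memory use independent of the document size, because no element is retained after it has been written.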

The design ensures reproducibility across platforms, although marginal differences between documents may result from round-off errors. Moreover, the characteristics of a document are fully preserved under scaling, which aids the analysis of bottlenecks and of how they evolve with increasing data volume.
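
One way to picture these two properties is the following Python sketch, which is an illustration under stated assumptions rather than a description of xmlgen's internals. The base counts in `BASE_COUNTS`, the seed, and the helper names are hypothetical. Every entity count is multiplied by the same factor, so the proportions between entities stay fixed, and a seeded generator that draws only integers produces the same output on every run.

```python
import hashlib
import random

# Hypothetical entity counts at scaling factor 1.0 (not xmlgen's real values).
BASE_COUNTS = {"person": 250, "item": 200, "auction": 120}


def scaled_counts(factor):
    """Scale every entity count by the same factor so ratios are preserved."""
    return {name: max(1, round(n * factor)) for name, n in BASE_COUNTS.items()}


def document_digest(factor, seed=42):
    """Hash a toy 'document' produced by a seeded generator."""
    rng = random.Random(seed)
    h = hashlib.sha256()
    for name, n in scaled_counts(factor).items():
        for i in range(n):
            # Integer draws only, which sidesteps floating-point round-off.
            h.update(f"{name}{i}:{rng.randrange(1000)}\n".encode())
    return h.hexdigest()


for factor in (0.1, 1.0, 2.0):
    print(factor, scaled_counts(factor))

# Two runs with the same factor and seed yield the same digest.
print(document_digest(0.1) == document_digest(0.1))
```

Comparing digests rather than whole documents is a cheap way to check that two machines generated identical data for a given factor.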

In designing xmlgen, we deliberately reduced the number of parameters to one: the size of the document. We believe that the diverse structure of the document captures all important features found in typical XML documents.

© CWI 2002, 2003 — page maintained by Albrecht Schmidt — $Date: 2003/06/28 18:21:17 $