HaXml
HaXml is a collection of utilities for using Haskell and XML together.
Its basic facilities include:
- a parser for XML,
- a separate error-correcting parser for HTML,
- an XML validator,
- pretty-printers for XML and HTML.
For processing XML documents, the following components are provided:
- Combinators is a combinator library for generic XML document
processing, including transformation, editing, and generation.
- Haskell2Xml is a replacement class for Haskell's Show/Read
classes: it allows you to read and write ordinary Haskell data as XML
documents. The DrIFT tool (available from
http://repetae.net/~john/computer/haskell/DrIFT/)
can automatically derive this class for you.
- DtdToHaskell is a tool for translating any valid XML DTD
into equivalent Haskell types.
- In conjunction with the Xml2Haskell class framework,
DtdToHaskell allows you to generate, edit, and transform documents as normal
typed values in programs, and to read and write them as human-readable
XML documents.
- Finally, Xtract is a grep-like tool for XML documents,
loosely based on the XPath and XQL query languages. It can be used
either from the command line or within your own code as part of the
library.
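The generic combinators treat each processing step as a filter from one piece of content to a list of results, and an Xtract-style path can be assembled from the same pieces. The following is a minimal self-contained model of that idea, not the HaXml module itself: it borrows the combinator names `keep`, `none`, `tag`, `txt`, `children`, `o` and `/>`, but its `Content` type is deliberately simplified (no attributes), and `splitSlash`, `path` and `sample` are hypothetical helpers. See the Haddock documentation for the real definitions.

```haskell
-- Simplified model of generic content filters (not the HaXml module):
-- attributes, comments and entity references are left out.
data Content
  = Elem String [Content] -- element name and its children
  | Text String           -- character data
  deriving (Show, Eq)

-- A filter takes one piece of content and returns zero or more results.
type CFilter = Content -> [Content]

-- Keep the input unchanged, or throw it away.
keep, none :: CFilter
keep c = [c]
none _ = []

-- Keep the input only if it is an element with the given name.
tag :: String -> CFilter
tag n c@(Elem m _) | n == m = [c]
tag _ _ = []

-- Keep the input only if it is character data.
txt :: CFilter
txt c@(Text _) = [c]
txt _ = []

-- The immediate children of an element (text has none).
children :: CFilter
children (Elem _ cs) = cs
children _ = []

-- Filter composition: run g, then run f on every result of g.
o :: CFilter -> CFilter -> CFilter
f `o` g = concatMap f . g

-- Path step: apply g to the children of everything f selects.
(/>) :: CFilter -> CFilter -> CFilter
f /> g = g `o` children `o` f

-- Hypothetical helper: split "a/b/c" into ["a", "b", "c"].
splitSlash :: String -> [String]
splitSlash s = case break (== '/') s of
  (name, [])       -> [name]
  (name, _ : rest) -> name : splitSlash rest

-- Hypothetical Xtract-like query: "album/track/title" becomes
-- tag "album" /> tag "track" /> tag "title".
path :: String -> CFilter
path = foldl1 (/>) . map tag . splitSlash

sample :: Content
sample =
  Elem "album"
    [ Elem "title" [Text "Sample"]
    , Elem "track" [Elem "title" [Text "One"]]
    , Elem "track" [Elem "title" [Text "Two"]]
    ]
```

Loaded into GHCi, `(path "album/track/title" /> txt) sample` returns `[Text "One", Text "Two"]`. The album's own title is skipped, because each `/>` step only looks at the children of what the previous step selected.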
Detailed documentation of the HaXml APIs
is generated automatically by Haddock directly from the source code.
An introduction to HaXml for people who know more about XML than
about Haskell can be found at
IBM DeveloperWorks.
A paper describing and comparing the generic Combinators with
the typed representation (DtdToHaskell/Xml2Haskell) is available here:
(12 pages of double-column A4)
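To make the typed side of that comparison concrete, the sketch below hand-writes the kind of types a DTD-to-Haskell translation produces for a three-element DTD. The names (`Title`, `Track`, `Album`, the `ToXml` class, `escape` and `retitleTracks`) are hypothetical, and the real DtdToHaskell output and Xml2Haskell classes differ in detail. The point is only that document structure becomes Haskell types, so an album value without its required title does not type-check.

```haskell
-- Hand-written sketch of a typed representation for the DTD
--   <!ELEMENT album (title, track*)>
--   <!ELEMENT title (#PCDATA)>
--   <!ELEMENT track (title)>
-- The type and class names are illustrative, not DtdToHaskell output.
newtype Title = Title String deriving (Show, Eq)
newtype Track = Track Title deriving (Show, Eq)
data Album = Album Title [Track] deriving (Show, Eq) -- track* becomes a list

-- A hypothetical class for writing typed values back out as XML text.
class ToXml a where
  toXml :: a -> String

-- Escape markup characters in character data.
escape :: String -> String
escape = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc c   = [c]

instance ToXml Title where
  toXml (Title s) = "<title>" ++ escape s ++ "</title>"

instance ToXml Track where
  toXml (Track t) = "<track>" ++ toXml t ++ "</track>"

instance ToXml Album where
  toXml (Album t ts) = "<album>" ++ toXml t ++ concatMap toXml ts ++ "</album>"

-- Editing is ordinary, type-checked Haskell: rename every track title.
retitleTracks :: (String -> String) -> Album -> Album
retitleTracks f (Album t ts) = Album t [Track (Title (f s)) | Track (Title s) <- ts]
```

For example, `toXml (Album (Title "A&B") [Track (Title "One")])` yields `<album><title>A&amp;B</title><track><title>One</title></track></album>`.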
Some additional info about using the various facilities is here:
Known problems:
- To use -package HaXml interactively with GHCi, you need
at least ghci-5.02.3.
- Haskell2Xml generates parameter entity (PE) declarations in the internal
subset of the DTD, which do not conform to the strict well-formedness
constraints of XML. We think the constraint in question is spurious,
and any reasonable XML tool ought to deal adequately with full PEs.
- DtdToHaskell generates the Haskell String type for DTD attributes
that are of Tokenized or Notation Types in XML. This may not be
entirely accurate.
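As an illustration of the last point: tokenized attribute types such as `ID` and `IDREFS` carry structure that a plain `String` does not record. In the hand-written sketch below, the record and field names are hypothetical, and representing the `#IMPLIED` attribute as `Maybe String` is an assumption about the generated code. An `IDREFS` value then arrives as one whitespace-separated string that the program has to split itself.

```haskell
-- Hypothetical record for the attribute list
--   <!ATTLIST part id ID #REQUIRED refs IDREFS #IMPLIED>
-- Both tokenized attributes are represented as plain Strings.
data PartAttrs = PartAttrs
  { partId   :: String       -- ID: uniqueness is not checked by the type
  , partRefs :: Maybe String -- IDREFS: assumed Maybe because it is #IMPLIED
  } deriving (Show, Eq)

-- An IDREFS value is a whitespace-separated list of names; split it.
refList :: PartAttrs -> [String]
refList = maybe [] words . partRefs
```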
Current version:
HaXml-1.08, release date 2002.11.29
By HTTP:
.tar.gz,
.zip.
By FTP:
ftp://ftp.cs.york.ac.uk/pub/haskell/HaXml/
FreeBSD port:
http://freshports.org/textproc/haxml/
Older versions:
FreeBSD port
By FTP:
ftp://ftp.cs.york.ac.uk/pub/haskell/HaXml/
To install HaXml, you must have a Haskell compiler (ghc-5.02 or later,
or nhc98-1.14 or later). You also need hmake-3.06 or later.
HaXml-1.08 does not currently support Hugs (HaXml-1.05 does, and we
hope to reintroduce Hugs support soon). Use
./configure
make
make install
to configure, build, and install HaXml as a package for your compiler.
Afterwards, to gain access to the HaXml libraries, you only need to
add the option -package HaXml to your compiler command line.
Several stand-alone tools are also built (DtdToHaskell, Xtract, and
Validate). If you intend to use them, copy them by hand to your
preferred installation location.
The latest version (1.08) has the following fixes and improvements:
- A new and highly useful function, Text.XML.HaXml.Validate.partialValidate,
does validation except for checking whether the root element type matches
that of the DTD's root element. This is just what you need in order to
validate any partial document or sub-document.
- The function Text.XML.HaXml.Html.Generate.htmlprint had a little
bug which caused it to loop infinitely if some text was longer
than 60 characters without a space.
- The Xtract parser and combinators are now included in the HaXml
library package, rather than existing solely for the Xtract
command-line tool.
- The code generated by DtdToHaskell had a minor cut-and-paste error.
- The file `script/echo.c' was missing from the distribution.
This only affected the configuration step on systems where
`echo -n' is not recognised (e.g. Solaris).
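The difference between full and partial validation can be pictured with a toy model: full validation is partial validation plus a check that the document's root element is the one the DTD declares. The `Element` and `Dtd` types, `partialCheck`, `fullCheck`, `albumDtd` and `fragment` below are hypothetical simplifications (only child element names are checked, against a flat list of allowed names), not the real `Text.XML.HaXml.Validate` types or functions.

```haskell
-- Toy element tree: a name and child elements (no text or attributes).
data Element = Element String [Element] deriving Show

-- Toy DTD: the declared root name plus, for each element name,
-- the names of the child elements it may contain.
data Dtd = Dtd { dtdRoot :: String, allowed :: [(String, [String])] }

-- Check every element against its declaration, but not the root name.
partialCheck :: Dtd -> Element -> [String]
partialCheck dtd (Element n kids) =
  case lookup n (allowed dtd) of
    Nothing -> ["undeclared element " ++ n]
    Just ok ->
      [ "element " ++ k ++ " not allowed in " ++ n
      | Element k _ <- kids, k `notElem` ok ]
      ++ concatMap (partialCheck dtd) kids

-- Full check: the root-name check plus everything partialCheck does.
fullCheck :: Dtd -> Element -> [String]
fullCheck dtd e@(Element n _) =
  [ "root is " ++ n ++ ", expected " ++ dtdRoot dtd | n /= dtdRoot dtd ]
    ++ partialCheck dtd e

albumDtd :: Dtd
albumDtd = Dtd "album" [("album", ["title", "track"]), ("track", ["title"]), ("title", [])]

-- A sub-document: a single track, not a whole album.
fragment :: Element
fragment = Element "track" [Element "title" []]
```

Here `partialCheck albumDtd fragment` reports nothing, while `fullCheck albumDtd fragment` reports only the root mismatch, which is the distinction partialValidate is described as making.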
Version 1.07 had the following new features and fixes:
- HaXml now uses the new hierarchical namespace for modules,
specifically under the tree Text.XML.HaXml.
- The HaXml libraries now install as a separate `package' in both ghc
and nhc98. Use -package HaXml to access them.
- The library APIs are now documented using Haddock.
- Due to popular request, we have added a new validator for checking
generic document content against a DTD. This is available both as
a library function, and as a command-line tool.
- DrIFT is now distributed separately by John Meacham, with much better
configuration and build support. You still only need it if you
want to derive the Haskell2Xml class.
- Bugfix: the lexer and parser now accept NOTATION declarations in the DTD.
- Bugfix: a PublicId in a NOTATION decl is now correctly recognised by the
keyword PUBLIC, not PUBLICID.
- Bugfix: the HTML parser now correctly accepts stand-alone tags like IMG.
- Bugfix: instances of XmlContent now accept an empty string where #PCDATA
is expected. Likewise, comments, processing instructions, and
character/entity references are now permitted to be scattered
throughout some #PCDATA text.
- Bugfix: the OneOfN types used in code generated by DtdToHaskell are
now supplied by default up to size 20, and a utility for
automatically generating larger sizes is included.
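A DTD choice such as `(a|b|c)` needs an n-way sum type, which `Either` only provides for two alternatives. The sketch below is a hypothetical generator in the spirit of the utility mentioned above: it produces a `data` declaration for any size n of at least 2. Its constructor naming scheme (`C1Of3`, `C2Of3`, ...) is an assumption made for illustration and is not necessarily the one HaXml's OneOfN types use.

```haskell
import Data.List (intercalate)

-- Hypothetical generator for an n-way sum type, intended for n >= 2.
-- oneOfDecl 3 produces:
--   data OneOf3 a1 a2 a3
--     = C1Of3 a1
--     | C2Of3 a2
--     | C3Of3 a3
--     deriving (Eq, Show)
oneOfDecl :: Int -> String
oneOfDecl n =
  "data OneOf" ++ show n ++ " " ++ unwords vars
    ++ "\n  = " ++ intercalate "\n  | " alternatives
    ++ "\n  deriving (Eq, Show)"
  where
    vars = ["a" ++ show i | i <- [1 .. n]]
    alternatives =
      ["C" ++ show i ++ "Of" ++ show n ++ " " ++ v | (i, v) <- zip [1 :: Int ..] vars]
```

For example, `putStrLn (oneOfDecl 21)` in GHCi prints a declaration one size past the predefined ones, ready to paste into a module.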
Complete Changelog
We are interested in hearing your feedback on these XML facilities -
suggestions for improvements, comments, criticisms, bug reports. Please mail
Development of these XML libraries was originally funded by Canon
Research Europe Ltd. Subsequent maintenance and development has
been partially supported by the EPSRC, and the University of York.
License: The library is Open Source, i.e., the bits we wrote
are copyright to us, but freely licensed for your use, modification,
and re-distribution, provided you don't restrict anyone else's use
of it. HaXml is distributed under the Artistic License - see file
LICENSE for more details. (If you don't like the
licensing conditions, please contact us to discuss your requirements.)
- Joe English has written a more space-efficient parser for XML
in Haskell, called hxml. What is more, it can be used as a simple
drop-in replacement for the HaXml parser!
Available here.
- Uwe Schmidt recently designed another
Haskell XML Toolbox
based on the ideas of HaXml and hxml.
- Some comparisons between functional language approaches to processing
XML can be found in
Bijan Parsia's article on xml.com.
- Christian Lindig has written an XML parser in O'Caml:
here.
- Andreas Neumann of the University of Trier has written a
validating XML parser in Standard ML:
here.
- Erik Meijer and Mark Shields have a design for a functional programming
language that treats XML documents as basic data types:
XMLambda.
- Benjamin Pierce and Haruo Hosoya have a different but similar design in
XDuce, which has
also been implemented.
- Ulf Wiger describes an Erlang toolkit for XML:
XMerL.
- There is a comprehensive reading list for XML and web programming in
functional languages here.