This library reads SGML document type definitions and parses SGML files accordingly. It may be used for parsing, checking, or converting SGML files.
The Standard Generalized Markup Language (SGML) is a structured document representation suitable for document manipulation, exchange, and storage [SGML]. Among other things, SGML is the format on which HTML, the World Wide Web Hypertext Markup Language, is based [HTML].
A first version of this library was built on top of James Clark's NSGMLS C++ SGML parser, a fully validating and stable parser. Subsequent versions may still use the NSGMLS parser if the quake variable USE_NSGMLS is true in the m3makefile when compiling the library, but they default to a native Modula-3 XML/SGML parser. XML is a subset of SGML being defined by the World Wide Web Consortium [XML].
The native Modula-3 library was developed with the goal of removing dependencies on foreign packages (NSGMLS) which needed to be installed separately. Implementing a fully validating SGML parser, however, is a significant task (NSGMLS weighs in at 50000 lines). Thus, a subset of SGML, based on XML, was implemented; several secondary options are not implemented. Nonetheless, the implementation goes beyond XML, since the HTML 3.2/4.0 document type definitions needed to be handled, as well as common HTML documents. Among other things, tag validation was required in order to detect and supply omitted tags.
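To illustrate what detecting and supplying omitted tags involves, here is a minimal sketch in Python. It is not the library's algorithm: the ALLOWED_CHILDREN table, the event tuples, and the function name supply_end_tags are all hypothetical, and a real DTD content model is far richer than a set of permitted child names.

```python
# Hypothetical, heavily simplified content model: parent -> allowed children.
# This stands in for the content models a real DTD would declare.
ALLOWED_CHILDREN = {
    "ul": {"li"},
    "li": {"ul", "p"},
    "p": set(),
}


def supply_end_tags(events):
    """Return the event stream with inferred end tags inserted.

    `events` is a list of ("start", name), ("end", name) or
    ("text", data) tuples (an assumed representation).
    """
    open_elements = []  # stack of currently open element names
    out = []

    def close_down_to(depth):
        # Emit end tags until only `depth` elements remain open.
        while len(open_elements) > depth:
            out.append(("end", open_elements.pop()))

    for kind, value in events:
        if kind == "start":
            # Find the nearest open element that may contain the new one.
            depth = len(open_elements)
            while depth > 0 and value not in ALLOWED_CHILDREN.get(
                open_elements[depth - 1], set()
            ):
                depth -= 1
            if depth > 0:
                # Elements above that ancestor had their end tags omitted.
                close_down_to(depth)
            # If no ancestor allows it, a validating parser would report
            # an error; this sketch simply opens the element where it is.
            open_elements.append(value)
            out.append((kind, value))
        elif kind == "end":
            if value in open_elements:
                # Close anything still open inside the element being ended.
                index = len(open_elements) - 1 - open_elements[::-1].index(value)
                close_down_to(index + 1)
                open_elements.pop()
                out.append((kind, value))
            # An end tag with no matching open element is dropped here.
        else:
            out.append((kind, value))

    close_down_to(0)  # close whatever is still open at end of input
    return out
```

Given the events for "<ul><li>one<li>two</ul>", the sketch inserts an end tag for each list item before the next item starts and before the list closes.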
Therefore, while the native Modula-3 SGML parser backend still has some limitations, a large part of the needed underlying machinery has been implemented. It has been relatively easy to extend as more features were required. It is expected to continue evolving as it gains more widespread use and as the XML standard matures.
For the Modula-3 parser, the default path for catalogs is PKG_INSTALL/sgml/src/dtd, where PKG_INSTALL is defined in the m3build template (e.g. /usr/local/modula3-3.6/lib/m3/pkg). For NSGMLS, the default value depends on how it was compiled. Document type definitions are provided in this package for HTML 3.2, HTML 4.0, Math HTML, and Linuxdoc.
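As a small illustration of how the default catalog directory follows from PKG_INSTALL for the Modula-3 parser, the sketch below composes the path in Python. The function name default_catalog_dir is hypothetical and is not part of the library; the example value is the one quoted above.

```python
import posixpath


def default_catalog_dir(pkg_install):
    """Compose PKG_INSTALL/sgml/src/dtd from a given PKG_INSTALL value.

    Hypothetical helper for illustration only; the library itself takes
    PKG_INSTALL from the m3build template.
    """
    return posixpath.join(pkg_install, "sgml", "src", "dtd")


# Example PKG_INSTALL value taken from the text above.
catalog_dir = default_catalog_dir("/usr/local/modula3-3.6/lib/m3/pkg")
```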