This system organizes the documentation of Modula-3 packages. The documentation consists of HTML hypertext files and Modula-3 interfaces with embedded documentation. From these, a web of HTML files and a printable linear document are produced and shipped to the installation directory.
Document formats evolved from early inflexible systems to full programming languages embedded in a formatting engine (e.g. nroff, TeX). In parallel, less powerful but visual, incremental word processing formats appeared (e.g. WordStar, WordPerfect, Word). The benefits of separating the information content from the presentation parameters were recognized early by several groups, notably publishers. From the same content, several different formats (TeX, RTF, HTML, ASCII, nroff, man) and presentations (book, technical report, two column article...) may be generated automatically.
This led to more structure-oriented formats like SGML/XML [SGML] [XML], and to a certain extent LaTeX [LaTeX]. These systems may require more discipline, and thus took time to become popular. Their benefits were finally recognized, as illustrated by the widespread use of HTML [HTML] and Linuxdoc (both based on SGML), other SGML document types, and Thot [Thot].
Several existing SGML/XML document type definitions (DTD) target technical publications, including CALS, DOCBOOK, QWERTZ, LINUXDOC, and HTML. Each varies in terms of supported features (maths, tables, figures, index, bibliographic references...), supporting tools, and popularity. None supports all the above-mentioned features; in particular, they all resort to foreign formats for figures. The most popular document type, HTML, has numerous supporting tools but lacks several features including maths, figures, index and bibliographic references. However, HTML is progressing in several ways, and drafts for math support as well as for generic extensions have been introduced.
Because of its widespread use, HTML was selected, with a number of extensions to cover the missing features. These extensions may be expressed using the generic DIV and SPAN tags for the most part; new elements are introduced for figures. The resulting DTD is HTML 4.0, plus MathHTML 97-07-10, minus the interactive features (forms, scripts, applets), plus a number of additions detailed in this document.
For writing and browsing purposes, it is useful to decompose a long technical reference manual into several small documents. Some of these small documents may be used in several contexts (as a section in a reference manual, as a subsection in a textbook, as a node in a hypertext web...). Furthermore, some portions of a reference manual may be generated automatically, for instance extracted from program interfaces. While most systems like Linuxdoc and LaTeXtoHTML split a large document into small HTML files, it may be more natural to assemble a large document from several small ones.
In the context of documenting Modula-3 programs and libraries, HTML files are combined, through hypertext links, with commented interface files (.i3). All the needed HTML and interface files reside in the src directory. From these, online (web of HTML files) and printed (Postscript) documentation is generated. For this purpose, Modula-3 interface files (.i3) are converted to HTML, and hypertext links to the interfaces are converted appropriately. To generate the printed documentation, some of the hypertext links are followed in a depth first search, starting from a root HTML file and including the content of referenced files, producing a large linearized HTML document. The linearized HTML document may then be converted to various formats, including LaTeX and Postscript.
Each Modula-3 interface (or implementation) containing user documentation is declared using the HtmlInterface(x) or HtmlGenericInterface(x) functions (or HtmlImplementation(x) and HtmlGenericImplementation(x)). File x.i3, or x.ig, is converted to HTML using m3tosgml -html, to be exported to the Modula-3 HTML installation directory. The conventions for embedding the documentation in the interface comments are detailed later in this document.
Each HTML file is declared using the HtmlFile(x) function. File x.html is filtered to redirect references to .i3 files to the corresponding .html files, and to remove any non standard HTML extension used.
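The link redirection step can be illustrated with a minimal Python sketch. It is not the filter used by the system: the regular expression, the function name `redirect_interface_links`, and the mapping of `X.i3` (or `X.ig`) to `X.html` are assumptions made for illustration only.

```python
import re

# Assumption: interface links look like HREF="Name.i3" or HREF="Name.ig",
# optionally followed by a #fragment. The real filter may use other rules.
_INTERFACE_HREF = re.compile(
    r'(HREF\s*=\s*")([^"#]+)\.(?:i3|ig)((?:#[^"]*)?")',
    re.IGNORECASE,
)


def redirect_interface_links(html: str) -> str:
    """Rewrite HREF="X.i3" to HREF="X.html", keeping any #fragment."""
    return _INTERFACE_HREF.sub(r"\1\2.html\3", html)


if __name__ == "__main__":
    print(redirect_interface_links('<A HREF="Schema.i3#T">Schema.T</A>'))
```

Links that already point to `.html` files are left untouched, since the pattern only matches the interface suffixes.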
Finally, one or more printed documents may be produced by declaring their root node with the HtmlRoot(x) function. File x.html is used as a start point for sgmllinear, producing a single HTML file from x.html and the files referenced through hypertext links. The resulting linearized HTML file is then converted to LaTeX, formatted with latex and converted to Postscript, to be exported to the Modula-3 DOC installation directory.
import("libm3")
import("m3doc")
Module("Schema")
Implementation("Main")
HtmlFile("index")
HtmlFile("intro")
HtmlFile("details")
HtmlInterface("Schema")
HtmlRoot("index")
Program("prog")
The printed linear document is generated by parsing the root html file and including all referenced elements (e.g.: HREF=/path/document#elementName), and the BODY of all referenced documents (e.g. HREF=/path/document), through hyperlinks with the attribute REL="SUBDOCUMENT" or REL="INCLUDE". Thus, the HEAD of the root document is retained, and the HEAD of included documents is discarded. A SUBDOCUMENT file is assumed to be nested within the current section, while an INCLUDE document is at the end of, and at the same level as, the current section. The depths of headers within the referenced document are updated accordingly in the linearized output.
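The header depth adjustment described above can be sketched as follows. This is only an illustration under stated assumptions: the included document is assumed to start at H1, the function names `heading_offset` and `shift_headings` are hypothetical, and levels are capped at H6 because HTML has no deeper heading.

```python
import re

_HEADING = re.compile(r"<(/?)H([1-6])\b", re.IGNORECASE)


def heading_offset(current_level: int, rel: str) -> int:
    """Offset to add to the heading levels of a referenced document.

    current_level is the heading level of the section holding the link.
    """
    rel = rel.upper()
    if rel == "SUBDOCUMENT":
        return current_level        # child H1 becomes H(current_level + 1)
    if rel == "INCLUDE":
        return current_level - 1    # child H1 becomes H(current_level)
    raise ValueError(f"unsupported REL value: {rel}")


def shift_headings(html: str, offset: int) -> str:
    """Shift every H1..H6 start and end tag by offset, clamped to 1..6."""
    def bump(match: re.Match) -> str:
        level = min(6, max(1, int(match.group(2)) + offset))
        return f"<{match.group(1)}H{level}"
    return _HEADING.sub(bump, html)


if __name__ == "__main__":
    body = "<H1>Details</H1><H2>Options</H2>"
    print(shift_headings(body, heading_offset(2, "SUBDOCUMENT")))
```

A SUBDOCUMENT linked from an H2 section thus has its H1 rewritten to H3, while an INCLUDE linked from the same section has its H1 rewritten to H2.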
In order to highlight the links to the documents to include, a number of elements are often present, typically an unordered list (UL and LI), which are to be skipped in the linearized document. Such elements are tagged with the attribute CLASS=MD.SKIP; this element and any nested element is skipped, except for referenced documents which may be included. Additionally, the attribute CLASS=MD.SKIPALL is used to skip entirely an element and anything nested.
The HEAD and ABSTRACT sections (DIV elements with CLASS=ABSTRACT, or CLASS=HEAD) of included documents are skipped, leaving only the HEAD and ABSTRACT sections of the root document. The appendix sections (within a DIV element with CLASS=APPENDIX) of the root document and of included documents are accumulated, in the order encountered during the linearization, and moved to the end of the BODY section. The header depths of the appendices in included documents are not adjusted, since they are moved to the BODY of the root element. The bibliographic references encountered in the whole document are accumulated and inserted at the end of the document, in a DIV element with CLASS=REFERENCES.
All identifiers (attributes NAME and ID) for elements in merged documents are prefixed with the document path, to ensure uniqueness within the merged linear document. All hypertext references to merged elements are converted as needed from external references to internal references.
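A rough Python sketch of the identifier renaming follows. The separator between the document path and the original name (a colon here) and the function name `prefix_identifiers` are assumptions; the sketch only handles quoted attribute values and references local to the same file, and leaves out the conversion of cross-file references.

```python
import re

_ANCHOR = re.compile(r'\b(ID|NAME)\s*=\s*"([^"]*)"', re.IGNORECASE)
_LOCAL_REF = re.compile(r'(HREF\s*=\s*")#([^"]*)"', re.IGNORECASE)


def prefix_identifiers(html: str, doc_path: str) -> str:
    """Prefix ID/NAME values and matching local HREFs with doc_path."""
    # Assumption: "path:name" is the naming scheme; the real tool may differ.
    html = _ANCHOR.sub(
        lambda m: f'{m.group(1)}="{doc_path}:{m.group(2)}"', html
    )
    html = _LOCAL_REF.sub(
        lambda m: f'{m.group(1)}#{doc_path}:{m.group(2)}"', html
    )
    return html


if __name__ == "__main__":
    page = '<TABLE ID="gnutable"></TABLE> <A HREF="#gnutable">table</A>'
    print(prefix_identifiers(page, "doc/gnu"))
```

Two merged files that both define an element named `gnutable` then end up with distinct identifiers, because each carries the path of the file it came from.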
The document type definition (DTD) follows the HTML 4.0 draft, minus all the interactive features (forms, scripts, applets), plus MathHTML draft 97-07-10 and a number of extensions. Even though these drafts are evolving, the processing tools are modular and can easily be adapted. Moreover, with a well-structured, content-describing format, it is relatively easy to migrate documents to a different expression of the structure (i.e. different tag names). The extensions are mostly accommodated within the existing tags and attributes. They are detailed in the following sections.
The following elements must be present in root HTML files used to generate printed documentation. They are ignored in included documents.
<DIV CLASS=HEAD>
  <SPAN CLASS="AUTHOR">John Doe<A HREF="#adr1" REL="AFFILIATION"></A></SPAN>
  <SPAN CLASS="AUTHOR">Mary Doe<A HREF="#adr2" REL="AFFILIATION"></A></SPAN>
  <ADDRESS ID="adr1">
    Newsletter editor<BR>
    8723 Buena Vista, Smallville, CT 01234<BR>
    Tel: +1 (123) 456 7890<BR>
    email: jd@sgml.com
  </ADDRESS>
  <ADDRESS ID="adr2">
    Newsletter editor<BR>
    8000 Buena Vista, Bigville, VT 01234<BR>
    Tel: +1 (123) 456 7891<BR>
    email: md@sgml.com
  </ADDRESS>
  <SPAN CLASS="COPYRIGHT">Copyright 1997 under the General Public License, see file COPYING.</SPAN>
  <SPAN CLASS="DATE">23 Jan 1997 16:05:31 GMT</SPAN>
  <SPAN CLASS="KEYWORD">beta</SPAN>
  <SPAN CLASS="KEYWORD">text</SPAN>
  <SPAN CLASS="KEYWORD">SGML</SPAN>
</DIV>
The document structure is based on nested sections, each section starting with a title. In HTML, the nesting of sections is implicit and deduced from the heading level. Besides the core sections, there are the abstract and the appendices which need to be suitably identified. The abstract may be used as a summary for the document. The appendices may be grouped, numbered separately, and placed at the end when several HTML files are grouped during a linearization.
<HEAD><TITLE>This is the title but is redundant</TITLE>
</HEAD><BODY>
<H1>This is the title for real</H1>
<P> First paragraph </P>
<DIV CLASS=ABSTRACT>
  <P>A real short abstract. </P>
</DIV>
<H2>Introduction</H2>
<P>This is all about gnus and gnats as usual. </P>
<H2>Motivation</H2>
<H3>History</H3>
<P> Once upon a time... </P>
<DIV CLASS=APPENDIX>
  <H2>Biological characteristics of gnus</H2>
  ...
  <H2>Biological characteristics of gnats</H2>
  ...
</DIV>
In a hypertext document, references may point to an HTML file on the Web, to a bibliographic entry for the referenced document, or to both. The following classes are used to distinguish the three cases, and their treatment is described.
The referenced bibliographic entries must use the following classes, with the specified fields, inspired by the LaTeX BibTeX entries.
The document then contains a number of references, some of which point to entries, usually in a separate bibliographic database.
Document.html:

The entries are based on the bibtex/LaTeX
<A REL="BIB.NOREF" HREF="http://www.latex.org/manual.html">[LaTeX]</A>
<A REL="BIB.ENTRY" HREF="../bib/BibEntries.html#lamport1985"></A>,
a popular typesetting system for gnus and gnats.

BibEntries.html:

<DIV ID="Doe1997" CLASS=BIB.ARTICLE>
  <SPAN CLASS=AUTHOR>John Doe</SPAN>
  <SPAN CLASS=AUTHOR>Mary Doe</SPAN>
  <SPAN CLASS=TITLE>Gnus and Gnats</SPAN>
  <SPAN CLASS=JOURNAL>Software Review</SPAN>
  <SPAN CLASS=YEAR>1997</SPAN>
</DIV>
<DIV ID="lamport1985" CLASS=BIB.BOOK>
  <SPAN CLASS=AUTHOR>Leslie Lamport</SPAN>
  <SPAN CLASS=TITLE>LaTeX User's Guide and Reference Manual</SPAN>
  <SPAN CLASS=PUBLISHER>Addison-Wesley</SPAN>
  <SPAN CLASS=ADDRESS>Reading, Massachusetts</SPAN>
  <SPAN CLASS=YEAR>1985</SPAN>
</DIV>
Bibliographical references are used to access material outside of the current document. Internal references point the reader to a section, table, figure... through its number (by default and when REL=REF.NUMBER), or page (when REL=REF.PAGE). The corresponding target (section, table, figure...) must be named with the ID or NAME attribute.
The evolution of gnu populations is shown in Table
<A REL=REF.NUMBER HREF="#gnutable">[gnu table]</A>, on page
<A REL=REF.PAGE HREF="#gnutable">[]</A>, ...

<TABLE ID="gnutable">
  <CAPTION> Gnu populations in North America from 1800 to 1900</CAPTION>
  <TR>...
</TABLE>
Index marks are used to collect, for a document, the list of pages where an important topic is discussed. In some cases, begin and end marks delineate a section where the topic is discussed, and the corresponding page range appears in the index. The term printed in the index may not be the correct key for sorting. Indeed, terms may start with a capital letter, or be emphasized to indicate that the term is first defined here.
A simple index mark is indicated by a SPAN element of CLASS INDEX.MARK. It contains a list of usually no more than three SPAN elements of class INDEX.KEY, each possibly followed by a SPAN element of CLASS INDEX.TEXT when the text to print differs from the sorting key. It may end with a list of SPAN elements of CLASS INDEX.SEE to refer to another index item.
When a text range is to be delineated for the index, two index marks are used, one with CLASS INDEX.MARK.BEGIN and the other with CLASS INDEX.MARK.END. These may not contain INDEX.SEE elements, and the contained INDEX.KEY and INDEX.TEXT elements must match those in the corresponding begin/end mark.
This section discusses the Gnu population size variations over time
and geographical area.
<SPAN CLASS=INDEX.MARK>
  <SPAN CLASS=INDEX.KEY>gnu</SPAN>
  <SPAN CLASS=INDEX.TEXT><EM>Gnu</EM></SPAN>
  <!-- Gnu is emphasized because first defined here -->
  <SPAN CLASS=INDEX.KEY>population</SPAN>
  <SPAN CLASS=INDEX.TEXT>Population</SPAN>
  <SPAN CLASS=INDEX.KEY>size</SPAN>
</SPAN>
<SPAN CLASS=INDEX.MARK>
  <SPAN CLASS=INDEX.KEY>gnu</SPAN>
  <SPAN CLASS=INDEX.TEXT><EM>Gnu</EM></SPAN>
  <SPAN CLASS=INDEX.KEY>population</SPAN>
  <SPAN CLASS=INDEX.TEXT>Population</SPAN>
  <SPAN CLASS=INDEX.KEY>growth</SPAN>
  <SPAN CLASS=INDEX.SEE>gnu</SPAN>
  <SPAN CLASS=INDEX.SEE>population</SPAN>
  <SPAN CLASS=INDEX.SEE>size</SPAN>
</SPAN>
Various causes affect the Gnu population size.
<SPAN CLASS=INDEX.MARK.BEGIN>
  <SPAN CLASS=INDEX.KEY>gnat</SPAN>
  <SPAN CLASS=INDEX.TEXT><EM>Gnat</EM></SPAN>
</SPAN>
Most significantly, the presence of gnat has a direct correlation with
gnu health problems.
<SPAN CLASS=INDEX.MARK.END>
  <SPAN CLASS=INDEX.KEY>gnat</SPAN>
</SPAN>
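To make the relation between sorting keys and printed text concrete, here is a small Python sketch that collects index marks from markup such as the example above. It relies only on the standard `html.parser` module; the class name `IndexMarkCollector`, the helper `collect_index_marks`, and the dictionary layout are illustrative assumptions, not part of the processing tools. When no INDEX.TEXT follows a key, the key itself is taken as the printed text.

```python
from html.parser import HTMLParser


class IndexMarkCollector(HTMLParser):
    """Collect INDEX.MARK* spans with their keys, printed texts and see items."""

    def __init__(self):
        super().__init__()
        self.marks = []
        self._open_spans = []   # CLASS value of every currently open SPAN
        self._current = None    # mark being filled in, if any

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        cls = (dict(attrs).get("class") or "").upper()
        self._open_spans.append(cls)
        if cls.startswith("INDEX.MARK"):
            self._current = {"kind": cls, "levels": [], "see": []}
        elif self._current is None:
            return
        elif cls == "INDEX.KEY":
            self._current["levels"].append({"key": "", "text": ""})
        elif cls == "INDEX.SEE":
            self._current["see"].append("")

    def handle_endtag(self, tag):
        if tag != "span" or not self._open_spans:
            return
        closed = self._open_spans.pop()
        if closed.startswith("INDEX.MARK") and self._current is not None:
            for level in self._current["levels"]:
                # The printed text defaults to the sorting key.
                level["text"] = level["text"] or level["key"]
            self.marks.append(self._current)
            self._current = None

    def handle_data(self, data):
        if self._current is None or not self._open_spans:
            return
        cls, text = self._open_spans[-1], data.strip()
        if cls == "INDEX.KEY" and self._current["levels"]:
            self._current["levels"][-1]["key"] += text
        elif cls == "INDEX.TEXT" and self._current["levels"]:
            self._current["levels"][-1]["text"] += text
        elif cls == "INDEX.SEE" and self._current["see"]:
            self._current["see"][-1] += text


def collect_index_marks(html: str) -> list:
    parser = IndexMarkCollector()
    parser.feed(html)
    parser.close()
    return parser.marks
```

Sorting the collected marks on the tuple of `key` values, while printing the `text` values, gives the behaviour described for the index; matching INDEX.MARK.BEGIN and INDEX.MARK.END pairs into page ranges is left out of this sketch.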
The current HTML practice is to use bitmaps for anything graphical; in some cases, a vector format like Postscript or CGM is used. By representing diagrams as structured elements, one allows further editing and reuse of the diagrams, ensures that the diagrams can be represented at the full resolution of the output device, and allows spell checking and cut/paste of any text within the diagram.
The following elements are introduced for representing figures. All positions and sizes are floating point numbers in points.
Each attribute is detailed below.
<FIGURE WIDTH=400 HEIGHT=600>
  <CAPTION>Schema of cultural exchanges between gnus and gnats </CAPTION>
  <RECTANGLE HPOS=100 VPOS=100 WIDTH=200 HEIGHT=150 FILLCOLOR="yellow"/>
  <CIRCLE HPOS=200 VPOS=200 RADIUS=100 FILLCOLOR="pink"/>
  <ELLIPSE HPOS=200 VPOS=400 WIDTH=200 HEIGHT=100 FILLCOLOR="red"/>
  <POLYLINE CLOSED PENCOLOR="black" DEPTH=2 PENWIDTH=2.5
    POINTS="100.0 100.0 300 100 300 200 100 200"/>
  <SPLINE CLOSED FILLCOLOR="magenta" POINTS="10 10 20 10 20 20 10 20"/>
  <PICTURE HPOS=200 VPOS=400 HEIGHT=100 WIDTH=100 SRC="gnu.ppm"
    ALT="A gnu in sunset"/>
  <ARC P1="200 200" P2="200 300" P3="300 300" CAP="BUTT"
    BARROWTYPE=DIAMOND BARROWSIZE="0 30 60"/>
  <GTEXT VPOS=300 HPOS=200 HEIGHT=300 WIDTH=200>
    <P>Gnus sometimes have difficult cultural exchanges with gnats,
    as demonstrated by the surrounding <EM>diagram.</EM>
  </GTEXT>
  <GGROUP VPOS=300 HPOS=200 TRANSFORM="0 .707 .707 0 0 0">
    <ELLIPSE FILLCOLOR="black" HPOS=0 VPOS=0 WIDTH=200 HEIGHT=100/>
  </GGROUP>
</FIGURE>
The advantages of literate programming (embedding documentation in the source code) are that the documentation is available both in the source code and in the online or printed documents, and that chances are greater that updates to the source code and the documentation are synchronized.
Some structuring conventions are required to embed all the documentation information within the comments of the source code. They are relatively simple and are detailed below.
All lines up to and including the first blank line are put into an HTML comment. This makes it easy to omit the source file copyright notice from the documentation. The first blank line must not be within a comment.
The portions of the file that are not in comments will be placed within preformatted (<PRE>) sections, obeying line breaks and indentation. Any characters that have special significance to HTML will be quoted.
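The two rules above (header hidden in an HTML comment, program text quoted inside PRE) can be sketched in Python as follows. This is a simplified illustration, not the converter itself: it ignores the special treatment of Modula-3 comments, assumes the first blank line is an empty line, and the function name `wrap_interface` and the replacement of `--` inside the header are assumptions.

```python
import html


def wrap_interface(source: str) -> str:
    """Hide the file header in an HTML comment and quote the rest in <PRE>."""
    # Everything before the first blank line is the header (e.g. copyright).
    header, _, body = source.partition("\n\n")
    # Assumption: "--" is broken up so the HTML comment stays well formed.
    hidden = header.replace("--", "- -")
    # Quote the characters that are special to HTML: &, < and >.
    quoted = html.escape(body, quote=False)
    return f"<!--\n{hidden}\n-->\n<PRE>\n{quoted}</PRE>\n"


if __name__ == "__main__":
    text = "(* Copyright notice *)\n\nINTERFACE Limits;\nCONST Ok = 1 < 2;\n"
    print(wrap_interface(text))
```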
The portions of the file that are in comments will be treated differently depending on the position of the comment. There are three cases:
PROCEDURE P(VAR (*OUT*) x: INTEGER);
PROCEDURE LineTo(path: T; READONLY p: Point.T);
(* Extend "path" with a linear segment that starts at its current point
   and ends at "p". *)
It is recommended that these slanted comments be kept brief; additional details can be provided in longer type 3 comments.
Sometimes you want a portion of a comment to be treated verbatim, like program text, for example, when including an example program. Therefore, any line that starts with "| " (vertical bar followed by space) is treated like program text, typesetting it verbatim in typewriter font.
Comments of type 2 and 3 may contain HTML tags, for example to annotate section headings or emphasize text. A title and level 1 header are supplied using the interface name.
Lines in type-three comments that begin with "| " are treated just like program text with one exception: you can slip in words in roman font by enclosing them in back-quotes.
| ELSIF s `was generated by` LineTo THEN
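The treatment of such a line can be sketched in Python. The output markup is an assumption: TT elements are used here for the typewriter parts and plain text for the back-quoted roman parts, and the function name `render_verbatim_line` is hypothetical.

```python
import html


def render_verbatim_line(line: str) -> str:
    """Render a '| ' comment line: program text in TT, back-quoted words in roman."""
    if not line.startswith("| "):
        raise ValueError("not a verbatim line")
    pieces = line[2:].split("`")
    out = []
    for position, piece in enumerate(pieces):
        text = html.escape(piece, quote=False)
        # Even positions are program text, odd positions were back-quoted.
        if position % 2 == 0 and text:
            out.append(f"<TT>{text}</TT>")
        else:
            out.append(text)
    return "".join(out)


if __name__ == "__main__":
    print(render_verbatim_line("| ELSIF s `was generated by` LineTo THEN"))
```

Splitting on the back-quote makes the pieces alternate between program text and roman text, so no state machine is needed for this simple case; an unmatched back-quote is not diagnosed by the sketch.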
This document presents a simple, structured, standards-based documentation format. It supports all the common constructs available in traditional systems such as LaTeX (math, table, bibliography, index) as well as figures (diagrams). This structured format ensures portability, reusability and easy maintenance/transformation. A large part of a document (excluding the extensions such as index marks and figures) may be entered using the widely available HTML editors (emacs HTML mode, Amaya, Netscape, Word...).