4 Writing Documents

4.1 Getting Started with XML

The Extensible Markup Language is not a data description language in itself. Rather, it defines a syntax that lets you design your own customized markup languages for arbitrary data models. Take a look at the following example:

<?xml version="1.0" encoding="UTF-8"?>
<cocktail alcoholic="yes">
    <name>Pina Colada</name>
    <ingredient>
        <name>rum</name>
        <amount unit="oz">3</amount>
    </ingredient>
    <ingredient>
        <name>coconut milk</name>
        <amount unit="tbsp">3</amount>
    </ingredient>
    <ingredient>
        <name>pineapple</name>
        <amount unit="tbsp">3</amount>
    </ingredient>
    <ingredient>
        <name>ice</name>
        <amount unit="cup">2</amount>
    </ingredient>
</cocktail>

What you see is a complete XML document describing the ingredients needed to make the popular Pina Colada cocktail. The first line is the XML declaration, which specifies the XML version and the character encoding of the document. In general, you should use a Unicode encoding such as UTF-8 or UTF-16, since every standards-conforming XML parser is required to be able to read these.

What follows is the root element cocktail that contains a name element, telling us the name of the cocktail, followed by the various ingredients that you need to make a Pina Colada.

The textual elements delimiting the beginning and end of an XML element are called “tags”. An opening tag has the form <tag_name> and the corresponding closing tag is written </tag_name>.

XML elements must always be properly nested. In addition, there must be exactly one root element. Thus, you can picture the logical structure of an XML document as a tree. A single element of this tree is called a “node”. Its direct descendants are called its “children” and the node from which it originates is called its “parent”.
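
As a small illustration of the nesting rule, the following sketch reuses the element names from the cocktail example. The first fragment is properly nested, while the second is not, because its closing tags overlap, and an XML parser would reject it:

<!-- properly nested: amount is closed before ingredient -->
<ingredient>
    <name>rum</name>
    <amount unit="oz">3</amount>
</ingredient>

<!-- NOT properly nested: ingredient is closed while amount is still open -->
<ingredient>
    <name>rum</name>
    <amount unit="oz">3</ingredient>
</amount>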

Nodes may carry additional attributes. Our cocktail from above, for instance, has the attribute alcoholic which in this case is set appropriately to “yes”.

The names of elements and attributes can be chosen arbitrarily to represent a given data model. In our example, we tried to model a beverage but you might just as well define a set of tags to describe the parts of an automotive vehicle.

When working with eCromedos, you will be using XML-based markup to describe the logical structure of standard text documents.

4.2 Available Document Classes

In version 1.0, eCromedos defines three document classes: report, book and article. The differences between them are mainly cosmetic and only visible in printed output. The classes are formally laid down in a set of Document Type Definitions (DTDs), which the document processor uses to verify the structure of documents before attempting to transform them.

Take a look at the following listing for an example of a simple book in eCromedos Markup-Language:

<book lang="english" secsplitdepth="1" secnumdepth="1" tocdepth="1">
    <head>
        <subject>Subject</subject>
        <title>Document Title</title>
        <author>Document Author</author>
        <date>Jan. 16, 1980</date>
        <publisher>Example Publisher</publisher>
    </head>
    <chapter>
        <title>My very First Document</title>
        <p>
            Hi Everybody!
        </p>
    </chapter>
</book>

Documents always have a head, regardless of the employed document class. In contrast to HTML, the order of the header elements is not arbitrary. The elements title and author are mandatory and you may specify multiple authors.

As you can see, our book has a chapter with a single paragraph of text. A paragraph is the simplest textual element that may occur inside a section.

A report is essentially the same as a book, except that books are laid out double-sided with uneven margins and reports are laid out one-sided with even margins.

Articles differ from books and reports in that the primary sectioning element is section instead of chapter. Furthermore, sections in an article are printed directly in sequence, whereas in books and reports a new chapter will always start a new page.
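
For comparison, a minimal article might look like the following sketch. It assumes that the article root element accepts the same attributes and head elements as the book shown above, which is not confirmed here; only the mandatory head elements are given:

<article lang="english" secnumdepth="1" tocdepth="1">
    <head>
        <title>Document Title</title>
        <author>Document Author</author>
    </head>
    <!-- in an article, section takes the place of chapter -->
    <section>
        <title>My very First Article</title>
        <p>
            Hi Everybody!
        </p>
    </section>
</article>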

4.3 Structuring Documents

In general, you will be using the sectioning elements chapter, section, subsection and subsubsection to structure your documents.

Sectioning elements must be given a title and they must be nested in the correct hierarchical order, i.e. you cannot have a chapter in a section and you cannot have a subsection in a chapter without first opening a section.
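
The following sketch shows the full hierarchy as it might appear in a book or report. The titles and the text are placeholders chosen for illustration:

<chapter>
    <title>Chapter Title</title>
    <section>
        <title>Section Title</title>
        <subsection>
            <title>Subsection Title</title>
            <subsubsection>
                <title>Subsubsection Title</title>
                <p>
                    Text at the deepest sectioning level.
                </p>
            </subsubsection>
        </subsection>
    </section>
</chapter>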

4.3.1 Minisections

Minisections are set with the minisection tag. They may appear anywhere in the section hierarchy below the primary sectioning element for the particular document class. The title of a minisection will not be numbered and will not receive an entry in the table of contents.
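
A minisection inside a section might be written as in the following sketch. It assumes that a minisection takes a title element and paragraphs in the same way as the other sectioning elements; the titles and text are made up for illustration:

<section>
    <title>Installation</title>
    <p>
        Unpack the archive into a directory of your choice.
    </p>
    <!-- unnumbered, no entry in the table of contents -->
    <minisection>
        <title>A Note on Permissions</title>
        <p>
            You may need administrator rights for this step.
        </p>
    </minisection>
</section>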

4.3.2 Prefaces

In books and reports you may use the preface element to set an arbitrary number of prefaces right after the document head. The title of a preface will not be numbered and will not appear in the table of contents (TOC) when generating printed output. However, it will receive an entry in the TOC when generating HTML.

A preface may contain paragraphs of text, as well as block elements, such as figures and tables. It must not contain any deeper sections. If you feel you need to section your preface, you should probably make it a chapter.
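
As a sketch, a book with a single preface could look like this. The head and the root attributes are copied from the earlier book example; the titles and text are placeholders:

<book lang="english" secsplitdepth="1" secnumdepth="1" tocdepth="1">
    <head>
        <title>Document Title</title>
        <author>Document Author</author>
    </head>
    <!-- prefaces come right after the head, before the first chapter -->
    <preface>
        <title>Preface</title>
        <p>
            A few words before we begin.
        </p>
    </preface>
    <chapter>
        <title>My very First Document</title>
        <p>
            Hi Everybody!
        </p>
    </chapter>
</book>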

4.3.3 Appendices

An appendix is essentially the same as a chapter. Only the numbering differs, in that the first part of the section counter is a Latin letter instead of an Arabic numeral. Appendices may occur only in the document classes book and report. They are to be placed right after the last primary section of a document.
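
The following sketch shows where an appendix would go. Note that the element name appendix is an assumption made for this illustration, since the text above does not spell out the tag name; the titles and text are placeholders:

<chapter>
    <title>Conclusion</title>
    <p>
        Closing remarks.
    </p>
</chapter>
<!-- assumed element name; placed after the last chapter -->
<appendix>
    <title>Supplementary Material</title>
    <p>
        Additional details that did not fit into the main part.
    </p>
</appendix>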

4.3.4 Glossaries

A glossary can be placed after the last regular section, which is, depending on the document class, either the last chapter, the last section or the last appendix. A glossary is basically an extra section that must contain nothing but a definition list (see section 5.1).

At this time, eCromedos does not provide functionality for creating and sorting glossaries automatically. This is due to the complexity of implementing this for arbitrary languages. An interface to xindy, the flexible index generator (see [3]) is planned for the future.

4.3.5 Bibliographies

Bibliographies are entered with the biblio tag and individual entries with bibitem. A bibliography may occur only after the glossary, if there is one, or otherwise after the last section in the document. Currently, eCromedos does not support bibliographies after individual sections. Here is an example:

<biblio number="yes">
    <bibitem label="KOCH06">
        Tobias Koch. eCromedos User Manual.
        <tt>http://www.ecromedos.org</tt>,
        2006.
    </bibitem>
    <bibitem label="WALSH03">
        Norman Walsh, Leonard Muellner.
        DocBook: The Definitive Guide.
        O'Reilly, 2003.
    </bibitem>
</biblio>

The number attribute controls whether the individual items are numbered sequentially or whether the user-supplied labels are used.

In the main part of your document, you can use the cite tag to cite an entry from the bibliography. For example, in allusion to the listing above, you could write <cite label="KOCH06"/>, which the document processor would replace with “[1]” when numbering is turned on and “[KOCH06]” when numbering is off.
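
Placed inside a paragraph, such a citation might look like the following sketch, which refers to the first entry of the bibliography listing above:

<p>
    For more information, refer to <cite label="KOCH06"/>.
</p>

With numbering turned on, the sentence would end in “[1]”, otherwise in “[KOCH06]”.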

4.4 Formatting Text

From your word processor you may be used to emphasizing text by setting it in bold or italic letters or by underlining it. With eCromedos you can achieve this by enclosing the span of text to be formatted inside the tags b for bold print, i for italic letters or u for underlining. You may also combine these arbitrarily.

Sometimes, you may want to set certain terms or expressions, such as internet addresses, in a font with fixed character width. To this end, there is the tt tag, which prints text in typewriter letters.

Examples of Formatting Text

Markup                                   Result
<u>Underlined text</u>                   Underlined text
<i>Italicized text</i>                   Italicized text
<b>Bold letters</b>                      Bold letters
<b><i>Bold face and italics</i></b>      Bold face and italics
<tt>Typewriter letters</tt>              Typewriter letters
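
In running text, these tags might be used as in the following sketch. The sentence itself is made up for illustration:

<p>
    Please read the <b>entire</b> manual <i>before</i> asking
    questions. The project website is
    <tt>http://www.ecromedos.org</tt>.
</p>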

For the sake of completeness, there are also seven elements for modifying the font size. In a serious document you should hardly have any reason to use these, though.

Examples of Modifying the Font Size

Markup                                   Result
<xx-small>Text in XXS</xx-small>         Text in XXS
<x-small>Text in XS</x-small>            Text in XS
<small>Small letters</small>             Small letters
<medium>Regular size</medium>            Regular size
<large>Large letters</large>             Large letters
<x-large>Text in XL</x-large>            Text in XL
<xx-large>Text in XXL</xx-large>         Text in XXL

4.5 Hyphenation

In printed output, text is set justified over the entire width of the text body. In order to avoid large gaps between words on single lines, LaTeX applies language-specific patterns to automatically hyphenate and break words at the right margin.

Unfortunately, LaTeX's hyphenation mechanism is not always able to split words correctly and in rare cases cannot hyphenate certain words at all.

You can provide hints telling LaTeX in which places a given word may be split by marking the corresponding spots with the y tag. For example, to tell LaTeX that it may hyphenate “bibliography” only between “biblio” and “graphy”, you would write biblio<y/>graphy in your markup.
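
Inside a paragraph, the hint from the example above might be used as in this sketch. The surrounding sentence is made up for illustration:

<p>
    Every serious report should end with a biblio<y/>graphy
    listing the sources that were consulted.
</p>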

4.6 Line and Page Breaks

In general, you should not worry about where a line breaks or where to start a new page, because it is the job of the formatting engine (i.e. LaTeX or your web browser) to take care of this.

In rare cases, however, you may have to intervene manually. You can use <br/> to break the current line and <pagebreak/> to start a new page. You should not use multiple brs or multiple pagebreaks in a row. Of course, a pagebreak is only visible in printed output.

When you need to prevent line breaks in certain places, you can either use the non-breaking space (&nbsp;) or protect the specific span of text with the nobr tag. For example, a title or degree should not be separated from the name that follows it. Consequently, you should write Dr.&nbsp;Pepper or <nobr>Dr. Pepper</nobr> to prevent the formatting engine from possibly breaking the line right before Pepper.
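
The following sketch combines a forced line break with a protected span of text. The address lines are placeholders, and nobr is used instead of &nbsp; so that the fragment does not depend on a document type declaration:

<p>
    Example Publisher<br/>
    Attn. <nobr>Dr. Pepper</nobr>
</p>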

4.7 Cross-References

Sometimes you will want to refer to the contents of a different section in your manuscript, e.g. you may write something like “[...] you will find out more about this on page XYZ”. However, at the time of writing your markup, you cannot tell on which page the section you are referring to will actually be printed. The solution is to label the location you wish to reference and let eCromedos do the math.

To label a certain spot in your text, use the label tag. This tag has a single, mandatory attribute, name, which holds the name of the label. The name must be unique among all labels in your document. Take a look at the following example:

<chapter>
    <title>The Show about Nothing</title>
    <p>
        Seinfeld<label name="seinfeld"/> is the best
        sitcom of all time.
    </p>
</chapter>

You can now use the elements ref to obtain the section number and pageref to get the page number like this:

<chapter>
    <title>About Myself</title>
    <p>
        I really enjoy watching Seinfeld. You can read more
        about Seinfeld in section <ref name="seinfeld"/> on
        page <pageref name="seinfeld"/>.
    </p>
</chapter>

ref and pageref can also point to any other object with a label, such as a figure or a numbered equation. In that case ref will resolve to the corresponding object counter instead of the section counter.

4.8 Marginals and Footnotes

Marginal notes can be placed with the marginal tag. And yes, they also work in HTML output. Try this example:

<p>
    In this episode<marginal>The Summer of George</marginal>,
    George finally loses his job at the Yankee Stadium but
    gets an extra three months' pay-off.
</p>

LaTeX does not allow marginals in table cells. For HTML output this limitation does not exist.

Footnotes are placed in the same fashion by use of the footnote tag. They do work inside tables without restrictions.
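
By analogy with the marginal example, a footnote might be written as in the following sketch. The sentence and the footnote text are made up for illustration:

<p>
    Kramer<footnote>His first name is Cosmo.</footnote> lives
    across the hall from Jerry.
</p>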

4.9 Quoting

Unless you are setting your text in typewriter letters, you will not be able to enter the correct quotation marks for your language directly with your keyboard. You could use XML character entities to access the glyphs, but that is tedious. Instead you should use the q and qq tags for single and double quoting, respectively.
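
As a sketch, the two tags might be used as follows. That q may be nested inside qq is an assumption made for this illustration, and the sentence is made up:

<p>
    She said, <qq>I would call that a <q>happy accident</q></qq>
    and walked away.
</p>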

4.10 Predefined Entities

eCromedos predefines a number of entity names that may come in handy in certain situations. The following table lists all available names and how they are resolved:

Entity      Description
&tex;       Resolves to “TeX”
&latex;     Resolves to “LaTeX”
&nbsp;      The non-breaking space
&zwsp;      The zero-width space
&endash;    The en-dash (–)
&emdash;    The em-dash (—)
&dots;      Resolves to “...”

Note: The zero-width space is particularly useful for making long path names or Internet addresses break across lines without introducing hyphens or spaces.

In order to use these, you have to include a document type declaration at the top of your document, appropriate to the document class you are using. This might look like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE report SYSTEM "http://www.ecromedos.net/dtd/1.0/report.dtd">

You can also insert these entities indirectly via the entity element, in which case you don't need the document type declaration.
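
Putting the pieces together, a short report that uses some of the predefined entities might look like the following sketch. The root attribute, the titles, the page numbers and the path name are placeholders chosen for illustration:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE report SYSTEM "http://www.ecromedos.net/dtd/1.0/report.dtd">
<report lang="english">
    <head>
        <title>Document Title</title>
        <author>Document Author</author>
    </head>
    <chapter>
        <title>Typesetting</title>
        <p>
            Printed output is generated with &latex;. See pages
            10&endash;12 for details. The files are stored in
            <tt>/usr/&zwsp;local/&zwsp;share/&zwsp;doc</tt>.
        </p>
    </chapter>
</report>

Without the document type declaration on the second line, an XML parser would not know the entity names and would report an error.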