Writing Edit Modes

New syntax highlighting modes are very easy to create. This section will start with a short XML primer, followed by detailed information about each supported highlighting rule.

An XML Primer

XML is a simplified subset of SGML, the markup language on which HTML is based. Because of this, XML looks very similar to HTML, and anyone who has done a bit of web page authoring will pick it up quickly.

Here is a sample XML file that also happens to be a definition for a (useless) edit mode called "my-mode":

<?xml version="1.0"?>

<!DOCTYPE MODE SYSTEM "xmode.dtd">

<MODE NAME="my-mode">
    <PROPS>
        <PROPERTY NAME="label" VALUE="My First Edit Mode" />
        <PROPERTY NAME="filenameGlob" VALUE="*.my" />
    </PROPS>
    <RULES>
        <!-- syntax highlighting rules go here -->
    </RULES>
</MODE>

If you're familiar with HTML, keep the following in mind when writing XML:

To insert a special character such as < or > literally in XML (for example, inside an attribute value), you must write it as an entity. An entity consists of the character's symbolic name enclosed between "&" and ";". A full list of entities is beyond the scope of this chapter, but the most important are &lt; for the less-than sign (<), &gt; for the greater-than sign (>), and &amp; for the ampersand (&).

For example, to highlight "<" as an OPERATOR in Java mode, you can't write <SEQ TYPE="OPERATOR"><</SEQ>, because the bare "<" inside the tag would cause a syntax error. Instead, you must write:

<SEQ TYPE="OPERATOR">&lt;</SEQ>
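
The ampersand needs the same treatment, because it is the character that introduces an entity. As a small sketch (the choice of operators is only illustrative and is not taken from the real Java mode), rules matching "&&" and "<=" could be written like this:

```xml
<!-- "&&": each ampersand is written as the &amp; entity -->
<SEQ TYPE="OPERATOR">&amp;&amp;</SEQ>
<!-- "<=": the less-than sign is written as the &lt; entity -->
<SEQ TYPE="OPERATOR">&lt;=</SEQ>
```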

Now that you know the basics of XML, read on to find out how to write edit modes.

The Preamble

Each mode definition must start with the following:

<?xml version="1.0"?>
<!DOCTYPE MODE SYSTEM "xmode.dtd">

The MODE Tag

Each mode definition must contain exactly one MODE tag, which is the root element of the file. All other tags (PROPS, RULES) must be placed inside the MODE tag. The MODE tag has one required attribute, NAME, which must be set to the edit mode's name. The MODE tag for Java mode looks as follows:

<MODE NAME="java">
    <!-- definition for Java mode goes here -->
</MODE>

The PROPS Tag

The PROPS tag and the PROPERTY tags inside it are used to define mode-specific properties. Each PROPERTY tag must have a NAME attribute set to the property's name, and a VALUE attribute with the property's value.

In addition to the properties listed in the section called Buffer-Local Properties in Chapter 9, you can use the following properties in modes:

The RULES Tag

RULES tags must be placed inside the MODE tag. Each RULES tag defines a ruleset. A ruleset consists of a number of parser rules, with each parser rule specifying how to highlight a specific syntax token. There must be at least one ruleset in each edit mode. There can also be more than one, with different rulesets being used to highlight different parts of a buffer (for example, in HTML mode, different rulesets are used to highlight tags and inline JavaScript). For information about using more than one ruleset, see the section called SPAN rule.

The RULES tag supports the following attributes, all of which are optional:

Each child element of the RULES tag defines a parser rule. Rules are checked in order; that means that if you define a rule that matches on "h" and another subsequent rule that matches on "hello", the "h" rule will handle all cases before the "hello" rule gets a chance. For the ruleset to work correctly, you must instead place the "hello" rule before the "h" one.

Here is an example RULES tag:

<RULES IGNORE_CASE="FALSE" HIGHLIGHT_DIGITS="TRUE">
    ...
</RULES>
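
To illustrate the ordering requirement described above, here is a minimal sketch with two SEQ rules (the SEQ rule is covered later in this section; the operators chosen are only illustrative). Because rules are checked in order, the longer sequence is listed first:

```xml
<RULES>
    <!-- must come first; otherwise the "<" rule below would match
         the first character of every "<<" -->
    <SEQ TYPE="OPERATOR">&lt;&lt;</SEQ>
    <SEQ TYPE="OPERATOR">&lt;</SEQ>
</RULES>
```

Swapping the two rules would leave the "<<" rule unreachable, since the shorter rule would always match first.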

TERMINATE rule

There can only be one TERMINATE tag per ruleset. The TERMINATE rule specifies that parsing should stop after the specified number of characters have been read from a line. The number of characters to terminate after can be specified with the AT_CHAR attribute. This is used in patch mode, for example, because only the first character of each line affects highlighting. Here is an example:

<TERMINATE AT_CHAR="1" />
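
As a rough sketch of how TERMINATE might combine with other rules in a patch-like mode, consider the ruleset below. It is not the actual patch mode definition. It assumes that the EOL_SPAN rule (described later in this section) accepts the AT_LINE_START attribute, and the token types are chosen arbitrarily:

```xml
<RULES>
    <!-- stop parsing once the first character of a line has been read -->
    <TERMINATE AT_CHAR="1" />
    <!-- lines added by the patch (AT_LINE_START is an assumption here) -->
    <EOL_SPAN TYPE="KEYWORD1" AT_LINE_START="TRUE">+</EOL_SPAN>
    <!-- lines removed by the patch -->
    <EOL_SPAN TYPE="KEYWORD2" AT_LINE_START="TRUE">-</EOL_SPAN>
</RULES>
```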

WHITESPACE rule

The WHITESPACE rule specifies characters which are to be treated as whitespace; in other words, keyword delimiters. Most rulesets will have WHITESPACE tags for spaces and tabs. Here is an example:

<WHITESPACE> </WHITESPACE>
<WHITESPACE>&#9;</WHITESPACE> <!-- a single tab, written as a character reference -->

SPAN rule

The SPAN rule highlights ranges of text between a start and end string. The start and end strings are specified inside child elements of the SPAN tag, like so:

<SPAN TYPE="COMMENT1">
    <BEGIN>/*</BEGIN>
    <END>*/</END>
</SPAN>

The following attributes are supported:

Here is a SPAN that highlights Java string literals, which cannot include line breaks:

<SPAN TYPE="LITERAL1" NO_LINE_BREAK="TRUE">
    <BEGIN>"</BEGIN>
    <END>"</END>
</SPAN>

Here is a SPAN that highlights Java documentation comments by delegating to the "JAVADOC" ruleset defined elsewhere in the current mode:

<SPAN TYPE="COMMENT2" DELEGATE="JAVADOC">
    <BEGIN>/**</BEGIN>
    <END>*/</END>
</SPAN>
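
For context, here is a hedged sketch of how the delegating SPAN and the "JAVADOC" ruleset might sit together in one mode. It assumes that a ruleset is named with a SET attribute on its RULES tag, which is not shown elsewhere in this section, and the contents of the second ruleset are purely illustrative:

```xml
<MODE NAME="java">
    <!-- first ruleset: hands documentation comments to JAVADOC -->
    <RULES>
        <SPAN TYPE="COMMENT2" DELEGATE="JAVADOC">
            <BEGIN>/**</BEGIN>
            <END>*/</END>
        </SPAN>
    </RULES>

    <!-- SET is assumed to be the attribute that names a ruleset -->
    <RULES SET="JAVADOC">
        <WHITESPACE> </WHITESPACE>
        <!-- highlights tags such as "@param" up to the next whitespace -->
        <MARK_FOLLOWING TYPE="KEYWORD2">@</MARK_FOLLOWING>
    </RULES>
</MODE>
```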

Here is a SPAN that highlights cascading style sheets inside HTML STYLE tags by delegating to the "MAIN" ruleset of another mode, "css":

<SPAN TYPE="MARKUP" DELEGATE="css::MAIN">
    <BEGIN>&lt;style&gt;</BEGIN>
    <END>&lt;/style&gt;</END>
</SPAN>

EOL_SPAN rule

An EOL_SPAN is similar to a SPAN except that highlighting stops at the end of the line, not after the end sequence is found. The text to match is specified between the opening and closing EOL_SPAN tags. The following attributes are supported:

Here is an EOL_SPAN that highlights C++-style comments:

<EOL_SPAN TYPE="COMMENT1">//</EOL_SPAN>

MARK_PREVIOUS rule

The MARK_PREVIOUS rule highlights from the end of the previous syntax token to the matched text. The text to match is specified between opening and closing MARK_PREVIOUS tags. The following attributes are supported:

Here is a rule that highlights labels in Java mode (for example, "XXX:"):

<MARK_PREVIOUS AT_LINE_START="TRUE"
    EXCLUDE_MATCH="TRUE">:</MARK_PREVIOUS>

MARK_FOLLOWING rule

The MARK_FOLLOWING rule highlights from the start of the match to the next syntax token or whitespace character. The text to match is specified between opening and closing MARK_FOLLOWING tags. The following attributes are supported:

Here is a rule that highlights variables in Unix shell scripts ("$CLASSPATH", "$IFS", etc.):

<MARK_FOLLOWING TYPE="KEYWORD2">$</MARK_FOLLOWING>

SEQ rule

The SEQ rule highlights fixed sequences of text. The text to highlight is specified between opening and closing SEQ tags. The following attributes are supported:

The following SEQs highlight a few Java operators:

<SEQ TYPE="OPERATOR">+</SEQ>
<SEQ TYPE="OPERATOR">-</SEQ>
<SEQ TYPE="OPERATOR">*</SEQ>
<SEQ TYPE="OPERATOR">/</SEQ>

KEYWORDS rule

There can only be one KEYWORDS tag per ruleset. The KEYWORDS rule allows you to define keywords to highlight. Keywords are similar to SEQs, except that SEQs match anywhere in the text, whereas keywords only match whole words.

The KEYWORDS tag supports only one attribute, IGNORE_CASE. If set to FALSE, keywords will be case sensitive. Otherwise, case will not matter. Default is TRUE.

Each child element of the KEYWORDS tag should be named after the desired token type, with the keyword text between the start and end tags. For example, to highlight the most common Java keywords, you would write:

<KEYWORDS IGNORE_CASE="FALSE">
    <KEYWORD1>if</KEYWORD1>
    <KEYWORD1>else</KEYWORD1>
    <KEYWORD3>int</KEYWORD3>
    <KEYWORD3>char</KEYWORD3>
</KEYWORDS>
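
The difference between SEQs and keywords can be sketched with the two alternatives below. They are shown side by side for comparison only; a real ruleset would use one or the other for a given word:

```xml
<!-- as a SEQ, "int" would also be highlighted inside "print" -->
<SEQ TYPE="KEYWORD3">int</SEQ>

<!-- as a keyword, "int" is highlighted only when it is a whole word -->
<KEYWORDS IGNORE_CASE="FALSE">
    <KEYWORD3>int</KEYWORD3>
</KEYWORDS>
```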

Token Types

Each syntax token is of one of the following types:

There are no formal conventions specifying which token types should be used for what; instead, just take a look at how syntax is highlighted in some existing modes and decide for yourself what token type you should use.
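
Putting the pieces from this section together, a small but complete mode file might look like the sketch below. It extends the "my-mode" example from the primer using only rules shown earlier. The choice of rules and token types is illustrative, and the tab is written as the standard XML character reference &#9;:

```xml
<?xml version="1.0"?>
<!DOCTYPE MODE SYSTEM "xmode.dtd">

<MODE NAME="my-mode">
    <PROPS>
        <PROPERTY NAME="label" VALUE="My First Edit Mode" />
        <PROPERTY NAME="filenameGlob" VALUE="*.my" />
    </PROPS>
    <RULES IGNORE_CASE="FALSE" HIGHLIGHT_DIGITS="TRUE">
        <!-- keyword delimiters: space and tab -->
        <WHITESPACE> </WHITESPACE>
        <WHITESPACE>&#9;</WHITESPACE>

        <!-- comment rules come before the "/" operator so that
             the longer sequences are tried first -->
        <SPAN TYPE="COMMENT1">
            <BEGIN>/*</BEGIN>
            <END>*/</END>
        </SPAN>
        <EOL_SPAN TYPE="COMMENT1">//</EOL_SPAN>

        <!-- string literals, which cannot span lines -->
        <SPAN TYPE="LITERAL1" NO_LINE_BREAK="TRUE">
            <BEGIN>"</BEGIN>
            <END>"</END>
        </SPAN>

        <SEQ TYPE="OPERATOR">+</SEQ>
        <SEQ TYPE="OPERATOR">/</SEQ>
        <SEQ TYPE="OPERATOR">&lt;</SEQ>

        <KEYWORDS IGNORE_CASE="FALSE">
            <KEYWORD1>if</KEYWORD1>
            <KEYWORD1>else</KEYWORD1>
            <KEYWORD3>int</KEYWORD3>
        </KEYWORDS>
    </RULES>
</MODE>
```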