Netmond V2. Regular expression (regex)

The Netmond built-in regex is a regular expression pattern matching and replacement language. It is written in C.

These routines are the equivalents of regex routines as found in 4.nBSD UN*X, with minor extensions. These routines are derived from various implementations found in software tools books, and Conroy's grep. They are NOT derived from licensed/restricted software.
For more interesting, academic or complicated implementations, see Henry Spencer's regexp routines or the GNU Emacs pattern matching module.

Regular Expressions:

[1] char
Matches itself, unless it is a special character (metachar): . \ [ ] * + ^ $ ( )
[2] .
Matches any character.
[3] \
Matches the character following it, except when followed by a left or right round bracket, a digit 1 to 9 or a left or right angle bracket. (see [7], [8] and [9])
It is used as an escape character for all other meta-characters, and itself. When used in a set ([4]), it is treated as an ordinary character.
[4] [set]
Matches one of the characters in the set. If the first character in the set is "^", it matches a character NOT in the set. The shorthand S-E specifies the range of characters from S up to E, inclusive. The characters "]" and "-" have no special meaning if they appear as the first characters in the set. Examples:
[a-z]
Match any lowercase alpha.
[^]-]
Match any char except "]" and "-".
[^A-Z]
Match any char except uppercase alpha.
[a-zA-Z]
Match any alpha.
[5] *
Any regular expression of form [1] to [4], followed by the closure character "*", matches zero or more occurrences of that form.
[6] +
Same as [5], except it matches one or more.
[7] (tag)
A regular expression in the form [1] to [10], enclosed as (form), matches what form matches. The enclosure creates a set of tags, used for [8] and for pattern substitution. The tagged forms are numbered starting from 1.
[8] \digit
A \ followed by a digit 1 to 9 matches whatever a previously tagged regular expression ([7]) matched.
[9] \<word\>
A regular expression starting with a \< construct and/or ending with a \> construct, restricts the pattern matching to the beginning of a word, and/or the end of a word. A word is defined to be a character string beginning and/or ending with the characters A-Z a-z 0-9 and _. It must also be preceded and/or followed by any character outside those mentioned.
[10] xy
A composite regular expression xy where x and y are in the form [1] to [10] matches the longest match of x followed by a match for y.
[11] ^exact$
A regular expression starting with a "^" character and/or ending with a "$" character, restricts the pattern matching to the beginning of the line, or the end of line. [anchors] Elsewhere in the pattern, "^" and "$" are treated as ordinary characters.
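
The set rule [4] and the word rule [9] can be sketched in a few lines of Python. This is an illustration of the rules as described above, not Netmond's C code. The helper names (parse_set, in_set, at_word_start, at_word_end) are hypothetical, and treating the start and end of the text as a word boundary is an assumption.

```python
import string

def parse_set(spec):
    # spec is the text between "[" and "]", e.g. "a-z" or "^]-".
    # Returns (negated, members). Hypothetical helper following rule [4].
    negated = spec.startswith("^")
    body = spec[1:] if negated else spec
    members = set()
    i = 0
    # "]" and "-" are ordinary characters when they come first in the set.
    while i < len(body) and body[i] in "]-":
        members.add(body[i])
        i += 1
    while i < len(body):
        # S-E denotes every character from S up to E, inclusive.
        if i + 2 < len(body) and body[i + 1] == "-":
            for code in range(ord(body[i]), ord(body[i + 2]) + 1):
                members.add(chr(code))
            i += 3
        else:
            # Everything else, including a backslash, stands for itself.
            members.add(body[i])
            i += 1
    return negated, members

def in_set(ch, spec):
    # True if the single character ch is matched by [spec].
    negated, members = parse_set(spec)
    return (ch in members) != negated

# Word characters per rule [9]: A-Z a-z 0-9 and _.
WORD_CHARS = set(string.ascii_letters + string.digits + "_")

def at_word_start(text, i):
    # Beginning of word: text[i] is a word character and the character
    # before it is not (assumption: the start of text also qualifies).
    return (i < len(text) and text[i] in WORD_CHARS
            and (i == 0 or text[i - 1] not in WORD_CHARS))

def at_word_end(text, i):
    # End of word: text[i - 1] is a word character and text[i] is not
    # (assumption: the end of text also qualifies).
    return (i > 0 and text[i - 1] in WORD_CHARS
            and (i == len(text) or text[i] not in WORD_CHARS))
```

For instance, in_set("x", "^]-") is true while in_set("]", "^]-") is false, and in the text "get foo_1, bar" a word starts at index 4 and ends at index 9.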

Authors:
Written in C by Ozan S. Yigit (oz), Dept. of Computer Science, York University.
ANSI prototypes and regex.h added by Mark Russell, UKC.

Acknowledgements:
HCR's Hugh Redelmeier has been most helpful in various stages of development. He convinced me to include BOW and EOW constructs, originally invented by Rob Pike at the University of Toronto.

References:
Software tools in Pascal, Kernighan & Plauger.
Grep [rsx-11 C dist], David Conroy.
ed - text editor, Un*x Programmer's Manual.
Advanced editing on Un*x, B. W. Kernighan.
RegExp routines, Henry Spencer.

Notes:
This implementation uses a bit-set representation for character classes for speed and compactness. Each character is represented by one bit in a 128-bit block. Thus, CCL or NCL always takes a constant 16 bytes in the internal dfa, and yre_exec does a single bit comparison to locate the character in the set.
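
The bit-set idea can be sketched as follows. This is a minimal Python illustration, not the library's C code. The function names and the bit layout (byte index = code / 8, bit index = code mod 8) are assumptions. Only the 128-bit, 16-byte size comes from the note above.

```python
def make_bitset(chars):
    # Build a 16-byte block with one bit per 7-bit character code.
    bits = bytearray(16)  # 16 bytes * 8 bits = 128 bits
    for ch in chars:
        code = ord(ch)
        if code > 127:
            raise ValueError("only 7-bit character codes fit in 128 bits")
        # Assumed layout: byte = code // 8, bit within byte = code % 8.
        bits[code >> 3] |= 1 << (code & 7)
    return bits

def in_bitset(bits, ch):
    # Membership is a single index plus a single bit test.
    code = ord(ch)
    return code < 128 and bool(bits[code >> 3] & (1 << (code & 7)))
```

Whatever the number of characters in the set, the block stays 16 bytes long, which is why a character class has a constant size in the compiled pattern.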

Examples:
Pattern                               Matches

foo*.*                                fo foo fooo foobar fobar foxx ...
fo[ob]a[rz]                           fobar fooar fobaz fooaz
foo\\+                                foo\ foo\\ foo\\\ ...
(foo)[1-3]\1 (same as foo[1-3]foo)    foo1foo foo2foo foo3foo
(fo.*)-\1                             foo-foo fo-fo fob-fob foobar-foobar ...
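
The patterns listed above can be tried out with Python's standard re module, which happens to accept the same syntax for these five patterns. It is a different engine from the Netmond routines, so this is only a stand-in for experimenting, not a description of Netmond's behaviour. The names EXAMPLES and failing_examples are hypothetical, and matching the whole string (re.fullmatch) is an assumption made to keep the check strict.

```python
import re

# Pattern -> sample strings taken from the examples above.
EXAMPLES = {
    r"foo*.*": ["fo", "foo", "fooo", "foobar", "fobar", "foxx"],
    r"fo[ob]a[rz]": ["fobar", "fooar", "fobaz", "fooaz"],
    r"foo\\+": ["foo\\", "foo\\\\", "foo\\\\\\"],
    r"(foo)[1-3]\1": ["foo1foo", "foo2foo", "foo3foo"],
    r"(fo.*)-\1": ["foo-foo", "fo-fo", "fob-fob", "foobar-foobar"],
}

def failing_examples():
    # Return every (pattern, text) pair where the whole text does not match.
    return [
        (pattern, text)
        for pattern, texts in EXAMPLES.items()
        for text in texts
        if re.fullmatch(pattern, text) is None
    ]
```

The back-reference \1 must repeat exactly what the tagged form matched, so with (fo.*)-\1 the string foo-fob is rejected even though both halves start with "fo".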

© 1998-2002, Rinet Software