Metacharacters

Metacharacters are "magical" characters that a program interprets in special ways when it encounters them in its input. Many different UNIX programs implement metacharacters, and the concept is one of the most useful and powerful features of a UNIX system.

However, like any powerful tool, metacharacters can give you some very unpleasant experiences if you don't know exactly what you're doing. So unless you do, it's wise to avoid using metacharacters in filenames and other data used in maketool. Here be dragons!

Usually, metacharacters are the odd punctuation characters around the edge of your keyboard, such as $, * or &. If you stick to letters, digits, and the / (slash), - (dash), . (dot) and _ (underscore) characters, you should be fine.
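The rule of thumb above is easy to check mechanically. The following Python sketch accepts a name only if it uses ASCII letters, digits, slash, dash, dot and underscore. The function name is_safe_name is hypothetical and not part of maketool.

```python
import re

# ASCII letters, digits, and the / - . _ characters. [A-Za-z0-9] is
# spelled out rather than using \w, which also matches non-ASCII letters.
# The dash is placed last so it is taken literally, not as a range.
SAFE_NAME = re.compile(r"[A-Za-z0-9/._-]*")


def is_safe_name(name: str) -> bool:
    """Return True if name contains only conservative characters."""
    return SAFE_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["/usr/local/my-app_1.0", "/opt/My Apps", "build$dir"]:
        verdict = "ok" if is_safe_name(candidate) else "avoid"
        print(f"{candidate!r}: {verdict}")
```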

If you're still reading this, you either know what you're doing or want to learn. To discourage you further, here are some of the reasons why metacharacters can have unwanted consequences and make debugging painful.

If you're still reading, there is some hope. Sometimes you can achieve what you want with careful backslash quoting; the trick is knowing how many backslashes you need. Sometimes a combination of backslash quoting with double-quote or single-quote quoting works. Unfortunately, the consequences of too much or too little quoting can be as bad as the problems you were trying to avoid in the first place. Really, the best course is simply to avoid metacharacters and whitespace when naming files or directories.
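To see why the number of backslashes matters, here is a Python sketch that escapes a name by hand once per layer of shell evaluation. It uses shlex.split in POSIX mode as a stand-in for one round of shell parsing (an approximation: shlex performs no variable expansion or globbing), and the helper backslash_escape is hypothetical. Each extra layer doubles the existing backslashes and adds one more, so the space in "My Apps" needs 1, then 3, then 7.

```python
import shlex
import string

# Characters left alone; everything else gets a backslash in front.
SAFE = set(string.ascii_letters + string.digits + "/._-")


def backslash_escape(text: str) -> str:
    """Hypothetical helper: backslash-escape every unsafe character."""
    return "".join(c if c in SAFE else "\\" + c for c in text)


value = "My Apps"
layers = [value]
for _ in range(3):
    # Escape the previous layer again, as needed when the string will be
    # parsed by one more shell (for example a command run via sh -c).
    layers.append(backslash_escape(layers[-1]))

for depth, quoted in enumerate(layers):
    print(depth, quoted)

# One round of shell-style parsing removes exactly one layer of escaping.
for inner, outer in zip(layers, layers[1:]):
    assert shlex.split(outer) == [inner]
```

With too few backslashes an inner layer sees a bare space and splits the name; with too many, a literal backslash survives into the final value.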

If you're still not convinced, here's a quick summary of metacharacter-related failure modes when using the autoconf system.

First, a brief outline of how autoconf works. The developer runs the autoconf program, which uses m4 to process the file configure.in into a shell script named configure; this script is distributed with the application source.

The builder runs the configure script, which amongst many other things uses awk and sed to process input data. As its final step, configure creates two new shell scripts named config.cache and config.status, and runs config.status. The next time configure is run, it loads config.cache early on to reuse the results of the previous run. The config.status script uses sed to create various other files from input templates, usually Makefile from Makefile.in and config.h from config.h.in. The builder then runs gmake, which uses the information in Makefile to construct and run shell commands that compile and link the source code. Compilation usually uses information in config.h.
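The following Python sketch models two of these steps under simplifying assumptions: a regular-expression substitution stands in for the sed script in config.status, and shlex.split stands in for the shell that gmake hands each recipe line to. The Makefile.in fragment and the bindir value are hypothetical.

```python
import re
import shlex

# Hypothetical Makefile.in fragment; config.status replaces @NAME@
# placeholders with the values configure determined.
MAKEFILE_IN = "install:\n\tcp myprog @bindir@/myprog\n"


def substitute(template: str, values: dict) -> str:
    """Replace each @NAME@ placeholder, roughly as config.status does."""
    return re.sub(r"@(\w+)@", lambda m: values[m.group(1)], template)


makefile = substitute(MAKEFILE_IN, {"bindir": "/opt/My Apps/bin"})
recipe = makefile.splitlines()[1].lstrip("\t")

# The shell splits the recipe on whitespace, so the directory name
# arrives as two separate arguments.
words = shlex.split(recipe)
print(words)
```

Here cp receives three arguments instead of two, and treats Apps/bin/myprog, a path relative to the build directory, as the destination.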

At each and every one of these steps, stray metacharacters or whitespace can cause damage ranging from the build failing with an error to subtler and more dangerous problems, such as new files being installed into the wrong directories or over precious system files.

Here's a summary of how each character can cause failures in the autoconf chain (for clarity, starting at the step where you run the configure script).

Table 2. Metacharacter-related failure modes of autoconf

Metacharacter(s)        Failure modes
(space)                 config.site, shell, gmake runtime
(tab)                   config.site, shell, gmake runtime
( ) (parentheses)       shell
* (asterisk)            shell
? (question mark)       shell
[ ] (square brackets)   shell
$ (dollar sign)         gmake runtime, gmake parsetime
; (semicolon)           gmake parsetime
" (double quote)        C strings, shell
' (single quote)        balance, lost, shell
\ (backslash)           C strings, shell
# (hash mark)           gmake runtime
! (exclamation mark)    interactive
& (ampersand)           shell
| (vertical bar)        shell
` (backquote)           balance, shell
< > (less, greater)     shell
% (percent sign)        gmake runtime, gmake parsetime
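If you must accept a value you do not control, Table 2 can be encoded as a lookup that screens it before it reaches configure. This Python sketch is illustrative only; the names FAILURE_MODES and audit are hypothetical.

```python
# Table 2 as a dictionary: metacharacter -> failure modes.
FAILURE_MODES = {
    " ": ["config.site", "shell", "gmake runtime"],
    "\t": ["config.site", "shell", "gmake runtime"],
    "(": ["shell"], ")": ["shell"],
    "*": ["shell"], "?": ["shell"],
    "[": ["shell"], "]": ["shell"],
    "$": ["gmake runtime", "gmake parsetime"],
    ";": ["gmake parsetime"],
    '"': ["C strings", "shell"],
    "'": ["balance", "lost", "shell"],
    "\\": ["C strings", "shell"],
    "#": ["gmake runtime"],
    "!": ["interactive"],
    "&": ["shell"], "|": ["shell"],
    "`": ["balance", "shell"],
    "<": ["shell"], ">": ["shell"],
    "%": ["gmake runtime", "gmake parsetime"],
}


def audit(value: str) -> dict:
    """Map each problem character in value (first occurrence order) to its failure modes."""
    return {c: FAILURE_MODES[c] for c in dict.fromkeys(value) if c in FAILURE_MODES}


if __name__ == "__main__":
    for char, modes in audit("/opt/My Apps ($HOME)").items():
        print(repr(char), ", ".join(modes))
```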

Index to failure modes

config.site

The configure script finishes correctly, but unwanted splitting on whitespace causes confusing error messages when configure tries to load the site configuration file config.site from a path under the directory specified with --prefix.
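A simplified model of how this happens, assuming (as autoconf-generated configure scripts of this era appear to do) that configure builds a space-separated CONFIG_SITE list of candidate site files from the --prefix value and walks it with an unquoted shell for loop. Python's str.split stands in for the shell's word splitting, and the prefix is hypothetical.

```python
# Hypothetical --prefix value containing a space.
prefix = "/opt/My Apps"

# Assumed: configure joins the candidate site files into one
# space-separated string.
config_site = f"{prefix}/share/config.site {prefix}/etc/config.site"

# An unquoted `for f in $CONFIG_SITE` splits on whitespace, so the
# space inside the prefix produces extra, meaningless paths.
candidates = config_site.split()
for site_file in candidates:
    print("checking", site_file)
```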

shell

The configure script works, but gmake will probably fail because the shell unexpectedly expands metacharacters or whitespace in the commands gmake passes to it. Adding quoting may solve this.
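For example, an unquoted asterisk turns a file name into a pattern. This Python sketch uses fnmatch.fnmatchcase, which approximates shell globbing for simple names, over a hypothetical directory listing, then shows shlex.quote producing a form the shell takes literally.

```python
import fnmatch
import shlex

# Hypothetical contents of the build directory.
files = ["foo1.c", "foobar.c", "foo*.c"]
name = "foo*.c"

# Unquoted, the shell treats foo*.c as a pattern and substitutes every
# matching file, so a command such as `rm foo*.c` would remove all three.
matches = [f for f in files if fnmatch.fnmatchcase(f, name)]
print(matches)

# Quoted, the name is passed through literally as a single argument.
quoted = shlex.quote(name)
print("rm", quoted)
```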

gmake runtime

The configure script works, but gmake will probably fail because gmake itself unexpectedly expands metacharacters or whitespace in the commands before passing them to the shell. Adding quoting may solve this.
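gmake treats $ as the start of a variable reference, both in recipe lines and in the values of recursively expanded variables, and $$ stands for a literal dollar sign. The Python function below is a much-simplified model of that behaviour: it handles only $$, $(NAME), ${NAME} and single-character names, with no make functions. The names make_expand and make_escape are hypothetical.

```python
import re

# $$, $(NAME), ${NAME}, or a single-character name such as $H.
_REF = re.compile(r"\$\$|\$\((\w+)\)|\$\{(\w+)\}|\$(.)")


def make_expand(text, variables, _active=frozenset()):
    """Much-simplified model of GNU make variable expansion.

    Undefined variables expand to the empty string, and variable values
    are themselves expanded, as for make's recursively expanded variables.
    """
    def repl(m):
        if m.group(0) == "$$":
            return "$"
        name = m.group(1) or m.group(2) or m.group(3)
        if name in _active:
            raise ValueError(f"variable {name!r} references itself")
        value = variables.get(name, "")
        return make_expand(value, variables, _active | {name})

    return _REF.sub(repl, text)


def make_escape(text):
    """Protect a literal value from make by doubling every $."""
    return text.replace("$", "$$")


# $HOME is read by make as the variable H followed by the text OME.
print(make_expand("echo $HOME", {}))
# A $ inside a substituted value is expanded too when the value is used.
print(make_expand("cp prog $(prefix)/bin", {"prefix": "/opt/cost$5"}))
# Escaped, the $ survives make.
print(make_expand("cp prog $(prefix)/bin", {"prefix": make_escape("/opt/cost$5")}))
```

Escaping for make is not enough on its own: the expanded recipe still reaches the shell, which would expand $5 in turn, so a value containing a literal $ needs quoting for both programs.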

gmake parsetime

The configure script works, but gmake will probably fail with parse errors in Makefile, caused by gmake expanding variables whose values contain metacharacters or whitespace. Adding quoting may solve this. The error is typically:

Makefile:10: *** missing separator.  Stop.
      
C strings

The configure script works, but the C compiler will report syntax errors if metacharacters are used in a string literal in config.h or other C code.
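When a configure-time value is written into a C string literal, backslashes and double quotes must be escaped. A minimal Python sketch, assuming a hypothetical PREFIX macro in config.h; it handles only these two characters and not, for example, newlines or other control characters.

```python
def c_string_literal(text: str) -> str:
    """Quote text as a C string literal, escaping backslashes and double quotes."""
    # Escape backslashes first so the backslashes added for quotes
    # are not themselves doubled.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


value = '/opt/a\\b"c'  # the path /opt/a\b"c

# Unescaped, \b becomes a backspace escape and the inner " ends the
# string early, leaving stray tokens that the compiler rejects.
print(f'#define PREFIX "{value}"')
print(f"#define PREFIX {c_string_literal(value)}")
```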

balance

The configure script will fail with a shell syntax error if the quotes are unbalanced, i.e. do not occur in pairs.
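Unbalanced quotes can be caught before they reach configure. This Python sketch uses shlex.split, which in POSIX mode raises ValueError on an unterminated quote or a trailing backslash. Note that shlex does not treat the backquote as a quoting character, so it will not catch an unbalanced backquote. The function name quotes_balanced is hypothetical.

```python
import shlex


def quotes_balanced(command: str) -> bool:
    """Return False if shlex finds an unterminated quote or trailing backslash."""
    try:
        shlex.split(command)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    for arg in ["--prefix=/home/o'neil", "--prefix=\"/home/o'neil\""]:
        print(arg, "balanced" if quotes_balanced(arg) else "UNBALANCED")
```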

lost

The configure script works, but the metacharacter is lost from values in the files it creates, such as Makefile. Adding quoting may solve this.
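One way a quote character gets lost is shell quote removal: balanced quotes are consumed as syntax rather than passed on as data. The Python sketch below uses shlex.split to model a single round of shell parsing (an approximation) on a hypothetical argument, and shows that quoting it first with shlex.quote preserves the quotes.

```python
import shlex

arg = "--prefix=/opt/x'y'z"

# Parsed unquoted, the single quotes are treated as syntax and vanish.
lost = shlex.split(arg)
print(lost)

# Quoted for the shell first, the value comes through intact.
kept = shlex.split(shlex.quote(arg))
print(kept)
```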

interactive

The autoconf system works fine, but the metacharacter needs to be quoted when used in an interactive shell session, for example if you use ls to look in a directory whose name contains the metacharacter.

If you're still not frightened of metacharacters, go buy a good book on the UNIX shell and read it. Alternatively, the Advanced Bash-Scripting Guide is available online. Either way, you will learn to use a powerful and useful tool.