Metacharacters are "magical" characters that a program interprets in special ways when it encounters them in its input. Many different UNIX programs implement metacharacters, and the concept is one of the most useful and powerful features of a UNIX system.
However, as with any powerful tool, metacharacters can give you some very unpleasant experiences if you misuse them. So unless you know exactly what you're doing, it's wise to avoid using metacharacters in filenames and other data used in maketool. Here be dragons!
Usually, metacharacters are the odd punctuation characters around the edge of your keyboard, such as $ * or &. If you stick to using letters, numbers, and the / (slash) - (dash) . (dot) and _ (underscore) characters, you should be fine.
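The "safe" character set described above can be checked mechanically. The following is a minimal sketch, not part of maketool; the function name is_safe_name is hypothetical, and the letter ranges assume the C/POSIX locale.

```shell
# is_safe_name NAME (hypothetical helper): succeed only if NAME consists
# solely of letters, digits, and the / - . _ characters.
is_safe_name() {
    case "$1" in
        '')                  return 1 ;;  # reject empty names as well
        *[!A-Za-z0-9/._-]*)  return 1 ;;  # something outside the safe set
        *)                   return 0 ;;
    esac
}

is_safe_name "/usr/local/my-app_1.0" && echo "safe"
is_safe_name "/home/me/my app"       || echo "contains a risky character"
```

Rejecting anything outside a small allowed set is simpler and less error-prone than trying to list every metacharacter of every program involved.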
If you're still reading this, you either know what you're doing, or you want to know. To discourage you further, here are some of the reasons why metacharacters can cause unwanted consequences and make debugging painful.
Many different UNIX programs (e.g. the shell, gmake, sed, m4, awk and the C compiler) interpret metacharacters.
Sometimes it's not easy to tell which programs are going to be involved in processing data, especially if you use someone else's complex shell script.
All programs have a quoting convention which enables you to quote metacharacters, usually by preceding the metacharacter with a \ (backslash) character. However, the shell has three quoting conventions (backslash, single quotes and double quotes), which can interact.
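As an illustration of the shell's three quoting conventions, here is a small sketch. The variable names (a, b, c, name, d, e) are made up for the example.

```shell
# Three ways to store the literal text *$HOME in a variable.
a=\*\$HOME     # backslash: quotes only the single character that follows
b='*$HOME'     # single quotes: everything inside is literal
c="*\$HOME"    # double quotes: * is literal, but $ still needs a backslash

printf '%s\n' "$a" "$b" "$c"

# How the conventions differ: double quotes still expand variables,
# single quotes do not.
name=world
d="hello $name"
e='hello $name'
printf '%s\n' "$d" "$e"
```

The first printf prints the same text three times; the second shows that only the double-quoted string had its variable expanded.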
Scripts for one program can call other programs (e.g. a shell script calls sed), or build other scripts and then call them. To work properly in the presence of metacharacters, the calling script needs to be aware of the other program's quoting convention; many scripts don't do this.
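To make this concrete, here is a hedged sketch of a shell script that passes data to sed and has to respect sed's quoting convention as well as its own. The helper name sed_escape_replacement, the @DIR@ placeholder and the sample value are all invented for the example, and the helper only handles single-line values.

```shell
# sed_escape_replacement TEXT (hypothetical helper): backslash-quote the
# characters that are special in the replacement part of s/.../.../ when
# / is the delimiter: backslash, ampersand and slash.
sed_escape_replacement() {
    printf '%s\n' "$1" | sed 's|[\\&/]|\\&|g'
}

# Naive call: sed reads & as "the text that was matched", so the
# placeholder leaks back into the output.
naive=$(printf 'dir=@DIR@\n' | sed "s/@DIR@/R&D/")

# Careful call: escape the value for sed first, then substitute.
value='R&D/tools'
safe=$(sed_escape_replacement "$value")
result=$(printf 'dir=@DIR@\n' | sed "s/@DIR@/$safe/")

printf '%s\n' "$naive" "$result"
```

Note that the value is quoted twice: once for the shell (the double quotes around the sed command) and once for sed (the backslashes added by the helper).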
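The sets of metacharacters implemented by the various programs overlap, but only partially. For example, ${HOME} has the same meaning in the shell and gmake (the value of the variable HOME), but $(HOME) does not: gmake treats it as the same variable reference, while the shell tries to run a command named HOME and substitute its output.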
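The shell side of this difference can be seen in a short sketch; the variable greeting is made up for the example. In gmake both spellings would simply be variable references.

```shell
greeting=hello

# Braces: parameter expansion, i.e. the value of the variable.
braces="${greeting}"

# Parentheses: command substitution, i.e. the output of running a command.
# Here the command is "echo greeting", so the result is the word itself.
parens="$(echo greeting)"

printf '%s\n' "$braces" "$parens"
```

A Makefile fragment containing $(HOME) that is pasted into a shell script therefore silently changes meaning.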
A similar situation exists with whitespace (space, tab and newline) characters, which are interpreted specially in the shell and gmake.
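A small sketch of the shell's treatment of whitespace, assuming the default IFS; the helper count_args and the directory name are hypothetical.

```shell
# count_args (hypothetical helper): print how many arguments it received.
count_args() { echo "$#"; }

dir="/tmp/my project"

unquoted=$(count_args $dir)    # the shell splits the value at the space
quoted=$(count_args "$dir")    # double quotes keep it as one argument

printf 'unquoted: %s argument(s), quoted: %s argument(s)\n' "$unquoted" "$quoted"
```

A command such as rm or cp given the unquoted form would operate on /tmp/my and project rather than on the intended directory.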
If you're still reading this, there is some hope. Sometimes you can achieve what you want with intelligent use of backslash quoting; the trick is knowing how many backslashes you need. Sometimes a combination of backslash and double-quote or single-quote quoting works. Unfortunately the consequences of too much or too little quoting can be as bad as the problems you were trying to avoid in the first place. Really, the best course is to simply avoid using metacharacters and whitespace when naming files or directories.
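The question of how many backslashes are needed can be illustrated with two layers of shell. This is only a sketch; the variable PREFIX and its value are assumptions made for the example.

```shell
PREFIX=/opt/app
export PREFIX     # make it visible to the inner shell as well

# Goal: have the inner shell print the literal text $PREFIX.

# No backslash: the OUTER shell expands the variable before sh even starts.
outer_expanded=$(sh -c "printf '%s\n' $PREFIX")

# One backslash: the outer shell passes $PREFIX through, but the INNER
# shell then expands it.
inner_expanded=$(sh -c "printf '%s\n' \$PREFIX")

# Three backslashes: the outer shell reduces \\\$ to \$, and the inner
# shell reduces \$ to a literal dollar sign.
literal=$(sh -c "printf '%s\n' \\\$PREFIX")

# Single quotes outside: only the inner layer needs its one backslash.
literal_sq=$(sh -c 'printf "%s\n" \$PREFIX')

printf '%s\n' "$outer_expanded" "$inner_expanded" "$literal" "$literal_sq"
```

Each layer that interprets the text removes one level of quoting, which is why the count changes with every extra program in the chain.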
If you're still not convinced, here's a quick summary of metacharacter-related failure modes when using the autoconf system.
First, a quick summary of how autoconf works. The developer uses the autoconf program, which runs m4 to process the file configure.in into a shell script named configure, which is then distributed with the application source. The builder runs the configure script, which amongst many other things uses awk and sed to process input data. As its final step, configure creates two new shell scripts named config.cache and config.status and runs config.status. The next time configure is run, it runs config.cache as its first step. The config.status script uses sed to create various other files from input templates, usually Makefile from Makefile.in and config.h from config.h.in. Then the builder uses gmake, which uses the information in Makefile to construct and run shell commands to compile and link the source code. Compilation usually uses information in config.h.
At each and every one of these steps, stray metacharacters or whitespace can cause damage ranging from the build process failing with an error, to more dangerous and subtle problems like new files being installed into incorrect directories or over precious system files.
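The following sketch imitates, in a very simplified way, how a value travels through such a chain: a sed substitution fills in a template line, and the result is later pasted into a shell command. It is not the real config.status logic; the prefix value, the bindir line and the helper count_words are assumptions made for illustration, and nothing is actually created on disk.

```shell
prefix='/opt/my app'     # hypothetical --prefix value containing a space

# Step 1: a config.status-style sed substitution fills in a template line.
makefile_line=$(printf 'bindir = @prefix@/bin\n' | sed "s|@prefix@|$prefix|")

# Step 2: the value is later pasted, unquoted, into a command line.
bindir=${makefile_line#"bindir = "}

# Step 3: the shell splits that command line into words. count_words
# stands in for a command such as mkdir and reports how many directory
# arguments it would have been given.
count_words() { echo "$#"; }
dirs_unquoted=$(eval "count_words $bindir")
dirs_quoted=$(eval "count_words '$bindir'")

printf '%s\n' "$makefile_line"
printf 'unquoted: %s, quoted: %s\n' "$dirs_unquoted" "$dirs_quoted"
```

With the unquoted form, an install step would act on two paths, /opt/my and app/bin, which is the kind of "installed into incorrect directories" problem described above.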
Here's a summary of how each character can cause failures in the autoconf chain (for clarity, starting at the step where you run the configure script).
Table 2. Metacharacter-related failure modes of autoconf
Metacharacter(s) | Failure modes |
---|---|
(space) | config.site, shell, gmake runtime |
(tab) | config.site, shell, gmake runtime |
( ) (parentheses) | shell |
* (asterisk) | shell |
? (question mark) | shell |
[ ] (square brackets) | shell |
$ (dollar sign) | gmake runtime, gmake parsetime |
; (semicolon) | gmake parsetime |
" (double quote) | C strings, shell |
' (single quote) | balance, lost, shell |
\ (backslash) | C strings, shell |
# (hash mark) | gmake runtime |
! (exclamation mark) | interactive |
& (ampersand) | shell |
\| (vertical bar) | shell |
` (backquote) | balance, shell |
< > (less, greater) | shell |
% (percent sign) | gmake runtime, gmake parsetime |
Index to failure modes
config.site
The configure script finishes correctly but undesired whitespace expansion causes confusing error messages when configure tries to load the site configuration file config.site using a path relative to the directory specified with --prefix.
shell
The configure script works but gmake will probably fail due to unexpected expansion by the shell of metacharacters or whitespace in commands given to the shell by gmake. This may be solved by adding quoting.
gmake runtime
The configure script works but gmake will probably fail due to unexpected expansion by gmake of metacharacters or whitespace in commands given to the shell by gmake. This may be solved by adding quoting.
gmake parsetime
The configure script works but gmake will probably fail due to parse errors in Makefile caused by expansion by gmake of variables whose values contain metacharacters or whitespace. This may be solved by adding quoting. The error is typically:
Makefile:10: *** missing separator. Stop.
C strings
The configure script works but the C compiler will generate syntax errors if metacharacters are used in a string literal in config.h or other C code.
balance
The configure script will fail with a shell syntax error if the quotes are unbalanced, i.e. do not occur in pairs.
lost
The configure script works but the metacharacter is lost from values in created files such as Makefile. This may be solved by adding quoting.
interactive
The autoconf system works fine but the metacharacter needs to be quoted when used in an interactive shell session, for example if you use ls to look in a directory whose name contains the metacharacter.
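For the interactive case, here is a sketch of listing a directory whose name contains an exclamation mark. The directory and file names are invented, and everything happens inside a scratch directory created with mktemp. In an interactive bash session, history expansion of ! is suppressed by single quotes or a backslash but not by double quotes, so only the first two forms are shown.

```shell
work=$(mktemp -d)              # scratch directory, removed again below
mkdir "$work"/'wow!'
touch "$work"/'wow!'/todo.txt

# Two spellings that also work when typed at an interactive prompt:
listing_sq=$(cd "$work" && ls 'wow!')    # single quotes
listing_bs=$(cd "$work" && ls wow\!)     # backslash

printf '%s\n' "$listing_sq" "$listing_bs"
rm -rf "$work"
```

In a non-interactive script history expansion is normally off, which is why this failure mode only shows up when typing commands by hand.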
If you're still not frightened of metacharacters, go buy a good book on the UNIX shell and read it. Alternatively, the Advanced Bash-Scripting Guide is available online. You will learn to use a powerful and useful tool.