
5. The Zebra Configuration File

As mentioned, the Zebra indexer and server always look for the file zebra.cfg in their current working directory, unless they are told to look for it elsewhere with the -c option. The example file in the test directory is close to the bare minimum for such a file. We find the following to be a more powerful setup for a GILS-like database (everything preceded by a # is ignored by the software):

#
# Sample configuration file for GILS database
#

# Where are the configuration files located?
profilePath: /usr/local/lib/zebra

# Load attribute sets for searching
attset: bib1.att
attset: gils.att

# Records are identified by their path in the file system
recordId: file

# Store information about records to allow deletion and updating
storeKeys: 1

# Records are structured
recordType: grs

# Where to store the indexes
register: /datadisk/index:500M

# Where to store temporary data while merging with register
shadow: /datadisk/shadow:500M

If you like, you can paste this file straight into a zebra.cfg file ready for your own use (with a bit of editing of the pathnames). In the following, we'll explain the individual settings. For the full story on the zebra.cfg file and the configuration options of Zebra, you should read the general documentation.
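To make the file layout concrete, here is a minimal Python sketch that reads settings of this shape. It assumes that every setting is written as name: value, that a # starts a comment running to the end of the line, and that a name such as attset may appear more than once. The function name parse_settings is hypothetical, and the sketch is not a description of how Zebra itself reads the file.

```python
from collections import defaultdict

# A shortened copy of the sample file, for illustration only.
SAMPLE = """\
# Where are the configuration files located?
profilePath: /usr/local/lib/zebra

attset: bib1.att
attset: gils.att
recordId: file
register: /datadisk/index:500M
"""


def parse_settings(text):
    """Collect 'name: value' settings; a name may repeat (e.g. attset)."""
    settings = defaultdict(list)
    for raw in text.splitlines():
        # Assumption: everything from '#' to the end of the line is a comment.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # Split on the first colon only, so values such as
        # /datadisk/index:500M keep their own colon.
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'name: value', got {raw!r}")
        settings[name.strip()].append(value.strip())
    return dict(settings)


settings = parse_settings(SAMPLE)
print(settings["attset"])
```

Splitting on the first colon only is the design choice that matters here, because the register and shadow values contain colons of their own.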

profilePath

This field tells Zebra where to look for the configuration files. In the distribution, these files are located in the tab directory, but you may wish to put them someplace else for convenience. If necessary, you can provide multiple directory paths, separated by (:).
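The idea of a colon-separated list of directories can be sketched in Python as follows. The sketch assumes that directories are tried from left to right and that the first match wins; the text above does not state the search order, so treat that as an assumption. The function name find_profile_file is hypothetical, and the temporary directories merely stand in for real profile directories on a POSIX system.

```python
import os
import tempfile


def find_profile_file(profile_path, filename):
    """Return the first existing match for filename, or None."""
    for directory in profile_path.split(":"):
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


# Two throwaway directories; only the second one contains gils.att.
with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    open(os.path.join(second, "gils.att"), "w").close()
    profile_path = f"{first}:{second}"
    found = find_profile_file(profile_path, "gils.att")
    missing = find_profile_file(profile_path, "absent.att")
    found_in_second = found == os.path.join(second, "gils.att")
    print(found_in_second, missing)
```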

attset

This field tells the Zebra server which attribute sets it should support for searching. You could get by with just loading the GILS set, but if you load BIB-1 as well, Zebra will support both sets for those GILS attributes that are inherited from BIB-1.

recordId

The recordId: file setting tells Zebra that individual records should be identified by the physical files in which they are located. In this mode, your database will always (after an update operation) reflect the contents of the directory (or directories).

storeKeys

This setting tells Zebra to store additional information about each record, to facilitate updating. In combination with the recordId: file setting, this is a very convenient maintenance option. If you maintain your records as individual files in a directory tree, you have only to run zebraidx with the top-level directory as an argument. If new files are added, they are entered into the database. If they are modified, the indexes are changed accordingly, and if they are deleted from the filesystem (or renamed), the indexes are also updated correctly, the next time you run zebraidx.
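The bookkeeping this implies can be pictured with a small Python sketch. It compares two snapshots of a record directory, each a mapping from file path to some fingerprint of the file's contents, and sorts the paths into added, modified, and deleted. The snapshot dictionaries, the file names, and the function name classify are all hypothetical; the sketch illustrates the observable behaviour described above and does not reflect how Zebra stores or compares its keys.

```python
def classify(previous, current):
    """Compare two {path: fingerprint} snapshots of a record directory."""
    before, after = set(previous), set(current)
    added = sorted(after - before)
    deleted = sorted(before - after)
    modified = sorted(p for p in before & after if previous[p] != current[p])
    return added, modified, deleted


# State recorded at the last update (hypothetical files and fingerprints).
previous = {"recs/a.grs": "v1", "recs/b.grs": "v1", "recs/c.grs": "v1"}
# State found on disk now: b was edited, c was renamed to d.
current = {"recs/a.grs": "v1", "recs/b.grs": "v2", "recs/d.grs": "v1"}

added, modified, deleted = classify(previous, current)
print(added, modified, deleted)
```

Because records are identified by their path, a renamed file shows up here as one deletion plus one addition, which matches the treatment of renamed files described above.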

recordType

This setting selects the type of processing which is to take place when a record is accessed by the indexer or the Z39.50 server. GRS stands for Generic Record Syntax, and signals that the records are structured.

register

In the first test above, you may have noticed that zebraidx created a number of files in the working directory. Some of these files, which contain the indexing information for the database, can grow quite large, and it is sometimes useful to place them in a separate directory or file system. Provide the path of the directory, followed by a colon (:), followed by the maximum amount of disk space, in megabytes (M) or kilobytes (K), that Zebra is allowed to use in that directory. If you specify more than one directory:size combination on the same line, Zebra will fill up each directory from left to right. This feature is essential if your database is so large that the registers cannot fit into a single disk partition.
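The left-to-right filling rule can be modelled with a short Python sketch. It assumes that several directory:size entries on one line are separated by whitespace and that 1M equals 1024K; neither detail is stated above. The second directory, /otherdisk/index, and the function names are hypothetical, and the code is a model of the rule rather than Zebra's own allocation logic.

```python
UNIT_KB = {"K": 1, "M": 1024}  # assumption: 1M = 1024K


def parse_register(value):
    """Turn 'dir:size dir:size ...' into a list of (directory, kilobytes)."""
    areas = []
    for spec in value.split():
        # The size follows the last colon of each entry.
        directory, _, size = spec.rpartition(":")
        valid = (
            directory
            and len(size) >= 2
            and size[-1] in UNIT_KB
            and size[:-1].isdigit()
        )
        if not valid:
            raise ValueError(f"bad directory:size entry: {spec!r}")
        areas.append((directory, int(size[:-1]) * UNIT_KB[size[-1]]))
    return areas


def fill_left_to_right(areas, needed_kb):
    """Assign needed_kb to the areas in order; return (directory, used_kb)."""
    plan = []
    for directory, capacity in areas:
        if needed_kb <= 0:
            break
        used = min(capacity, needed_kb)
        plan.append((directory, used))
        needed_kb -= used
    if needed_kb > 0:
        raise RuntimeError("register areas are too small for the index")
    return plan


areas = parse_register("/datadisk/index:500M /otherdisk/index:300M")
plan = fill_left_to_right(areas, 600 * 1024)  # an index needing 600M
print(plan)
```

In this model a 600M index uses all 500M of the first directory before anything is placed in the second.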

shadow

The format of this setting is the same as for the register setting above. If you provide one or more directories for the "shadow system", you enable the safe updating system of the Zebra indexer. When changes to the records are merged into the register files, the files are not changed immediately. Instead, the changes are written into separate files, or "shadow files". At the end of the merging process, or in a separate operation, the changes are "committed" and written into the register files themselves. This final step is carried out by the command zebraidx commit. The commit directive can also be given on the same command line as the update directive, at the end of the command line.

The shadow file system can consume a lot of disk space, particularly in a large update operation that involves almost all of the index, but the benefits are substantial. Without it, if the system crashes during an update procedure, or the process is otherwise interrupted, the registers are left in an unknown state and are effectively rendered useless, which can be unfortunate if the index is very large. The use of the shadow system greatly reduces the risk of an index being damaged in this way. Further, when the shadow system is enabled, your clients may access the Zebra server without interruption throughout the update and commit procedures: Zebra will ensure that the parts of the register accessed by the server are always consistent.
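The two ways of committing can be written down as argument lists. The Python sketch below only builds the zebraidx command lines and does not run them. The directory name records and the path /hypothetical/zebra.cfg are made up for illustration, the helper names are hypothetical, and placing the -c option before the directives is an assumption about the usual option order.

```python
import subprocess


def zebraidx_args(*directives, config=None):
    """Build an argument list for zebraidx without running it."""
    argv = ["zebraidx"]
    if config is not None:
        argv.extend(["-c", config])  # -c names an alternative zebra.cfg
    argv.extend(directives)
    return argv


def run(argv):
    """Run the command; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(argv, check=True)


# "records" is a hypothetical top-level record directory.
update_only = zebraidx_args("update", "records")   # changes go to the shadow files
commit_only = zebraidx_args("commit")              # separate commit step
update_and_commit = zebraidx_args("update", "records", "commit")
with_config = zebraidx_args("commit", config="/hypothetical/zebra.cfg")
print(update_and_commit)
```

Passing one of these lists to run would execute it, provided zebraidx is installed and a zebra.cfg with a shadow setting is in place.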

