
3. Library Interface

3.1 Overview

The Pspell library contains two main classes and several helper classes. The two main classes are PspellConfig and PspellManager. The PspellConfig class is used to set initial defaults and to change spell-checker-specific options. The PspellManager class does most of the real work: it is responsible for managing the dictionaries, checking whether a word is in the dictionary, and coming up with suggestions, among other things. Of the many helper classes, the important ones are PspellWordList, PspellMutableWordList, and Pspell*Emulation. The PspellWordList class is used for accessing the suggestion list, as well as the personal and session word lists currently in use. The PspellMutableWordList class is used to manage the personal, and perhaps other, word lists. The Pspell*Emulation classes are used for iterating through a list.

A C interface will also be provided, as well as a few STL-like helper classes for those who prefer more modern C++.

3.2 Usage

To use pspell your application should include ``pspell/pspell.h''. To ensure that all the necessary libraries are linked in, libtool should be used to perform the linking. When using libtool, simply linking with ``-lpspell'' should be all that is necessary. When using shared libraries you might be able to link with ``-lpspell'' without libtool, but this is not recommended. This version of Pspell uses the CVS version of libtool (HEAD branch); however, released versions of libtool should also work.

When your application first starts you should get a new configuration class with the command:

PspellConfig * spell_config = new_pspell_config();
which will create a new PspellConfig class. It is allocated with new and it is your responsibility to delete it with delete_pspell_config or the standard C++ delete. Once you have the config class you should set some variables. The most important one is the language variable. To do so use the command:

spell_config->replace("language-tag", "en");
which will set the default language to American English. The language is expected to be the standard two-letter ISO 639 language code, with an optional two-letter ISO 3166 country code after a dash or underscore. You can set the preferred spelling via the ``spelling'' option, any extra information via the ``jargon'' option, and the encoding via the ``encoding'' option. Other things you might want to set are the preferred spell checker to use, the search path for dictionaries, and the like; see section 3.6 for the available options.
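
The language tag format described above can be checked before it is handed to the configuration. The following is only a minimal sketch: LanguageTag and parse_language_tag are hypothetical names that are not part of Pspell, and the sketch assumes the tag is either exactly two letters or two letters, a dash or underscore, and two more letters.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper type, not part of Pspell: the pieces of a language tag.
struct LanguageTag {
    std::string language;  // two-letter ISO 639 code
    std::string country;   // two-letter ISO 3166 code, empty if absent
};

// Returns true if s consists of exactly two ASCII letters.
static bool is_two_letters(const std::string &s) {
    return s.size() == 2 &&
           std::isalpha(static_cast<unsigned char>(s[0])) &&
           std::isalpha(static_cast<unsigned char>(s[1]));
}

// Accepts "ll", "ll-CC" or "ll_CC". On success fills `out` and returns true;
// on failure returns false and leaves `out` untouched.
bool parse_language_tag(const std::string &tag, LanguageTag &out) {
    if (tag.size() != 2 && tag.size() != 5) return false;

    LanguageTag parsed;
    parsed.language = tag.substr(0, 2);
    if (!is_two_letters(parsed.language)) return false;

    if (tag.size() == 5) {
        if (tag[2] != '-' && tag[2] != '_') return false;
        parsed.country = tag.substr(3, 2);
        if (!is_two_letters(parsed.country)) return false;
    }
    out = parsed;
    return true;
}
```

A caller might validate user input with parse_language_tag and only then pass the original string to spell_config->replace("language-tag", ...).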

Whenever a new document is created a new PspellManager class should also be created. There should be one manager class per document. To create a new manager class use the command:

PspellManager * spell_checker = new_pspell_manager(spell_config);
which will create a new PspellManager class using the defaults found in spell_config. If for some reason you want to use different defaults simply clone spell_config and change the setting like so:

PspellConfig * spell_config2 = spell_config->clone();
spell_config2->replace("language-tag","nl");
PspellManager * spell_checker = new_pspell_manager(spell_config2);
delete_pspell_config(spell_config2);
Once the manager class is created you can use the check method to see if a word in the document is correct like so:

bool correct = spell_checker->check(<word>);
<word> is expected to be a const char * character string. If the encoding is set to ``machine unsigned 16'' or ``machine unsigned 32'', <word> is expected to be a direct cast from either const u16int * or const u32int *, respectively. u16int and u32int are generally unsigned short and unsigned int, respectively.

If the word is not correct, then the suggest method can be used to come up with likely replacements.

PspellWordList & suggestions = spell_checker->suggest(<word>);
PspellStringEmulation * elements = suggestions.elements();
const char * word;
// next() returns NULL once the list is exhausted
while ( (word = elements->next()) != NULL ) {
  // add to suggestion list
}
delete elements;
Once a replacement is made, the store_repl method should be used to communicate the replacement pair back to the spell checker (see section 3.9.1 for why). Its usage is as follows:

spell_checker->store_repl(<misspelled word>, <correctly spelled word>);
If the user decides to add the word to the session or personal dictionary, the word can be added using the add_to_session or add_to_personal method, respectively, like so:

spell_checker->add_to_session|personal(<word>);
It is better to let the spell checker manage these words rather than doing it yourself, so that the words have a chance of appearing in the suggestion list.

Finally, when the document is closed the PspellManager class should be deleted like so:

delete_pspell_manager(spell_checker);
The standard C++ delete should NOT be used here because it will not unload any shared libraries pulled in when the manager class was created.

3.3 Class Reference

Methods that return a bool generally return false on error and true otherwise. To find out what went wrong use the error_number and error_message methods. Unless otherwise stated, methods that return a const char * will return null on error. The character string returned is only valid until the next method that returns a const char * is called.

All methods are virtual and abstract, thus these classes are really abstract base classes. Therefore you cannot simply store the object directly. In order to make copies of the objects use the clone and assign methods if they are provided.

For the details of the various classes please see the header files. In the future I will generate class references using some automated tool.

3.4 C Interface

An ``extern "C"'' interface is also provided. Methods will be mapped to functions in the following manner:

<class name in lowercase with underscores>_<method name>([const] <Class> *, <other parameters if any>)
For example ``PspellManager::lang_name() const'' would become ``pspell_manager_lang_name(const PspellManager *)''.

Methods that return a bool will instead return an int in the C interface.
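
The naming rule above is mechanical, so it can be sketched as a small function. This is only an illustration of the rule as stated here: c_function_name is a hypothetical helper, not something Pspell provides, and the sketch assumes that every capital letter in the class name starts a new underscore-separated word.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: derives the C function name for a C++ method by
// lowercasing the class name, inserting an underscore before each capital
// letter except the first, and appending "_<method name>".
std::string c_function_name(const std::string &class_name,
                            const std::string &method_name) {
    std::string result;
    for (std::size_t i = 0; i < class_name.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(class_name[i]);
        if (std::isupper(c) && i != 0) result += '_';
        result += static_cast<char>(std::tolower(c));
    }
    return result + "_" + method_name;
}
```

With this rule, c_function_name("PspellManager", "lang_name") yields the ``pspell_manager_lang_name'' of the example above.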

3.5 Modern C++ Helper Classes

An almost-forward iterator class will be provided in a future version to wrap the Pspell*Emulation classes. It is almost a forward iterator because two iterators cannot be compared to each other except to check whether an iterator is at the end.
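
What such a wrapper could look like is sketched below. Since the real helper class does not exist yet, everything here is an assumption: VectorStringEmulation is a stand-in for a Pspell*Emulation class whose next() returns a null pointer at the end of the list, and EmulationIterator and collect are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for a Pspell*Emulation class (hypothetical, for illustration only):
// next() returns the next string, or a null pointer once the list is exhausted.
class VectorStringEmulation {
public:
    explicit VectorStringEmulation(const std::vector<std::string> &words)
        : words_(words), pos_(0) {}
    const char *next() {
        return pos_ < words_.size() ? words_[pos_++].c_str() : nullptr;
    }
private:
    std::vector<std::string> words_;
    std::size_t pos_;
};

// "Almost" forward iterator: it can be dereferenced and incremented, but the
// only meaningful comparison is against the end iterator.
template <typename Emulation>
class EmulationIterator {
public:
    EmulationIterator() : source_(nullptr), current_(nullptr) {}  // end iterator
    explicit EmulationIterator(Emulation *source)
        : source_(source), current_(source->next()) {}

    const char *operator*() const { return current_; }
    EmulationIterator &operator++() {
        current_ = source_->next();
        return *this;
    }
    // Two iterators compare equal only when both are at the end.
    bool operator==(const EmulationIterator &other) const {
        return current_ == nullptr && other.current_ == nullptr;
    }
    bool operator!=(const EmulationIterator &other) const {
        return !(*this == other);
    }
private:
    Emulation *source_;
    const char *current_;
};

// Copies every remaining word out of the emulation. Copying matters because
// a returned const char * is only valid until the next such call.
template <typename Emulation>
std::vector<std::string> collect(Emulation &source) {
    std::vector<std::string> words;
    EmulationIterator<Emulation> end;
    for (EmulationIterator<Emulation> it(&source); it != end; ++it)
        words.push_back(*it);
    return words;
}
```

The equality operator is deliberately weak: two iterators in the middle of a list never compare equal, which is exactly the restriction described above.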

I strongly recommend the use of auto_ptr with all pointers returned. All returned pointers that you are responsible for freeing, except for PspellManager, can be deleted with the standard C++ delete.
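
Because PspellManager must be released through delete_pspell_manager rather than delete, a plain auto_ptr does not fit it. One possible approach in newer C++ is std::unique_ptr with a custom deleter (auto_ptr itself was deprecated in C++11 and removed in C++17). The sketch below uses stand-ins so that it compiles without Pspell: FakeManager, new_fake_manager, and delete_fake_manager are hypothetical substitutes for PspellManager, new_pspell_manager, and delete_pspell_manager.

```cpp
#include <memory>

// Stand-ins for PspellManager and its creation/deletion functions
// (hypothetical; the real ones come from pspell/pspell.h). The counter
// exists only to make the release visible.
struct FakeManager { int id; };
static int g_managers_released = 0;

FakeManager *new_fake_manager() { return new FakeManager{1}; }
void delete_fake_manager(FakeManager *manager) {
    ++g_managers_released;
    delete manager;
}

// Owning handle that always releases through the deletion function,
// never through a bare delete.
typedef std::unique_ptr<FakeManager, void (*)(FakeManager *)> ManagerHandle;

ManagerHandle make_manager() {
    return ManagerHandle(new_fake_manager(), &delete_fake_manager);
}
```

When a ManagerHandle goes out of scope, for example when the document object that owns it is destroyed, the deletion function runs automatically.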

These helper classes will be provided in separate header files so those who do not wish to use them will not have to.


3.6 Available Options

The following options are available to control which word list Pspell selects.

language-tag <string>
the language code which consists of the two letter ISO 639 language code and an optional two letter ISO 3166 country code after a dash or underscore.
spelling <string>
the requested spelling for languages with more than one spelling such as English. Known values are ``american'', ``british'', and ``canadian''.
jargon <string>
extra information to distinguish two different word lists that have the same language-tag and spelling.
word-list-path <list>
search path for word list information files
module-search-order <list>
list of available modules; modules that come first on this list have a higher priority
The following options control the behavior of the selected module. Not all modules support all options.

encoding <string>
encoding that words are expected to be in. Valid values are ``utf-8'', ``iso8859-*'', ``koi8-r'', ``viscii'', ``cp1252'', ``machine unsigned 16'', ``machine unsigned 32''.
ignore <int>
ignore all words which are not at least as long as the value for this setting
sug-mode <string>
the suggestion mode, known values are fast, normal, and bad-spellers
run-together <boolean>
consider run-together words as legal compounds.
The following options may be examined to tell exactly what word list or module was selected

master
the full path of the word list selected
master-flags
any special flags that were passed on to the module
module
the module selected
The spelling and jargon options can also be examined.

3.7 Format of the PWLI Files

In order for Pspell to know which word lists to use, each word list must have at least one *.pwli file in the pspell data directory, which is normally /usr/local/share/pspell/. Use ``pspell-config pkgdatadir'' to find out what it is on your system.

Each *.pwli file has the following name:

<language>[-[<spelling>][-<jargon>]]-<module>.pwli
Where <language> is the two-letter language code, <spelling> is the particular spelling you're interested in if the language has multiple spellings in different parts of the world, such as English, <jargon> is any extra information to distinguish the word list from other ones with the same language and spelling, and <module> is the pspell module the main word list is for.

For example:

en-aspell.pwli
en-american-aspell.pwli
en-american-medical-ispell.pwli
en-american-xlg-ispell.pwli
de--medical-ispell.pwli
Notice that if the spelling is left out but the jargon is not, there need to be two dashes between the language and the jargon.
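
The naming rule, including the double dash, can be written down as a short function. This is a minimal sketch under the rule as stated above; pwli_file_name is a hypothetical helper, not part of Pspell, and an empty string stands for a component that is left out.

```cpp
#include <string>

// Hypothetical helper: builds
//   <language>[-[<spelling>][-<jargon>]]-<module>.pwli
// An empty spelling or jargon means that component is omitted.
std::string pwli_file_name(const std::string &language,
                           const std::string &spelling,
                           const std::string &jargon,
                           const std::string &module) {
    std::string name = language;
    // The spelling slot is emitted (possibly empty) whenever a jargon
    // follows it; this is what produces the double dash.
    if (!spelling.empty() || !jargon.empty()) name += "-" + spelling;
    if (!jargon.empty()) name += "-" + jargon;
    return name + "-" + module + ".pwli";
}
```

For instance, pwli_file_name("de", "", "medical", "ispell") reproduces the last file name in the list above.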

Each *.pwli file then contains exactly one line, which contains the full path of the main word list, white space, and then any additional options to pass on to the module.
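
Reading that single line might look roughly like the sketch below. PwliEntry and parse_pwli_line are hypothetical names, and the sketch assumes that the path itself contains no white space and that the options are passed on as one unparsed string.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type: the two parts of the line in a *.pwli file.
struct PwliEntry {
    std::string word_list_path;  // full path of the main word list
    std::string module_options;  // remaining text, empty if none
};

// Splits `line` at the first run of white space after the path. Returns
// false (leaving `out` untouched) if the line is blank.
bool parse_pwli_line(const std::string &line, PwliEntry &out) {
    const char *whitespace = " \t\r\n";
    const std::size_t npos = std::string::npos;

    std::size_t path_start = line.find_first_not_of(whitespace);
    if (path_start == npos) return false;

    PwliEntry entry;
    std::size_t path_end = line.find_first_of(whitespace, path_start);
    if (path_end == npos) {
        entry.word_list_path = line.substr(path_start);
    } else {
        entry.word_list_path = line.substr(path_start, path_end - path_start);
        std::size_t options_start = line.find_first_not_of(whitespace, path_end);
        if (options_start != npos) {
            // Trim trailing white space such as the newline.
            std::size_t options_end = line.find_last_not_of(whitespace);
            entry.module_options =
                line.substr(options_start, options_end - options_start + 1);
        }
    }
    out = entry;
    return true;
}
```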

3.8 Examples

Two simple examples are included in the examples directory. Pspell must be installed before they will compile, and at least one pspell module must be installed before they will run. To build the C++ example type ``make example-cxx'' and to build the C example type ``make example-c''.

3.9 Rationale


3.9.1 store_repl method

This method is needed because Aspell (http://aspell.sourceforge.net/) is able to learn from the user's misspellings. For example, on the first pass a user misspells beginning as beging, so aspell suggests:

begging, begin, being, Beijing, bagging, ....
However the user then tries "begning" and aspell suggests

beginning, beaning, begging, ...
so the user selects beginning. Later on in the document, however, the user misspells it as begng (NOT beging). Normally aspell would suggest:

began, begging, begin, begun, ....
However, because it knows the user misspelled beginning as beging, it will instead suggest:

beginning, began, begging, begin, begun ...
I myself often misspelled beginning (and still do) as something close to begging, and too many times wind up writing sentences such as "begging with ....".
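
The effect described in this section can be imitated with a toy model. The sketch below is NOT Aspell's algorithm, which is not described here; it simply assumes that a remembered correction is moved to the front of the suggestion list whenever the new word is within one edit of a remembered misspelling. ReplacementMemory, rerank, and edit_distance are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Classic Levenshtein edit distance (insertions, deletions, substitutions).
std::size_t edit_distance(const std::string &a, const std::string &b) {
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min(subst, std::min(prev[j] + 1, cur[j - 1] + 1));
        }
        prev.swap(cur);
    }
    return prev[b.size()];
}

// Toy model of a checker that remembers replacement pairs.
class ReplacementMemory {
public:
    void store_repl(const std::string &misspelled, const std::string &correct) {
        pairs_.push_back(std::make_pair(misspelled, correct));
    }

    // Returns `suggestions` with every learned correction moved (or inserted)
    // at the front when `word` is within `max_distance` edits of the
    // misspelling it was learned from.
    std::vector<std::string> rerank(const std::string &word,
                                    std::vector<std::string> suggestions,
                                    std::size_t max_distance = 1) const {
        for (std::size_t i = 0; i < pairs_.size(); ++i) {
            if (edit_distance(word, pairs_[i].first) > max_distance) continue;
            const std::string &correct = pairs_[i].second;
            suggestions.erase(
                std::remove(suggestions.begin(), suggestions.end(), correct),
                suggestions.end());
            suggestions.insert(suggestions.begin(), correct);
        }
        return suggestions;
    }
private:
    std::vector<std::pair<std::string, std::string> > pairs_;
};
```

In this model, after store_repl("beging", "beginning"), reranking the usual suggestions for begng puts beginning first, since begng is one edit away from beging.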


Kevin Atkinson 2000-06-11