dbXML API

com.dbxml.db.common.fulltext
Interface WordStemmer

All Known Implementing Classes:
PorterStemmer

public interface WordStemmer

WordStemmer is the interface for defining stemmers. Stemmers are typically used for Western languages, in which words take prefixes, suffixes, and tense inflections. By default, dbXML uses a Porter stemmer, which is designed specifically for English, but it should be fairly straightforward to develop a stemmer for non-English languages.
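The following sketch shows the general shape of a WordStemmer implementation. It is a hypothetical, deliberately naive suffix stripper written only for illustration. It is not the Porter algorithm used by PorterStemmer. It also redeclares the interface locally, using the method signatures documented on this page, so that it compiles on its own. The class name SuffixStrippingStemmer and the constants SUFFIXES and MIN_STEM are assumptions, not part of dbXML.

```java
import java.util.Locale;

// Local copy of the documented interface so this sketch compiles on its own.
// In dbXML it is declared in com.dbxml.db.common.fulltext.
interface WordStemmer {
    String normalizeCase(String word);
    String stemWord(String word);
}

// Hypothetical, deliberately naive stemmer that strips a few English suffixes.
// It is NOT the Porter algorithm; it only illustrates the interface contract.
class SuffixStrippingStemmer implements WordStemmer {
    // Tried in order; "es" comes before "s" so "boxes" -> "box", not "boxe".
    private static final String[] SUFFIXES = {"ing", "ed", "es", "s"};
    // Minimum stem length left after stripping, so short words like "is" survive.
    private static final int MIN_STEM = 3;

    @Override
    public String normalizeCase(String word) {
        // Language-neutral lowercasing (see the normalizeCase notes below).
        return word.toLowerCase(Locale.ROOT);
    }

    @Override
    public String stemWord(String word) {
        for (String suffix : SUFFIXES) {
            if (word.endsWith(suffix) && word.length() - suffix.length() >= MIN_STEM) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word; // no rule applied
    }
}
```

A stemmer for another language would replace the suffix table with that language's morphological rules. The MIN_STEM guard is one simple way to avoid over-stemming very short words.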


Method Summary
 java.lang.String normalizeCase(java.lang.String word)
          normalizeCase normalizes the case of the specified word.
 java.lang.String stemWord(java.lang.String word)
          stemWord stems the specified word.
 

Method Detail

normalizeCase

public java.lang.String normalizeCase(java.lang.String word)
normalizeCase normalizes the case of the specified word. Case normalization, like stemming, is language-specific, so the two functions are combined in one interface. By default, dbXML normalizes to lowercase.

Parameters:
word - The word to normalize
Returns:
The normalized word
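This page says only that dbXML normalizes to lowercase by default. It does not say which locale that default uses. The sketch below lowercases with Locale.ROOT so that the result does not depend on the JVM's default locale. Under a Turkish locale, for example, String.toLowerCase maps "I" to the dotless "ı", so the same word could be indexed differently on different machines. The LowerCaseNormalizer class is hypothetical and is not part of dbXML.

```java
import java.util.Locale;

// Hypothetical helper showing locale-independent case normalization,
// as a normalizeCase implementation might perform it.
final class LowerCaseNormalizer {
    private LowerCaseNormalizer() {}

    static String normalize(String word) {
        // Locale.ROOT applies language-neutral case mappings, so "I" -> "i"
        // regardless of the running JVM's default locale.
        return word.toLowerCase(Locale.ROOT);
    }
}
```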

stemWord

public java.lang.String stemWord(java.lang.String word)
stemWord reduces the specified word to its stem.

Parameters:
word - The word to stem
Returns:
The stemmed word
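A minimal sketch of how an indexer might drive a WordStemmer appears below. The TermExtractor helper, its regex tokenizer, and the choice to call normalizeCase before stemWord are all assumptions for illustration. This page does not document how dbXML's full-text indexer tokenizes text or orders the two calls. The interface is redeclared locally so that the block compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Local copy of the documented interface (com.dbxml.db.common.fulltext.WordStemmer).
interface WordStemmer {
    String normalizeCase(String word);
    String stemWord(String word);
}

// Hypothetical helper that turns raw text into index terms using a WordStemmer.
final class TermExtractor {
    private TermExtractor() {}

    static List<String> terms(WordStemmer stemmer, String text) {
        List<String> out = new ArrayList<>();
        // Assumed tokenizer: split on runs of non-letter characters.
        for (String token : text.split("[^\\p{L}]+")) {
            if (token.isEmpty()) {
                continue; // a leading delimiter yields an empty first token
            }
            // Normalize first so the stemmer's rules only see lowercase input.
            out.add(stemmer.stemWord(stemmer.normalizeCase(token)));
        }
        return out;
    }
}
```

Whatever pipeline is used, query terms generally need to pass through the same stemmer as indexed terms. Otherwise a search for "Cats" would not match a document indexed under "cat".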


Copyright (c) 2004 The dbXML Group