Class | Ferret::Analysis::WhiteSpaceTokenizer |
In: | ext/r_analysis.c |
Parent: | Ferret::Analysis::TokenStream |
A WhiteSpaceTokenizer is a tokenizer that splits text at whitespace. Each unbroken run of non-whitespace characters forms one token, so punctuation attached to a word stays part of that token.
"Dave's résumé, at http://www.davebalmain.com/ 1234" => ["Dave's", "résumé,", "at", "http://www.davebalmain.com/", "1234"]
Create a new WhiteSpaceTokenizer which optionally downcases tokens. Downcasing is done according to the current locale.
lower: | set to false if you don't wish to downcase tokens |
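
The behaviour described above can be approximated in plain Ruby. This is only a minimal sketch: `whitespace_tokens` is a hypothetical helper, not part of Ferret, and it does not call the C extension. It also uses Ruby's `String#downcase`, which applies Unicode case mapping rather than the current locale, so results for non-ASCII text may differ from Ferret's.

```ruby
# Hypothetical helper (not part of Ferret): splits text on runs of
# whitespace and optionally downcases each token.
def whitespace_tokens(text, lower:)
  # String#split with no arguments splits on whitespace and ignores
  # leading and trailing whitespace, so no empty tokens are produced.
  tokens = text.split
  lower ? tokens.map(&:downcase) : tokens
end

sample = "The  Quick\tbrown FOX-1 jumps."

p whitespace_tokens(sample, lower: false)
# => ["The", "Quick", "brown", "FOX-1", "jumps."]

p whitespace_tokens(sample, lower: true)
# => ["the", "quick", "brown", "fox-1", "jumps."]
```

Note that punctuation and digits are left untouched: "FOX-1" and "jumps." each remain a single token, because only whitespace separates tokens.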