Class | Ferret::Analysis::AsciiLetterTokenizer |
In: | ext/r_analysis.c |
Parent: | Ferret::Analysis::TokenStream |
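An AsciiLetterTokenizer is a tokenizer that divides text at every character that is not an ASCII letter. That is to say, it defines tokens as maximal strings of adjacent ASCII letters, as defined by the regular expression _/[A-Za-z]+/_. Digits, punctuation, and non-ASCII letters such as "é" all act as separators and never appear in a token.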
"Dave's résumé, at http://www.davebalmain.com/ 1234" => ["Dave", "s", "r", "sum", "at", "http", "www", "davebalmain", "com"]