com.lucene.analysis
Class LetterTokenizer

java.lang.Object
  |
  +--com.lucene.analysis.TokenStream
        |
        +--com.lucene.analysis.Tokenizer
              |
              +--com.lucene.analysis.LetterTokenizer

public final class LetterTokenizer
extends Tokenizer

A LetterTokenizer is a tokenizer that divides text at non-letters. That is, it defines tokens as maximal strings of adjacent letters, as determined by the java.lang.Character.isLetter() predicate. Note: this works reasonably well for most European languages, but poorly for some Asian languages, where words are not separated by spaces.
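
The splitting rule described above can be illustrated with a small standalone sketch. This is not the LetterTokenizer source; the class LetterRunSketch and its letterRuns method are hypothetical names introduced here, and the sketch assumes only that tokens are maximal runs of characters for which Character.isLetter() returns true.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; not the Lucene class.
// Splits a string into maximal runs of adjacent letters.
final class LetterRunSketch {

    static List<String> letterRuns(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(c);               // extend the current run of letters
            } else if (current.length() > 0) {
                tokens.add(current.toString());  // a non-letter ends the run
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());      // flush a run that reaches the end of input
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Apostrophes, digits, and punctuation are non-letters, so they split tokens.
        System.out.println(letterRuns("Don't panic: it's 42!"));
        // prints [Don, t, panic, it, s]
    }
}
```

Under the same rule, a run of ideographs with no separators between them comes back as a single token, because each ideograph counts as a letter. This is the limitation the note about Asian languages refers to.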


Fields inherited from class com.lucene.analysis.Tokenizer
input
 
Constructor Summary
LetterTokenizer(Reader in)
           
 
Method Summary
 Token next()
          Returns the next token in the stream, or null at the end of the stream (EOS).
 
Methods inherited from class com.lucene.analysis.Tokenizer
close
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

LetterTokenizer

public LetterTokenizer(Reader in)

Method Detail

next

public final Token next()
                 throws IOException
Description copied from class: TokenStream
Returns the next token in the stream, or null at the end of the stream (EOS).
Overrides:
next in class TokenStream
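
The contract of next() implies a pull-style loop: call it repeatedly until it returns null. The sketch below shows that loop against a hypothetical stand-in, PullTokenizerSketch, which reads from a Reader and returns plain String values rather than Token objects. The class, its drain helper, and the String return type are assumptions made for illustration, not part of the com.lucene.analysis API.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; it returns String, not Token.
final class PullTokenizerSketch {
    private final Reader input;

    PullTokenizerSketch(Reader in) {
        this.input = in;
    }

    // Returns the next run of letters, or null once the reader is exhausted.
    String next() throws IOException {
        StringBuilder run = new StringBuilder();
        int c;
        while ((c = input.read()) != -1) {
            if (Character.isLetter((char) c)) {
                run.append((char) c);        // still inside a token
            } else if (run.length() > 0) {
                return run.toString();       // a non-letter closes the token
            }
            // non-letters before a token starts are skipped
        }
        return run.length() > 0 ? run.toString() : null;
    }

    // Collects every token by calling next() until it returns null.
    static List<String> drain(String text) {
        List<String> out = new ArrayList<>();
        try (Reader reader = new StringReader(text)) {
            PullTokenizerSketch stream = new PullTokenizerSketch(reader);
            for (String t = stream.next(); t != null; t = stream.next()) {
                out.add(t);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

A caller of the real class would follow the same loop shape, handling the IOException that next() declares and closing the stream when done.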