edu.umass.cs.mallet.base.extract
Class StringTokenization

java.lang.Object
  extended by edu.umass.cs.mallet.base.types.TokenSequence
      extended by edu.umass.cs.mallet.base.extract.StringTokenization
All Implemented Interfaces:
PipeOutputAccumulator, Sequence, java.io.Serializable, Tokenization

public class StringTokenization
extends TokenSequence
implements Tokenization

See Also:
Serialized Form

Constructor Summary
StringTokenization(java.lang.CharSequence seq)
          Create an empty StringTokenization
StringTokenization(java.lang.CharSequence string, CharSequenceLexer lexer)
          Creates a tokenization of the given string.
 
Method Summary
 java.lang.Object getDocument()
          Returns the document of which this is a tokenization.
 Span getSpan(int i)
           
 Span subspan(int firstToken, int lastToken)
          Returns a span formed by concatenating the spans of the tokens from firstToken (inclusive) to lastToken (exclusive).
 
Methods inherited from class edu.umass.cs.mallet.base.types.TokenSequence
add, add, addAll, addAll, addAll, clonePipeOutputAccumulator, get, getNumericProperty, getProperty, getToken, hasProperty, iterator, pipeOutputAccumulate, remove, removeLastToken, setNumericProperty, setProperty, size, toFeatureSequence, toFeatureVector, toString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface edu.umass.cs.mallet.base.types.Sequence
get, size
 

Constructor Detail

StringTokenization

public StringTokenization(java.lang.CharSequence seq)
Create an empty StringTokenization


StringTokenization

public StringTokenization(java.lang.CharSequence string,
                          CharSequenceLexer lexer)
Creates a tokenization of the given string. Tokens are added from all the matches of the given lexer.
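
The idea of adding one token per lexer match can be illustrated without MALLET on the classpath. The following is a minimal sketch, assuming a java.util.regex.Pattern stands in for the CharSequenceLexer; the class LexerMatchSketch and the method tokenOffsets are hypothetical names for illustration and are not part of this API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration only; not a MALLET class.
class LexerMatchSketch {

    // Returns {start, end} character offsets (end exclusive) for every
    // match of tokenPattern in document, in order of appearance.
    static List<int[]> tokenOffsets(CharSequence document, Pattern tokenPattern) {
        List<int[]> offsets = new ArrayList<>();
        Matcher matcher = tokenPattern.matcher(document);
        while (matcher.find()) {
            offsets.add(new int[] { matcher.start(), matcher.end() });
        }
        return offsets;
    }

    public static void main(String[] args) {
        String document = "Hello, world of tokens";
        // Assumption: a token is a maximal run of word characters.
        for (int[] o : tokenOffsets(document, Pattern.compile("\\w+"))) {
            System.out.println(document.substring(o[0], o[1])
                    + " [" + o[0] + "," + o[1] + ")");
        }
    }
}
```

Each recorded offset pair plays the role of a token's span into the original document, which is why a tokenization can return spans without copying text.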

Method Detail

subspan

public Span subspan(int firstToken,
                    int lastToken)
Description copied from interface: Tokenization
Returns a span formed by concatenating the spans of the tokens from firstToken (inclusive) to lastToken (exclusive).

Specified by:
subspan in interface Tokenization
Parameters:
firstToken - The index of the first token in the new span (inclusive). This is an index of a token, *not* an index into the document.
lastToken - The index of the last token in the new span (exclusive); the token at this index is not included. This is an index of a token, *not* an index into the document.
Returns:
A span into this tokenization's document
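
The half-open token range can be sketched in plain Java as follows. This is an illustration of the indexing convention only, not MALLET's implementation: SubspanSketch, subspanText, and the offset arrays are hypothetical, and rejecting an empty or out-of-range request is an assumption of this sketch rather than documented behavior.

```java
// Hypothetical stand-in for illustration only; not a MALLET class.
class SubspanSketch {

    // starts[i] and ends[i] are the character offsets of token i in
    // document, with ends[i] exclusive. Returns the document text covered
    // by tokens firstToken (inclusive) through lastToken (exclusive).
    static String subspanText(String document, int[] starts, int[] ends,
                              int firstToken, int lastToken) {
        // Assumption of this sketch: empty or out-of-range requests are rejected.
        if (firstToken < 0 || lastToken > starts.length || firstToken >= lastToken) {
            throw new IndexOutOfBoundsException(
                    "bad token range [" + firstToken + "," + lastToken + ")");
        }
        // The new span runs from the start of the first included token
        // to the end of the last included token, lastToken - 1.
        return document.substring(starts[firstToken], ends[lastToken - 1]);
    }

    public static void main(String[] args) {
        String document = "Hello, world of tokens";
        int[] starts = { 0, 7, 13, 16 };
        int[] ends = { 5, 12, 15, 22 };
        // Token indices 1 and 2 are included; token 3 is not.
        System.out.println(subspanText(document, starts, ends, 1, 3)); // prints "world of"
    }
}
```

Note that the arguments are token indices: the range 1 to 3 selects the second and third tokens, and the character offsets 7 and 15 are derived from them.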

getSpan

public Span getSpan(int i)
Specified by:
getSpan in interface Tokenization

getDocument

public java.lang.Object getDocument()
Description copied from interface: Tokenization
Returns the document of which this is a tokenization.

Specified by:
getDocument in interface Tokenization
Returns: