edu.umass.cs.mallet.base.extract
Interface Tokenization

All Superinterfaces:
Sequence
All Known Implementing Classes:
StringTokenization

public interface Tokenization
extends Sequence


Method Summary
 java.lang.Object getDocument()
          Returns the document of which this is a tokenization.
 Span getSpan(int i)
          Returns the span of the token at index i.
 Span subspan(int start, int end)
          Returns a span formed by concatenating the spans from start to end.
 
Methods inherited from interface edu.umass.cs.mallet.base.types.Sequence
get, size
 

Method Detail

getDocument

public java.lang.Object getDocument()
Returns the document of which this is a tokenization.

Returns:
The document of which this is a tokenization.

getSpan

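public Span getSpan(int i)
Returns the span of the token at index i. Here i is an index of a token, not an index into the document.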

subspan

public Span subspan(int start,
                    int end)
Returns a span formed by concatenating the spans of the tokens from start (inclusive) to end (exclusive).

Parameters:
start - The index of the first token in the new span (inclusive). This is an index of a token, *not* an index into the document.
end - The index immediately after the last token in the new span (exclusive). This is an index of a token, *not* an index into the document.
Returns:
A span into this tokenization's document
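
Example (illustrative sketch only):
The half-open token range taken by subspan can be shown with a small stand-in class. SubspanSketch, getSpanText and subspanText are hypothetical names made up for this example; they are not part of MALLET and do not reproduce StringTokenization. The sketch assumes a String document split on whitespace, and assumes that "concatenating" means covering the document text from the start of token start to the end of token end - 1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a tokenization of a String document.
// Not MALLET code: it only mirrors the token-index semantics described above.
class SubspanSketch {
    private final String document;
    // Each entry holds {startChar, endChar} offsets of one token (endChar exclusive).
    private final List<int[]> spans = new ArrayList<>();

    SubspanSketch(String document) {
        this.document = document;
        int i = 0;
        int n = document.length();
        while (i < n) {
            // Skip whitespace between tokens.
            while (i < n && Character.isWhitespace(document.charAt(i))) i++;
            int begin = i;
            // Consume one run of non-whitespace characters as a token.
            while (i < n && !Character.isWhitespace(document.charAt(i))) i++;
            if (i > begin) spans.add(new int[] {begin, i});
        }
    }

    // Number of tokens, analogous to Sequence.size().
    int size() {
        return spans.size();
    }

    // Text of the token at token index i, analogous to getSpan(i).
    String getSpanText(int i) {
        int[] s = spans.get(i);
        return document.substring(s[0], s[1]);
    }

    // Text covered by tokens start (inclusive) to end (exclusive).
    // Both arguments are token indices, not character offsets.
    String subspanText(int start, int end) {
        // Rejecting empty ranges is a choice made for this sketch only.
        if (start < 0 || end > spans.size() || start >= end) {
            throw new IllegalArgumentException("bad token range: " + start + ", " + end);
        }
        int firstChar = spans.get(start)[0];
        int lastChar = spans.get(end - 1)[1];
        return document.substring(firstChar, lastChar);
    }
}
```

With the document "the quick brown fox", subspanText(1, 3) covers tokens 1 and 2 and yields "quick brown"; token 3 ("fox") is excluded because end is exclusive.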