edu.umass.cs.mallet.base.pipe
Class SimpleTaggerSentence2TokenSequence
java.lang.Object
edu.umass.cs.mallet.base.pipe.Pipe
edu.umass.cs.mallet.base.pipe.SimpleTaggerSentence2TokenSequence
- All Implemented Interfaces:
- java.io.Serializable
- public class SimpleTaggerSentence2TokenSequence
- extends Pipe
Converts an external encoding of a sequence of elements with binary
features to a TokenSequence. If target processing
is on (training or labeled test data), it extracts element labels
from the external encoding to create a target LabelSequence.
Two external encodings are supported:
- A String containing lines of whitespace-separated tokens.
- A String[][].
Both represent rows of tokens. When target processing is on, the last token
in each row is the label of the sequence element represented by
that row. All other tokens in the row (or all tokens in the row, when
target processing is off) are the names of the features that are on for
that sequence element.
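To make the relationship between the two encodings concrete, the following stand-alone sketch converts the String form into the String[][] form. It is an illustration only, not MALLET's parsing code: the class RowEncodingSketch and the method toRows are hypothetical names, and skipping blank lines is an assumption rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not part of MALLET.
class RowEncodingSketch {

    // Split a multi-line String into rows of whitespace-separated tokens,
    // producing the same shape as the String[][] encoding.
    static String[][] toRows(String sentence) {
        List<String[]> rows = new ArrayList<>();
        for (String line : sentence.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // assumption: blank lines carry no sequence element
            }
            rows.add(trimmed.split("\\s+"));
        }
        return rows.toArray(new String[0][]);
    }

    public static void main(String[] args) {
        // Feature and label names here are made up for the example.
        String asString = "CAPITALIZED noun B-NP\nlowercase verb O";
        String[][] asArray = {
            {"CAPITALIZED", "noun", "B-NP"},
            {"lowercase", "verb", "O"}
        };
        // Both encodings describe the same rows of tokens.
        System.out.println(Arrays.deepEquals(toRows(asString), asArray));
    }
}
```

With target processing on, the last column of each row ("B-NP", "O") would be read as the label and the remaining columns as feature names.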
- See Also:
- Serialized Form
Method Summary
Instance pipe(Instance carrier)
Takes an instance with data of type String or String[][] and creates
an Instance of type TokenSequence.
Methods inherited from class edu.umass.cs.mallet.base.pipe.Pipe
getDataAlphabet, getInstanceId, getParent, getParentRoot, getTargetAlphabet, isDataAlphabetSet, isTargetProcessing, pipe, readResolve, resolveDataAlphabet, resolveTargetAlphabet, setDataAlphabet, setParent, setTargetAlphabet, setTargetProcessing
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
SimpleTaggerSentence2TokenSequence
public SimpleTaggerSentence2TokenSequence()
- Creates a new SimpleTaggerSentence2TokenSequence instance.
By default, tokens are included as features.
SimpleTaggerSentence2TokenSequence
public SimpleTaggerSentence2TokenSequence(boolean inc)
- Creates a new SimpleTaggerSentence2TokenSequence instance
that includes tokens as features if and only if the supplied argument is true.
pipe
public Instance pipe(Instance carrier)
- Takes an instance with data of type String or String[][] and creates
an Instance of type TokenSequence. Each Token in the sequence
gets the text of its line and one feature of value 1
for each "Feature" in the line. For example, if the String[][] is
{{a,b},{c,d,e}} (and target processing is off) then the text would be
"a b" for the first token and "c d e" for the second. Also, the
features "a" and "b" would be set for the first token and "c", "d" and
"e" for the second. When target processing is on, the last element in
the String[] for the current token is taken as the target (label), so in
the previous example "b" would have been the label of the first token.
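The worked example above can be traced with a small stand-alone sketch. It follows the description in this section and does not reproduce MALLET's implementation: TokenSketch is a hypothetical class, and building the token text from the feature tokens only when target processing is on is an assumption, since the text above specifies the text only for the case where target processing is off.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of one row becoming one token; not part of MALLET.
class TokenSketch {
    final String text;                                          // row tokens joined by spaces
    final Map<String, Double> features = new LinkedHashMap<>(); // feature name -> 1.0
    final String label;                                         // null when target processing is off

    TokenSketch(String[] row, boolean targetProcessing) {
        if (row.length == 0) {
            throw new IllegalArgumentException("empty row");
        }
        // With target processing on, the last element is the label, not a feature.
        int nFeatures = targetProcessing ? row.length - 1 : row.length;
        label = targetProcessing ? row[nFeatures] : null;

        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < nFeatures; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(row[i]);
            features.put(row[i], 1.0); // binary feature: present means value 1
        }
        text = sb.toString(); // assumption: label excluded from the text when targets are on
    }

    public static void main(String[] args) {
        String[][] rows = {{"a", "b"}, {"c", "d", "e"}};
        for (String[] row : rows) {
            TokenSketch off = new TokenSketch(row, false);
            TokenSketch on = new TokenSketch(row, true);
            System.out.println("text=\"" + off.text + "\" features=" + off.features.keySet()
                    + " label-if-targets-on=" + on.label);
        }
    }
}
```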
- Specified by:
pipe in class Pipe
- Parameters:
carrier - Instance to be processed.