edu.umass.cs.mallet.base.pipe
Class SimpleTaggerSentence2TokenSequence

java.lang.Object
  extended by edu.umass.cs.mallet.base.pipe.Pipe
      extended by edu.umass.cs.mallet.base.pipe.SimpleTaggerSentence2TokenSequence
All Implemented Interfaces:
java.io.Serializable

public class SimpleTaggerSentence2TokenSequence
extends Pipe

Converts an external encoding of a sequence of elements with binary features to a TokenSequence. If target processing is on (training or labeled test data), it extracts element labels from the external encoding to create a target LabelSequence. Two external encodings are supported:

  1. A String containing lines of whitespace-separated tokens.
  2. A String[][].

Both represent rows of tokens. When target processing is on, the last token in each row is the label of the sequence element represented by that row. All other tokens in the row, or all tokens in the row when target processing is off, are the names of features that are on for the sequence element described by the row.
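
The row layout described above can be illustrated with a minimal sketch in plain Java. RowEncodingSketch and its methods are hypothetical names introduced for illustration and are not part of MALLET; skipping blank lines is also an assumption of this sketch, not documented behavior of the pipe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of MALLET: illustrates the two row encodings. */
final class RowEncodingSketch {

    /** Splits a multi-line String into rows of whitespace-separated tokens. */
    static String[][] parseRows(String sentence) {
        List<String[]> rows = new ArrayList<>();
        for (String line : sentence.split("\\R")) {
            String trimmed = line.trim();
            // Assumption of this sketch: blank lines carry no element and are skipped.
            if (!trimmed.isEmpty()) {
                rows.add(trimmed.split("\\s+"));
            }
        }
        return rows.toArray(new String[0][]);
    }

    /** Feature names of a row: every token, or every token but the last when a label is present. */
    static String[] featuresOf(String[] row, boolean targetProcessing) {
        int count = targetProcessing ? row.length - 1 : row.length;
        return Arrays.copyOf(row, count);
    }

    /** Label of a row: the last token, or null when target processing is off. */
    static String labelOf(String[] row, boolean targetProcessing) {
        return targetProcessing ? row[row.length - 1] : null;
    }
}
```

With target processing on, a row such as "Bill CAPITALIZED noun" would yield the features "Bill" and "CAPITALIZED" and the label "noun"; with target processing off, all three tokens would be treated as feature names.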

See Also:
Serialized Form

Constructor Summary
SimpleTaggerSentence2TokenSequence()
          Creates a new SimpleTaggerSentence2TokenSequence instance.
SimpleTaggerSentence2TokenSequence(boolean inc)
          Creates a new SimpleTaggerSentence2TokenSequence instance which includes tokens as features if and only if the supplied argument is true.
 
Method Summary
 Instance pipe(Instance carrier)
          Takes an instance with data of type String or String[][] and creates an Instance of type TokenSequence.
 
Methods inherited from class edu.umass.cs.mallet.base.pipe.Pipe
getDataAlphabet, getInstanceId, getParent, getParentRoot, getTargetAlphabet, isDataAlphabetSet, isTargetProcessing, pipe, readResolve, resolveDataAlphabet, resolveTargetAlphabet, setDataAlphabet, setParent, setTargetAlphabet, setTargetProcessing
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SimpleTaggerSentence2TokenSequence

public SimpleTaggerSentence2TokenSequence()
Creates a new SimpleTaggerSentence2TokenSequence instance. By default we include tokens as features.


SimpleTaggerSentence2TokenSequence

public SimpleTaggerSentence2TokenSequence(boolean inc)
Creates a new SimpleTaggerSentence2TokenSequence instance which includes tokens as features if and only if the supplied argument is true.

Method Detail

pipe

public Instance pipe(Instance carrier)
Takes an instance with data of type String or String[][] and creates an Instance of type TokenSequence. Each Token in the sequence gets the text of the line that describes it and one feature of value 1 for each "Feature" in the line. For example, if the String[][] is {{a,b},{c,d,e}} (and target processing is off), then the text would be "a b" for the first token and "c d e" for the second. Also, the features "a" and "b" would be set for the first token and "c", "d" and "e" for the second. When target processing is on, the last element in the String[] for the current token is taken as the target (label), so in the previous example "b" would be the label of the first token.

Specified by:
pipe in class Pipe
Parameters:
carrier - Instance to be processed.
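
The {{a,b},{c,d,e}} example from the description of pipe can be traced with a small stand-alone sketch. It models only the case where target processing is off, and it deliberately does not call MALLET: PipeBehaviorSketch and SketchToken are hypothetical stand-ins, so this shows the documented outcome rather than the actual implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of the documented behavior; not MALLET code. */
final class PipeBehaviorSketch {

    /** Hypothetical stand-in for a token: its text plus named feature values. */
    static final class SketchToken {
        final String text;
        final Map<String, Double> features = new LinkedHashMap<>();

        SketchToken(String text) {
            this.text = text;
        }
    }

    /** Builds one token per row, assuming target processing is off (no labels). */
    static List<SketchToken> toTokens(String[][] rows) {
        List<SketchToken> tokens = new ArrayList<>();
        for (String[] row : rows) {
            // The token text is the row itself, joined with single spaces.
            SketchToken token = new SketchToken(String.join(" ", row));
            // Every name in the row becomes a feature of value 1.
            for (String feature : row) {
                token.features.put(feature, 1.0);
            }
            tokens.add(token);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String[][] rows = {{"a", "b"}, {"c", "d", "e"}};
        for (SketchToken token : toTokens(rows)) {
            System.out.println(token.text + " -> " + token.features.keySet());
        }
        // Prints:
        // a b -> [a, b]
        // c d e -> [c, d, e]
    }
}
```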