edu.umass.cs.mallet.base.pipe
Class TokenSequenceMatchDataAndTarget

java.lang.Object
  extended by edu.umass.cs.mallet.base.pipe.Pipe
      extended by edu.umass.cs.mallet.base.pipe.TokenSequenceMatchDataAndTarget
All Implemented Interfaces:
java.io.Serializable

public class TokenSequenceMatchDataAndTarget
extends Pipe
implements java.io.Serializable

Runs a regular expression over the text of each token, replaces the token text with the substring matched by one regex group, and creates a target TokenSequence from the text matched by another regex group.

This is useful when a data file contains one token per line and the label appears on the same line. First build a TokenSequence in which the Token.getText() of each token is the text of one line, then run this pipe to separate the target information from the data information. For example, to process the following,

         BACKGROUND Then
         PERSON Mr.
         PERSON Smith
         BACKGROUND said
         ...
         
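use new TokenSequenceMatchDataAndTarget (Pattern.compile ("([A-Z]+) (.*)"), 2, 1). Here dataGroup is 2, so the word becomes the token text, and targetGroup is 1, so the label becomes the target.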
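The group selection can be illustrated without MALLET, using only java.util.regex. The following is a minimal sketch, not this class's implementation: the class MatchDataAndTargetSketch, its Split holder, and the split method are hypothetical names invented for illustration. It assumes that the regex must match the whole token text and that non-matching lines are skipped; how the real pipe treats non-matching tokens is not stated on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of MALLET.
class MatchDataAndTargetSketch {

    /** Holds the two parallel sequences produced by the split. */
    static final class Split {
        final List<String> data = new ArrayList<String>();
        final List<String> target = new ArrayList<String>();
    }

    /**
     * For each token text, keeps the substring in dataGroup as the data
     * and the substring in targetGroup as the target.
     */
    static Split split(List<String> tokens, Pattern regex,
                       int dataGroup, int targetGroup) {
        Split out = new Split();
        for (String text : tokens) {
            Matcher m = regex.matcher(text);
            // Assumption: the pattern must match the entire token text.
            if (m.matches()) {
                out.data.add(m.group(dataGroup));
                out.target.add(m.group(targetGroup));
            }
            // Assumption: lines that do not match are silently skipped.
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "BACKGROUND Then", "PERSON Mr.", "PERSON Smith", "BACKGROUND said");
        Split s = split(lines, Pattern.compile("([A-Z]+) (.*)"), 2, 1);
        System.out.println(s.data);   // [Then, Mr., Smith, said]
        System.out.println(s.target); // [BACKGROUND, PERSON, PERSON, BACKGROUND]
    }
}
```

Swapping the last two arguments (1, 2) would instead keep the labels as data and the words as targets, which is why the argument order matters when the label precedes the word on each line.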

See Also:
Serialized Form

Constructor Summary
TokenSequenceMatchDataAndTarget(java.util.regex.Pattern regex, int dataGroup, int targetGroup)
           
TokenSequenceMatchDataAndTarget(java.lang.String regex, int dataGroup, int targetGroup)
           
 
Method Summary
 Instance pipe(Instance carrier)
          Process an Instance.
 
Methods inherited from class edu.umass.cs.mallet.base.pipe.Pipe
getDataAlphabet, getInstanceId, getParent, getParentRoot, getTargetAlphabet, isDataAlphabetSet, isTargetProcessing, pipe, readResolve, resolveDataAlphabet, resolveTargetAlphabet, setDataAlphabet, setParent, setTargetAlphabet, setTargetProcessing
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TokenSequenceMatchDataAndTarget

public TokenSequenceMatchDataAndTarget(java.util.regex.Pattern regex,
                                       int dataGroup,
                                       int targetGroup)

TokenSequenceMatchDataAndTarget

public TokenSequenceMatchDataAndTarget(java.lang.String regex,
                                       int dataGroup,
                                       int targetGroup)
Method Detail

pipe

public Instance pipe(Instance carrier)
Description copied from class: Pipe
Process an Instance. This method takes an input Instance, destructively modifies it in some way, and returns it. This is the method by which all pipes are eventually run.

One can create a new concrete subclass of Pipe simply by implementing this method.

Specified by:
pipe in class Pipe
Parameters:
carrier - Instance to be processed.