edu.umass.cs.mallet.base.pipe
Class SGML2TokenSequence
java.lang.Object
edu.umass.cs.mallet.base.pipe.Pipe
edu.umass.cs.mallet.base.pipe.SGML2TokenSequence
- All Implemented Interfaces:
- java.io.Serializable
- public class SGML2TokenSequence
- extends Pipe
- implements java.io.Serializable
Converts a string containing simple SGML tags into a data TokenSequence of words,
paired with a target TokenSequence containing the SGML tag in effect for each word.
It does not handle nested SGML tags, and it does not handle malformed SGML gracefully.
- See Also:
- Serialized Form
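The pairing of words with the tag in effect can be illustrated with a minimal, self-contained sketch. This is not MALLET's implementation and does not use its API: the class name SgmlTaggingSketch, the method tag, the lexer regex, and the background tag "O" are all assumptions made for illustration. Like the class described above, the sketch ignores nesting (a closing tag always returns to the background tag) and makes no attempt to recover from malformed markup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of MALLET.
public class SgmlTaggingSketch {
    // Matches an opening tag, a closing tag, or a run of non-space, non-'<' characters.
    // Group 1: "/" for a closing tag, "" for an opening tag. Group 2: tag name. Group 3: word.
    private static final Pattern LEXER =
        Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)>|([^\\s<]+)");

    /**
     * Returns two parallel lists: index 0 holds the words (the "data" sequence),
     * index 1 holds the tag in effect for each word (the "target" sequence).
     * Words outside any tag receive backgroundTag.
     */
    public static List<List<String>> tag(String input, String backgroundTag) {
        List<String> words = new ArrayList<>();
        List<String> tags = new ArrayList<>();
        String current = backgroundTag;
        Matcher m = LEXER.matcher(input);
        while (m.find()) {
            if (m.group(2) != null) {
                // An opening tag sets the current tag; a closing tag resets
                // to the background tag (no nesting is tracked).
                current = m.group(1).isEmpty() ? m.group(2) : backgroundTag;
            } else {
                words.add(m.group(3));
                tags.add(current);
            }
        }
        return Arrays.asList(words, tags);
    }

    public static void main(String[] args) {
        List<List<String>> result = tag("Call <PER>John Smith</PER> today", "O");
        System.out.println(result.get(0)); // prints [Call, John, Smith, today]
        System.out.println(result.get(1)); // prints [O, PER, PER, O]
    }
}
```

The two lists stay the same length by construction, which mirrors the data/target pairing described above: each word has exactly one tag.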
Methods inherited from class edu.umass.cs.mallet.base.pipe.Pipe
getDataAlphabet, getInstanceId, getParent, getParentRoot, getTargetAlphabet, isDataAlphabetSet, isTargetProcessing, pipe, readResolve, resolveDataAlphabet, resolveTargetAlphabet, setDataAlphabet, setParent, setTargetAlphabet, setTargetProcessing
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
SGML2TokenSequence
public SGML2TokenSequence(CharSequenceLexer lexer,
java.lang.String backgroundTag,
boolean saveSource)
SGML2TokenSequence
public SGML2TokenSequence(CharSequenceLexer lexer,
java.lang.String backgroundTag)
SGML2TokenSequence
public SGML2TokenSequence(java.lang.String regex,
java.lang.String backgroundTag)
SGML2TokenSequence
public SGML2TokenSequence()
pipe
public Instance pipe(Instance carrier)
- Description copied from class: Pipe
- Process an Instance. This method takes an input Instance,
destructively modifies it in some way, and returns it.
This is the method by which all pipes are eventually run.
One can create a new concrete subclass of Pipe simply by
implementing this method.
- Specified by:
- pipe in class Pipe
- Parameters:
- carrier - Instance to be processed.
main
public static void main(java.lang.String[] args)