Package edu.umass.cs.mallet.base.pipe

Classes for processing arbitrary data into instances.


Interface Summary
PipeOutputAccumulator  
 

Class Summary
AddClassifierTokenPredictions This pipe uses a Classifier to label each token (i.e., under a 0th-order Markov assumption), then adds the predictions as features to each token.
AddClassifierTokenPredictions.TokenClassifiers This inner class represents the trained token classifiers.
Array2FeatureVector Converts a Java array of numerical types to a FeatureVector, where the Alphabet is the data array index wrapped in an Integer object.
AugmentableFeatureVectorAddConjunctions Add specified conjunctions to each instance.
AugmentableFeatureVectorLogScale Given an AugmentableFeatureVector, set those values greater than or equal to 1 to log(value)+1.
CharSequence2CharNGrams Transform a character sequence into a token sequence of character n-grams.
CharSequence2TokenSequence Pipe that tokenizes a character sequence.
CharSequenceArray2TokenSequence Transform an array of character sequences into a token sequence.
CharSequenceReplace Given a string, repeatedly look for matches of the regex, and replace the entire match with the given replacement string.
CharSubsequence Given a string, return only the portion of the string inside a regex parenthesized group.
Classification2ConfidencePredictingFeatureVector Pipe features from the underlying classifier to the confidence-prediction instance list.
Csv2Array Converts a string of comma separated values to an array.
Csv2FeatureVector Converts a string of the form feature_1:val_1 feature_2:val_2 ...
Directory2FileIterator Convert a File object representing a directory into a FileIterator which iterates over files in the directory matching a pattern and which extracts a label from each file path to become the target field of the instance.
FeatureSequence2AugmentableFeatureVector Convert the data field from a feature sequence to an augmentable feature vector.
FeatureSequence2FeatureVector Convert the data field from a feature sequence to a feature vector.
FeatureValueString2FeatureVector Unimplemented.
FeatureVectorConjunctions Include in the FeatureVector conjunctions of all its features.
Filename2CharSequence Given a filename contained in a string, read the contents of the file into a CharSequence.
Input2CharSequence Pipe that can read from various kinds of text sources (a URI, File, or Reader) into a CharSequence.
InstanceListTrimFeaturesByCount Unimplemented.
IteratingPipe Converts the iterator in the data field to a PipeOutputAccumulator of the values spanned by the iterator.
IteratorPipe Unimplemented.
LineGroupString2TokenSequence  
MakeAmpersandXMLFriendly Convert & to &amp; in the tokens of a token sequence.
Noop A pipe that does nothing to the instance fields but which has side effects on the dictionary.
ParallelPipes Convert an instance to the PipeOutputAccumulator output produced by running the original instance through each of the sub-pipes contained in the parallel pipe.
Pipe The abstract superclass of all Pipes, which transform one data type to another.
PipeOutputArrayList A PipeOutputAccumulator implemented as an ArrayList.
PrintInput Print the data field of each instance.
PrintInputAndTarget Print the data and target fields of each instance.
PrintTokenSequenceFeatures Print properties of the token sequence in the data field, together with the corresponding value of any token in a token sequence or feature in a feature sequence in the target field.
SaveDataInSource Set the source field of each instance to its data field.
SelectiveSGML2TokenSequence Similar to SGML2TokenSequence, except that only the tags listed in allowedTags are converted to Labels.
SerialPipes Pass an instance through a sequence of pipes, in order.
SGML2TokenSequence Converts a string containing simple SGML tags into a data TokenSequence of words, paired with a target TokenSequence containing the SGML tags in effect for each word.
SimpleTaggerSentence2TokenSequence Converts an external encoding of a sequence of elements with binary features to a TokenSequence.
SourceLocation2TokenSequence Read from the File or BufferedReader in the data field and produce a TokenSequence.
StringAddNewLineDelimiter Pipe that adds special text between lines to represent line breaks explicitly.
Target2FeatureSequence Convert a token sequence in the target field into a feature sequence in the target field.
Target2Label Convert object in the target field into a label in the target field.
Target2LabelSequence Convert a token sequence in the target field into a label sequence in the target field.
TargetRememberLastLabel For each position in the target, remember the last non-background label.
Token2FeatureVector Convert the property list on a token into a feature vector.
TokenSequence2FeatureSequence Convert the token sequence in the data field of each instance to a feature sequence.
TokenSequence2FeatureSequenceWithBigrams Convert the token sequence in the data field of each instance to a feature sequence that preserves bigram information.
TokenSequence2FeatureVectorSequence Convert the token sequence in the data field of each instance to a feature vector sequence.
TokenSequence2TokenIterator Convert the token sequence in the data field of each instance to a token iterator.
TokenSequenceLowercase Convert the text in each token in the token sequence in the data field to lower case.
TokenSequenceMatchDataAndTarget Run a regular expression over the text of each token; replace the text with the substring matching one regex group; create a target TokenSequence from the text matching another regex group.
TokenSequenceNGrams Convert the token sequence in the data field to a token sequence of n-grams.
TokenSequenceParseFeatureString  
TokenSequenceRemoveNonAlpha Remove tokens that contain non-alphabetic characters.
TokenSequenceRemoveStopwords Remove tokens from the token sequence in the data field whose text is in the stopword list.
 

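To make a few of the class descriptions above concrete, here is a minimal, MALLET-free sketch of what TokenSequenceLowercase, TokenSequenceRemoveStopwords, TokenSequenceNGrams and AugmentableFeatureVectorLogScale do to a data field. It works on plain `List<String>` and `double` values rather than MALLET's TokenSequence and AugmentableFeatureVector, and the class and method names are hypothetical. The n-gram ordering and the `_` separator are assumptions, not confirmed library behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Illustrative sketch only: hypothetical helpers mimicking a few pipes from the
 * class summary. The real pipes operate on TokenSequence / feature-vector objects.
 */
final class TokenPipeSketch {

    // Like TokenSequenceLowercase: lower-case the text of every token.
    static List<String> lowercase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Like TokenSequenceRemoveStopwords: drop tokens whose text is in the stopword list.
    static List<String> removeStopwords(List<String> tokens, Set<String> stopwords) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (!stopwords.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    // Like TokenSequenceNGrams (assumed behavior): for each size n, emit every run of
    // n consecutive tokens, joined with "_" (ordering and separator are assumptions).
    static List<String> ngrams(List<String> tokens, int[] gramSizes) {
        List<String> out = new ArrayList<>();
        for (int n : gramSizes) {
            for (int i = 0; i + n <= tokens.size(); i++) {
                out.add(String.join("_", tokens.subList(i, i + n)));
            }
        }
        return out;
    }

    // Like AugmentableFeatureVectorLogScale: values >= 1 become log(value) + 1;
    // smaller values are left unchanged.
    static double logScale(double value) {
        return value >= 1.0 ? Math.log(value) + 1.0 : value;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("The", "Cat", "sat", "on", "the", "Mat");
        List<String> kept = removeStopwords(lowercase(tokens), Set.of("the", "on"));
        System.out.println(kept);                           // [cat, sat, mat]
        System.out.println(ngrams(kept, new int[] {1, 2})); // [cat, sat, mat, cat_sat, sat_mat]
        System.out.println(logScale(1.0));                  // 1.0
    }
}
```

Lowercasing before stopword removal matters here: with a lowercase stopword list, "The" would otherwise survive the filter.
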
Exception Summary
PipeException  
 

Package edu.umass.cs.mallet.base.pipe Description

Classes for processing arbitrary data into instances. Every class in this package should be a subclass of Pipe; other classes belong in base.pipe.util.
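The design rule above (every class here is a Pipe that transforms an instance's data from one type to another, and pipes are chained with SerialPipes) can be sketched in a self-contained way. `SketchInstance`, `SketchPipe`, `SketchSerialPipes`, `Tokenize` and `Lowercase` are hypothetical, simplified stand-ins for MALLET's Instance, Pipe, SerialPipes, CharSequence2TokenSequence and TokenSequenceLowercase, not the library's API. Tokenizing by splitting on whitespace is likewise an assumption made for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical sketch of the pipe design: each pipe rewrites the instance's data
 * field, and a serial pipe feeds the instance through its sub-pipes in order.
 */
final class PipeDesignSketch {

    // Stand-in for an instance: a mutable data field plus a target (label) field.
    static final class SketchInstance {
        Object data;
        Object target;

        SketchInstance(Object data, Object target) {
            this.data = data;
            this.target = target;
        }
    }

    // Stand-in for the abstract Pipe superclass.
    abstract static class SketchPipe {
        abstract SketchInstance pipe(SketchInstance carrier);
    }

    // Stand-in for SerialPipes: apply each sub-pipe to the output of the previous one.
    static final class SketchSerialPipes extends SketchPipe {
        private final List<SketchPipe> pipes;

        SketchSerialPipes(SketchPipe... pipes) {
            this.pipes = Arrays.asList(pipes);
        }

        @Override
        SketchInstance pipe(SketchInstance carrier) {
            for (SketchPipe p : pipes) {
                carrier = p.pipe(carrier);
            }
            return carrier;
        }
    }

    // Rough analogue of CharSequence2TokenSequence (whitespace splitting is an assumption).
    static final class Tokenize extends SketchPipe {
        @Override
        SketchInstance pipe(SketchInstance carrier) {
            String text = carrier.data.toString().trim();
            carrier.data = new ArrayList<>(Arrays.asList(text.split("\\s+")));
            return carrier;
        }
    }

    // Rough analogue of TokenSequenceLowercase: lower-cases tokens in place.
    static final class Lowercase extends SketchPipe {
        @Override
        @SuppressWarnings("unchecked")
        SketchInstance pipe(SketchInstance carrier) {
            List<String> tokens = (List<String>) carrier.data;
            tokens.replaceAll(t -> t.toLowerCase(Locale.ROOT));
            return carrier;
        }
    }

    public static void main(String[] args) {
        SketchPipe pipe = new SketchSerialPipes(new Tokenize(), new Lowercase());
        SketchInstance inst = pipe.pipe(new SketchInstance("Hello  MALLET World", "greeting"));
        System.out.println(inst.data + " -> " + inst.target); // [hello, mallet, world] -> greeting
    }
}
```

Because each pipe only assumes the data type produced by the pipe before it, the order of sub-pipes matters: placing `Lowercase` before `Tokenize` would fail with a ClassCastException, since the data field would still be a String.
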