java.lang.Object
  extended by edu.umass.cs.mallet.base.types.InstanceList
A list of machine learning instances, typically used for training or testing of a machine learning algorithm.
All of the instances in the list will have been passed through the same Pipe, and thus must also share the same data and target Alphabets. InstanceList keeps a reference to the pipe and the two alphabets.
The most common way of adding instances to an InstanceList is through the add(PipeInputIterator) method. PipeInputIterators are a way of mapping general data sources into instances suitable for processing through a pipe. As each Instance is pulled from the PipeInputIterator, the InstanceList copies the instance and runs the copy through its pipe (with resultant destructive modifications) before saving the modified instance on its list. This is the usual way in which instances are transformed by pipes.
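The copy-then-pipe behaviour described above can be sketched in plain Java. This is a simplified, hypothetical model rather than MALLET's implementation: `SimpleInstance` and `PipedList` are illustrative names, and the pipe is assumed to be any `Consumer` that destructively modifies the instance it is handed.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for Instance: mutable data and target fields.
class SimpleInstance {
    Object data;
    Object target;

    SimpleInstance(Object data, Object target) {
        this.data = data;
        this.target = target;
    }

    // Shallow copy: reassigning the copy's fields does not affect the original.
    SimpleInstance copy() {
        return new SimpleInstance(data, target);
    }
}

// Hypothetical list that mimics add(PipeInputIterator): copy, pipe, store.
class PipedList {
    private final Consumer<SimpleInstance> pipe; // destructively modifies its argument
    private final List<SimpleInstance> instances = new ArrayList<>();

    PipedList(Consumer<SimpleInstance> pipe) {
        this.pipe = pipe;
    }

    void add(Iterator<SimpleInstance> source) {
        while (source.hasNext()) {
            SimpleInstance copy = source.next().copy(); // the caller's instance stays untouched
            pipe.accept(copy);                          // transform the copy in place
            instances.add(copy);                        // keep only the piped copy
        }
    }

    int size() { return instances.size(); }

    SimpleInstance get(int index) { return instances.get(index); }
}
```

For example, a pipe such as `inst -> inst.data = ((String) inst.data).toLowerCase().split(" ")` stores tokenized copies while the raw instances produced by the iterator keep their original text.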
InstanceList also contains methods for randomly generating lists of feature vectors, splitting lists into non-overlapping subsets (useful for test/train splits), and iterating over cross-validation folds.
See Also:
Instance, Pipe, PipeInputIterator, Serialized Form

Nested Class Summary
class InstanceList.CrossValidationIterator
CrossValidationIterator allows iterating over pairs of InstanceLists, where each pair is split into training/testing based on nfolds.

class InstanceList.Iterator

protected static interface InstanceList.Stream

Constructor Summary
InstanceList()
Creates a list which must have its pipe set later.

InstanceList(Alphabet dataVocab, Alphabet targetVocab)
Creates a list which will not pass added instances through a pipe.

InstanceList(Pipe pipe)
Creates a list with the given pipe.

InstanceList(Pipe pipe, int capacity)
Creates a list with the given pipe and initial capacity, where all added instances are passed through the specified pipe.

InstanceList(Random r, Alphabet vocab, java.lang.String[] classNames, int meanInstancesPerLabel)

InstanceList(Random r, Dirichlet classCentroidDistribution, double classCentroidAverageAlphaMean, double classCentroidAverageAlphaVariance, double featureVectorSizePoissonLambda, double classInstanceCountPoissonLambda, java.lang.String[] classNames)
Creates a list consisting of randomly-generated FeatureVectors.

InstanceList(Random r, int vocabSize, int numClasses)

Method Summary
boolean add(Instance instance)
Appends the instance to this list.

boolean add(Instance instance, double instanceWeight)
Appends the instance to this list, assigning it the specified weight.

void add(InstanceList ilist)
Adds to this list each instance in the input list.

boolean add(java.lang.Object data, java.lang.Object target, java.lang.Object name, java.lang.Object source)
Constructs and appends an instance to this list, passing it through this list's pipe.

boolean add(java.lang.Object data, java.lang.Object target, java.lang.Object name, java.lang.Object source, double instanceWeight)
Constructs and appends an instance to this list, passing it through this list's pipe and assigning it the specified weight.

void add(PipeInputIterator pi)
Adds to this list every instance generated by the iterator, passing each one through this list's pipe.

InstanceList cloneEmpty()

PipeOutputAccumulator clonePipeOutputAccumulator()

InstanceList.CrossValidationIterator crossValidationIterator(int nfolds)

InstanceList.CrossValidationIterator crossValidationIterator(int nfolds, int seed)

java.lang.Object get(int index)
Returns the Instance at the specified index.

Alphabet getDataAlphabet()
Returns the Alphabet mapping features of the data to integers.

java.lang.Class getDataClass()
Returns the class of the object contained in the data field of the first Instance in this list.

FeatureSelection getFeatureSelection()

Instance getInstance(int index)
Returns the Instance at the specified index.

double getInstanceWeight(int index)

FeatureSelection[] getPerLabelFeatureSelection()

Pipe getPipe()
Returns the pipe through which each added Instance is passed, which may be null.

Alphabet getTargetAlphabet()
Returns the Alphabet mapping target output labels to integers.

InstanceList.Iterator iterator()

static InstanceList load(java.io.File file)
Constructs a new InstanceList, deserialized from file.

double noisify(double ratio)

void pipeOutputAccumulate(Instance carrier, Pipe iteratedPipe)

void removeSources()
Sets the "source" field to null in all instances.

void removeTargets()
Sets the "target" field to null in all instances.

InstanceList sampleWithInstanceWeights(java.util.Random r)
Returns an InstanceList of the same size, where the instances come from the random sampling (with replacement) of this list using the instance weights.

InstanceList sampleWithReplacement(java.util.Random r, int numSamples)

InstanceList sampleWithWeights(java.util.Random r, double[] weights)
Returns an InstanceList of the same size, where the instances come from the random sampling (with replacement) of this list using the given weights.

void save(java.io.File file)
Saves this InstanceList to file.

void setFeatureSelection(FeatureSelection selectedFeatures)

void setInstance(int index, Instance instance)
Replaces the Instance at position index with a new one.

void setInstanceWeight(int index, double weight)

void setPerLabelFeatureSelection(FeatureSelection[] selectedFeatures)

InstanceList shallowClone()

int size()

InstanceList[] split(double[] proportions)

InstanceList[] split(java.util.Random r, double[] proportions)
Shuffles the elements of this list among several smaller lists.

InstanceList[] splitByModulo(int m)
Returns a pair of new lists such that the first list in the pair contains every mth element of this list, starting with the first.

InstanceList[] splitInOrder(double[] proportions)
Chops this list into several sequential sublists.

InstanceList subList(int start, int end)

LabelVector targetLabelDistribution()

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail
public InstanceList(Pipe pipe, int capacity)
Creates a list with the given pipe and initial capacity, where all added instances are passed through the specified pipe.
Parameters:
pipe - The pipe through which all added instances will be passed.
public InstanceList(Pipe pipe)
Creates a list with the given pipe.
Parameters:
pipe - The pipe through which all added instances will be passed.
public InstanceList(Alphabet dataVocab, Alphabet targetVocab)
Creates a list which will not pass added instances through a pipe. Use it in the infrequent cases where the InstanceList has no pipe and objects containing vocabularies are entered directly into it; for example, when creating a random InstanceList using Dirichlets and Multinomials.
Parameters:
dataVocab - The vocabulary for added instances' data fields
targetVocab - The vocabulary for added instances' targets
public InstanceList()
Creates a list which must have its pipe set later.
public InstanceList(Random r, Dirichlet classCentroidDistribution, double classCentroidAverageAlphaMean, double classCentroidAverageAlphaVariance, double featureVectorSizePoissonLambda, double classInstanceCountPoissonLambda, java.lang.String[] classNames)
Creates a list consisting of randomly-generated FeatureVectors.
public InstanceList(Random r, Alphabet vocab, java.lang.String[] classNames, int meanInstancesPerLabel)
public InstanceList(Random r, int vocabSize, int numClasses)

Method Detail
public InstanceList subList(int start, int end)
public InstanceList shallowClone()
public double noisify(double ratio)
public InstanceList cloneEmpty()
public InstanceList[] split(java.util.Random r, double[] proportions)
Shuffles the elements of this list among several smaller lists.
Parameters:
proportions - A list of numbers (not necessarily summing to 1) which, when normalized, correspond to the proportion of elements in each returned sublist.
r - The source of randomness to use in shuffling.
Returns:
One InstanceList for each element of proportions.
public InstanceList[] split(double[] proportions)
public InstanceList[] splitInOrder(double[] proportions)
Chops this list into several sequential sublists.
Parameters:
proportions - A list of numbers corresponding to the proportion of elements in each returned sublist.
Returns:
One InstanceList for each element of proportions.
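The difference between split(java.util.Random, double[]) and splitInOrder(double[]) can be sketched over a plain java.util.List: one shuffles before cutting, the other cuts the list as it stands. This is a hypothetical sketch, not MALLET's code; `ListSplitter` is an illustrative name, and normalizing the proportions and rounding the cumulative boundaries (with the last sublist absorbing any rounding remainder) is an assumed scheme.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class ListSplitter {
    // Cuts items into consecutive sublists whose sizes follow the normalized proportions.
    static <T> List<List<T>> splitInOrder(List<T> items, double[] proportions) {
        double total = 0;
        for (double p : proportions) total += p;
        int n = items.size();
        List<List<T>> parts = new ArrayList<>();
        double cumulative = 0;
        int start = 0;
        for (int k = 0; k < proportions.length; k++) {
            cumulative += proportions[k] / total;
            // Round each cumulative boundary; force the final boundary to n.
            int end = (k == proportions.length - 1) ? n : (int) Math.round(cumulative * n);
            end = Math.max(start, Math.min(end, n));
            parts.add(new ArrayList<>(items.subList(start, end)));
            start = end;
        }
        return parts;
    }

    // Shuffles a copy of items with r, then cuts it exactly as splitInOrder does.
    static <T> List<List<T>> split(Random r, List<T> items, double[] proportions) {
        List<T> shuffled = new ArrayList<>(items);
        Collections.shuffle(shuffled, r);
        return splitInOrder(shuffled, proportions);
    }
}
```

Under this scheme, proportions of {0.8, 0.2} and {4, 1} give the same 80/20 split, and passing a seeded Random makes the shuffled split reproducible.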
public InstanceList[] splitByModulo(int m)
Returns a pair of new lists such that the first list in the pair contains every mth element of this list, starting with the first. The second list contains all remaining elements.
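The modulo split is easy to state over a plain java.util.List. A hypothetical sketch (`ModuloSplitter` is an illustrative name, not a MALLET class), assuming that "every mth element, starting with the first" means the elements at indices 0, m, 2m, and so on:

```java
import java.util.ArrayList;
import java.util.List;

class ModuloSplitter {
    // Returns {elements at indices 0, m, 2m, ...; all remaining elements}.
    static <T> List<List<T>> splitByModulo(List<T> items, int m) {
        if (m <= 0) throw new IllegalArgumentException("m must be positive");
        List<T> selected = new ArrayList<>();
        List<T> rest = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            if (i % m == 0) {
                selected.add(items.get(i));
            } else {
                rest.add(items.get(i));
            }
        }
        List<List<T>> pair = new ArrayList<>();
        pair.add(selected);
        pair.add(rest);
        return pair;
    }
}
```

With m = 5 this gives a roughly 20/80 split that is fully deterministic, since no random number generator is involved.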
public InstanceList sampleWithReplacement(java.util.Random r, int numSamples)
public Instance getInstance(int index)
Returns the Instance at the specified index.
public InstanceList sampleWithInstanceWeights(java.util.Random r)
Returns an InstanceList of the same size, where the instances come from the random sampling (with replacement) of this list using the instance weights. The new instances all have their weights set to one.
public InstanceList sampleWithWeights(java.util.Random r, double[] weights)
Returns an InstanceList of the same size, where the instances come from the random sampling (with replacement) of this list using the given weights. The length of the weight array must be the same as the length of this list. The new instances all have their weights set to one.
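Sampling with replacement under arbitrary non-negative weights can be done with a cumulative-weight table and one binary search per draw. The sketch below is a hypothetical illustration of the behaviour described for sampleWithWeights, not MALLET's implementation; `WeightedSampler` is an illustrative name, and because plain list elements carry no weights, the "reset weights to one" step has no counterpart here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class WeightedSampler {
    // Draws items.size() elements with replacement; element i is picked with
    // probability weights[i] / sum(weights). Weights must be non-negative.
    static <T> List<T> sampleWithWeights(Random r, List<T> items, double[] weights) {
        if (weights.length != items.size()) {
            throw new IllegalArgumentException("weights.length must equal the list size");
        }
        if (items.isEmpty()) return new ArrayList<>();
        double[] cumulative = new double[weights.length];
        double total = 0;
        for (int i = 0; i < weights.length; i++) {
            total += weights[i];
            cumulative[i] = total;
        }
        if (!(total > 0)) throw new IllegalArgumentException("weights must have a positive sum");
        List<T> sample = new ArrayList<>(items.size());
        for (int draw = 0; draw < items.size(); draw++) {
            double u = r.nextDouble() * total;
            // Binary search for the smallest index whose cumulative weight exceeds u;
            // zero-weight elements never satisfy this and are never drawn.
            int lo = 0, hi = cumulative.length - 1;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (cumulative[mid] > u) hi = mid; else lo = mid + 1;
            }
            sample.add(items.get(lo));
        }
        return sample;
    }
}
```

sampleWithInstanceWeights describes the same procedure with the weights taken from the instances themselves rather than from a separate array.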
public void setInstance(int index, Instance instance)
Replaces the Instance at position index with a new one.
public double getInstanceWeight(int index)
public void setInstanceWeight(int index, double weight)
public void setFeatureSelection(FeatureSelection selectedFeatures)
public FeatureSelection getFeatureSelection()
public void setPerLabelFeatureSelection(FeatureSelection[] selectedFeatures)
public FeatureSelection[] getPerLabelFeatureSelection()
public void removeTargets()
Sets the "target" field to null in all instances. This turns the list into unlabeled data.
public void removeSources()
Sets the "source" field to null in all instances. This will often save memory when the raw data had been placed in that field.
public java.lang.Object get(int index)
Returns the Instance at the specified index.
public static InstanceList load(java.io.File file)
Constructs a new InstanceList, deserialized from file. If the string value of file is "-", then deserialize from System.in.
public void save(java.io.File file)
Saves this InstanceList to file. If the string value of file is "-", then serialize to System.out.
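The "-" convention used by load and save can be sketched with standard Java object serialization. This is a hypothetical helper (`DashAwareSerializer` is an illustrative name), assuming that, like InstanceList, the object being written is java.io.Serializable; System.out is flushed but deliberately not closed, and System.in is left open.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;

class DashAwareSerializer {
    // Writes obj to file, or to System.out when the file's string value is "-".
    static void save(Serializable obj, File file) throws IOException {
        if (file.toString().equals("-")) {
            writeTo(System.out, obj); // leave standard output open
        } else {
            try (OutputStream out = new FileOutputStream(file)) {
                writeTo(out, obj);
            }
        }
    }

    // Reads an object from file, or from System.in when the file's string value is "-".
    static Object load(File file) throws IOException, ClassNotFoundException {
        if (file.toString().equals("-")) {
            return new ObjectInputStream(System.in).readObject();
        }
        try (InputStream in = new FileInputStream(file)) {
            return new ObjectInputStream(in).readObject();
        }
    }

    private static void writeTo(OutputStream out, Serializable obj) throws IOException {
        ObjectOutputStream oos = new ObjectOutputStream(out);
        oos.writeObject(obj);
        oos.flush(); // push buffered bytes without closing the underlying stream
    }
}
```

Routing "-" to the standard streams lets a serialized list be piped between command-line tools without a temporary file.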
public int size()
public java.lang.Class getDataClass()
Returns the class of the object contained in the data field of the first Instance in this list.
public Pipe getPipe()
Returns the pipe through which each added Instance is passed, which may be null.
public Alphabet getDataAlphabet()
Returns the Alphabet mapping features of the data to integers.
public Alphabet getTargetAlphabet()
Returns the Alphabet mapping target output labels to integers.
public LabelVector targetLabelDistribution()
public void pipeOutputAccumulate(Instance carrier, Pipe iteratedPipe)
Specified by:
pipeOutputAccumulate in interface PipeOutputAccumulator
public PipeOutputAccumulator clonePipeOutputAccumulator()
Specified by:
clonePipeOutputAccumulator in interface PipeOutputAccumulator
public InstanceList.Iterator iterator()
public InstanceList.CrossValidationIterator crossValidationIterator(int nfolds, int seed)
public InstanceList.CrossValidationIterator crossValidationIterator(int nfolds)
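The detail entries for crossValidationIterator carry no description, but the nested-class summary says the iterator yields training/testing pairs based on nfolds, optionally seeded. One common way to build such folds is sketched below. It is a hypothetical illustration (`FoldBuilder` and `Fold` are illustrative names), and the shuffle-then-round-robin fold assignment is an assumption, not necessarily what MALLET does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// One cross-validation round: a training list and a disjoint testing list.
class Fold<T> {
    final List<T> train = new ArrayList<>();
    final List<T> test = new ArrayList<>();
}

class FoldBuilder {
    // Shuffles with the given seed, then puts position i into the test set of fold i % nfolds.
    static <T> List<Fold<T>> folds(List<T> items, int nfolds, int seed) {
        if (nfolds < 2) throw new IllegalArgumentException("nfolds must be at least 2");
        List<T> shuffled = new ArrayList<>(items);
        Collections.shuffle(shuffled, new Random(seed));
        List<Fold<T>> result = new ArrayList<>();
        for (int k = 0; k < nfolds; k++) {
            Fold<T> fold = new Fold<>();
            for (int i = 0; i < shuffled.size(); i++) {
                if (i % nfolds == k) {
                    fold.test.add(shuffled.get(i));
                } else {
                    fold.train.add(shuffled.get(i));
                }
            }
            result.add(fold);
        }
        return result;
    }
}
```

Each element lands in exactly one test set, so averaging a metric over the nfolds rounds evaluates on all of the data exactly once.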
public void add(PipeInputIterator pi)
public void add(InstanceList ilist)
Adds to this list each instance in the input list.
The lists' pipes must match, except that this list's pipe is allowed to be "not yet set", and the input list's pipe is allowed to be null.
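The pipe-compatibility rule for add(InstanceList) can be written as a single predicate. In this hypothetical sketch, `NOT_YET_SET` is an assumed sentinel standing for a pipe that has not been set (distinct from null, which means "no pipe"), and "match" is assumed to mean the very same Pipe object; this page confirms neither detail.

```java
class PipeRules {
    // Hypothetical sentinel for "pipe not yet set"; distinct from null ("no pipe").
    static final Object NOT_YET_SET = new Object();

    // True if a list whose pipe is thisPipe may absorb a list whose pipe is otherPipe.
    static boolean canAdd(Object thisPipe, Object otherPipe) {
        return thisPipe == NOT_YET_SET  // this list's pipe may be "not yet set"
            || otherPipe == null        // the input list's pipe may be null
            || thisPipe == otherPipe;   // otherwise the pipes must be the same object
    }
}
```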
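public boolean add(java.lang.Object data, java.lang.Object target, java.lang.Object name, java.lang.Object source, double instanceWeight)
Constructs and appends an instance to this list, passing it through this list's pipe and assigning it the specified weight.
Returns:
true
public boolean add(java.lang.Object data, java.lang.Object target, java.lang.Object name, java.lang.Object source)
Constructs and appends an instance to this list, passing it through this list's pipe.
Returns:
true
public boolean add(Instance instance)
Appends the instance to this list.
Returns:
true
public boolean add(Instance instance, double instanceWeight)
Appends the instance to this list, assigning it the specified weight.
Returns:
true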