java.lang.Object
  edu.umass.cs.mallet.base.types.InstanceList
    edu.umass.cs.mallet.base.types.PagedInstanceList
XXX: the split() methods are still unreliable.
An InstanceList that avoids OutOfMemoryErrors by saving Instances
to disk when there is not enough memory to create a new
Instance. It implements a fixed-size paging scheme, where each page
on disk stores instancesPerPage Instances. So, while
the number of Instances per page is constant, the size in bytes of
each page may vary. Using this class instead of InstanceList means
the number of Instances you can store is essentially limited only
by disk size (and patience).
The paging scheme is optimized for the most frequent case of
looping through the InstanceList from index 0 to n. Since each page
holds instancesPerPage instances, instances 0 through
instancesPerPage-1 are stored together on page 1, instances
instancesPerPage through 2*instancesPerPage-1 are on page 2, and so
on. This way, instances adjacent in the list will usually be in the
same page.
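The layout described above comes down to integer division. The following is a minimal sketch, assuming zero-based page numbers and a fixed instancesPerPage; PageLayoutSketch and its methods are illustrative names and are not part of PagedInstanceList.

```java
// Hypothetical helper, not part of MALLET: maps list indices to pages.
class PageLayoutSketch {
    /** Zero-based page that holds the instance at the given index. */
    static int pageOf(int index, int instancesPerPage) {
        if (index < 0 || instancesPerPage <= 0) {
            throw new IllegalArgumentException("index must be >= 0 and instancesPerPage > 0");
        }
        return index / instancesPerPage;
    }

    /** Number of pages needed for n instances (the last page may be partly full). */
    static int pageCount(int n, int instancesPerPage) {
        return (n + instancesPerPage - 1) / instancesPerPage;
    }

    /** Counts how often the current page changes during a pass from index 0 to n - 1. */
    static int pageSwitchesForSequentialScan(int n, int instancesPerPage) {
        int switches = 0;
        int current = -1;
        for (int i = 0; i < n; i++) {
            int page = pageOf(i, instancesPerPage);
            if (page != current) {
                switches++;
                current = page;
            }
        }
        return switches;
    }

    public static void main(String[] args) {
        System.out.println("index 250 lives on page " + pageOf(250, 100));
        System.out.println("page switches: " + pageSwitchesForSequentialScan(1000, 100));
    }
}
```

A sequential pass touches each page exactly once, which is why the scheme favors looping from the first index to the last.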
The paging scheme also tries to keep only one page in memory at a
time. The justification for this is that the page size is near the
limit of the maximum number of instances that can be kept in
memory. Since we assume the frequent case is looping from instance
0 to n, keeping other Instances in memory would be a waste of
resources.
About instancesPerPage -- If instancesPerPage = -1, then its value
will be set automatically as follows: when the first
OutOfMemoryError is thrown, count how many instances are currently
in memory, then divide by two. This is a conservative estimate of
how many Instance objects can fit in memory simultaneously. If you
know this value beforehand, simply pass it to the constructor.
NOTE: The event that causes an OutOfMemoryError is the
instantiation of a new Instance, _not_ the addition of that
Instance to an InstanceList. Therefore, if you want to avoid
OutOfMemoryErrors, let PagedInstanceList instantiate the new
Instance for you. In other words, do this:
Pipe p = ...;
PagedInstanceList ilist = new PagedInstanceList (p);
ilist.add (data, target, name, source);
Or this:
PipeInputIterator iter = ...;
Pipe p = ...;
PagedInstanceList ilist = new PagedInstanceList (p);
ilist.add (iter);
But not this:
Pipe p = ...;
PagedInstanceList ilist = new PagedInstanceList (p);
ilist.add (new Instance (data, target, name, source));
If memory is low, the last example will throw an OutOfMemoryError
before control has been passed to PagedInstanceList to catch the
error.
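The reason the list has to construct the Instance itself can be shown with a small stand-in. The sketch below is an assumption-laden model, not the MALLET code: OomAwareListSketch is a hypothetical class, a Supplier plays the role of Instance construction, the swap to disk is only simulated, and the OutOfMemoryError is thrown by hand. It also applies the "divide by two" estimate for instancesPerPage described earlier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for a paging list; not the MALLET implementation.
class OomAwareListSketch<T> {
    private final List<T> inMemory = new ArrayList<>();
    private int swappedOut = 0;         // how many items were "paged to disk"
    private int instancesPerPage = -1;  // -1 means: estimate on the first OutOfMemoryError

    /** The list runs the constructor itself, so it can catch the error and retry. */
    boolean add(Supplier<T> factory) {
        T item;
        try {
            item = factory.get();
        } catch (OutOfMemoryError e) {
            if (instancesPerPage == -1) {
                // Conservative estimate: half of what was in memory when we ran out.
                instancesPerPage = Math.max(1, inMemory.size() / 2);
            }
            swapOutAll();
            item = factory.get();  // retry once; a second error propagates to the caller
        }
        return inMemory.add(item);
    }

    /** Simulated swap: a real implementation would serialize the items to disk first. */
    private void swapOutAll() {
        swappedOut += inMemory.size();
        inMemory.clear();
    }

    int size() { return inMemory.size() + swappedOut; }
    int inMemoryCount() { return inMemory.size(); }
    int instancesPerPage() { return instancesPerPage; }

    public static void main(String[] args) {
        OomAwareListSketch<String> list = new OomAwareListSketch<>();
        for (int i = 0; i < 10; i++) {
            final int id = i;
            list.add(() -> "instance-" + id);
        }
        // Simulate memory exhaustion on the next construction attempt only.
        boolean[] failed = {false};
        list.add(() -> {
            if (!failed[0]) {
                failed[0] = true;
                throw new OutOfMemoryError("simulated");
            }
            return "instance-10";
        });
        System.out.println(list.size() + " total, " + list.inMemoryCount() + " in memory");
    }
}
```

If the caller had evaluated the factory before calling add, the error would have been thrown outside the try block, which is the situation the last example above warns about.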
NOTE ALSO: To save write time, we do not write the same Instance to
disk more than once, i.e., there are no dirty bits or
write-throughs. Thus, this assumes that after an Instance has been
passed through its Pipe, it is no longer modified. One way around
this is to call PagedInstanceList.setInstance (int index, Instance
instance), which _will_ overwrite an Instance that has been paged
to disk.
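The consequence of having no dirty bits is easiest to see in a toy model. The sketch below is hypothetical: WriteOncePagingSketch is not a MALLET class, a HashMap stands in for the swap directory, and StringBuilder stands in for a mutable Instance. It shows an in-place edit being lost after a swap, while an explicit set survives.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of write-once paging; a Map stands in for the swap directory.
class WriteOncePagingSketch {
    private final Map<Integer, StringBuilder> memory = new HashMap<>();
    private final Map<Integer, String> disk = new HashMap<>();

    void add(int index, String value) {
        memory.put(index, new StringBuilder(value));
    }

    /** Returns the live in-memory object, reloading it from "disk" if needed. */
    StringBuilder get(int index) {
        StringBuilder live = memory.get(index);
        if (live == null) {
            String saved = disk.get(index);
            if (saved == null) {
                throw new IndexOutOfBoundsException("no item at index " + index);
            }
            live = new StringBuilder(saved);
            memory.put(index, live);
        }
        return live;
    }

    /** Frees memory. Items already on disk are NOT rewritten (no dirty bits). */
    void swapOutAll() {
        for (Map.Entry<Integer, StringBuilder> e : memory.entrySet()) {
            disk.putIfAbsent(e.getKey(), e.getValue().toString());
        }
        memory.clear();
    }

    /** The sanctioned way to change an item: overwrite both copies. */
    void set(int index, String value) {
        memory.put(index, new StringBuilder(value));
        disk.put(index, value);
    }

    public static void main(String[] args) {
        WriteOncePagingSketch store = new WriteOncePagingSketch();
        store.add(0, "original");
        store.swapOutAll();
        store.get(0).append("-edited");  // in-place change, never written back
        store.swapOutAll();
        System.out.println(store.get(0));
        store.set(0, "replaced");
        store.swapOutAll();
        System.out.println(store.get(0));
    }
}
```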
See Also: InstanceList, Serialized Form
Nested Class Summary |
Nested classes inherited from class edu.umass.cs.mallet.base.types.InstanceList |
InstanceList.CrossValidationIterator, InstanceList.Iterator, InstanceList.Stream |
Constructor Summary |
PagedInstanceList() |
PagedInstanceList(Pipe pipe) |
PagedInstanceList(Pipe pipe, int size) |
PagedInstanceList(Pipe pipe, int size, int instancesPerPage, java.io.File swapDir) | Creates a PagedInstanceList where "instancesPerPage" instances are swapped to disk in directory "swapDir" when there is not enough memory to create a new Instance. |
Method Summary |
boolean | add(Instance instance) | Appends the instance to this list. |
boolean | add(java.lang.Object data, java.lang.Object target, java.lang.Object name, java.lang.Object source, double instanceWeight) | Constructs and appends an instance to this list, passing it through this list's pipe and assigning it the specified weight. |
void | add(PipeInputIterator pi) | Adds to this list every instance generated by the iterator, passing each one through this list's pipe. |
InstanceList | cloneEmpty() | |
boolean | collectGarbage() | |
Instance | getInstance(int index) | Returns the Instance at the specified index. |
static InstanceList | load(java.io.File file) | Constructs a new InstanceList, deserialized from file. |
InstanceList | sampleWithReplacement(java.util.Random r, int numSamples) | Overridden to add samples in original order to reduce thrashing. |
InstanceList | sampleWithWeights(java.util.Random r, double[] weights) | Returns an InstanceList of the same size, where the instances come from the random sampling (with replacement) of this list using the given weights. |
void | setCollectGarbage(boolean b) | |
void | setInstance(int index, Instance instance) | Replaces the Instance at position index with a new one. |
InstanceList | shallowClone() | |
InstanceList[] | split(double[] proportions) | |
InstanceList[] | split(java.util.Random r, double[] proportions) | Shuffles the elements of this list among several smaller lists. |
InstanceList[] | splitByModulo(int m) | Returns a pair of new lists such that the first list in the pair contains every mth element of this list, starting with the first. |
void | swapOutAll() | Save all instances to disk and set to null to free memory. |
Methods inherited from class edu.umass.cs.mallet.base.types.InstanceList |
add, add, add, clonePipeOutputAccumulator, crossValidationIterator, crossValidationIterator, get, getDataAlphabet, getDataClass, getFeatureSelection, getInstanceWeight, getPerLabelFeatureSelection, getPipe, getTargetAlphabet, iterator, noisify, pipeOutputAccumulate, removeSources, removeTargets, sampleWithInstanceWeights, save, setFeatureSelection, setInstanceWeight, setPerLabelFeatureSelection, size, splitInOrder, subList, targetLabelDistribution |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
public PagedInstanceList(Pipe pipe, int size, int instancesPerPage, java.io.File swapDir)
Parameters:
pipe - instance pipe
instancesPerPage - number of Instances to store in each
page. If -1, determine at first call to swapOutExcept
swapDir - where the pages on disk live.
public PagedInstanceList(Pipe pipe, int size)
public PagedInstanceList(Pipe pipe)
public PagedInstanceList()
Method Detail |
public InstanceList[] split(java.util.Random r, double[] proportions)
Shuffles the elements of this list among several smaller lists.
Overrides: split in class InstanceList
Parameters:
proportions - A list of numbers (not necessarily summing to 1) which,
when normalized, correspond to the proportion of elements in each returned
sublist.
r - The source of randomness to use in shuffling.
Returns: an InstanceList for each element of proportions
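How unnormalized proportions turn into sublist sizes can be sketched on plain Java lists. The rounding rule used here (round the cumulative fraction, give the remainder to the last sublist) is an assumption for illustration; the source does not say how PagedInstanceList rounds. SplitSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch; plain Lists stand in for InstanceList.
class SplitSketch {
    /** Converts unnormalized proportions into sublist sizes that sum to n. */
    static int[] sizesFor(int n, double[] proportions) {
        double total = 0;
        for (double p : proportions) {
            total += p;
        }
        int[] sizes = new int[proportions.length];
        double cumulative = 0;
        int previousEnd = 0;
        for (int i = 0; i < proportions.length; i++) {
            cumulative += proportions[i] / total;  // normalize on the fly
            boolean last = (i == proportions.length - 1);
            int end = last ? n : (int) Math.round(cumulative * n);
            sizes[i] = end - previousEnd;
            previousEnd = end;
        }
        return sizes;
    }

    /** Shuffles a copy of the items, then cuts it into consecutive sublists. */
    static <T> List<List<T>> split(List<T> items, Random r, double[] proportions) {
        List<T> shuffled = new ArrayList<>(items);
        Collections.shuffle(shuffled, r);
        List<List<T>> parts = new ArrayList<>();
        int start = 0;
        for (int size : sizesFor(shuffled.size(), proportions)) {
            parts.add(new ArrayList<>(shuffled.subList(start, start + size)));
            start += size;
        }
        return parts;
    }

    public static void main(String[] args) {
        List<Integer> items = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        // Proportions 3:1 need not sum to 1.
        System.out.println(split(items, new Random(0), new double[] {3, 1}));
    }
}
```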
public InstanceList[] split(double[] proportions)
Overrides: split in class InstanceList
public InstanceList[] splitByModulo(int m)
Returns a pair of new lists such that the first list in the pair
contains every mth element of this list,
starting with the first. The second list contains all remaining
elements. Overrides InstanceList.splitByModulo to use
PagedInstanceLists.
Overrides: splitByModulo in class InstanceList
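The selection rule for splitByModulo can be sketched on a plain list. This is an illustration only: SplitByModuloSketch is a hypothetical class, and it returns ordinary Lists rather than PagedInstanceLists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; plain Lists stand in for InstanceList.
class SplitByModuloSketch {
    /** Returns [every mth element starting with the first, all remaining elements]. */
    static <T> List<List<T>> splitByModulo(List<T> items, int m) {
        if (m <= 0) {
            throw new IllegalArgumentException("m must be positive");
        }
        List<T> everyMth = new ArrayList<>();
        List<T> rest = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            if (i % m == 0) {
                everyMth.add(items.get(i));
            } else {
                rest.add(items.get(i));
            }
        }
        List<List<T>> pair = new ArrayList<>();
        pair.add(everyMth);
        pair.add(rest);
        return pair;
    }

    public static void main(String[] args) {
        List<Integer> items = Arrays.asList(0, 1, 2, 3, 4, 5, 6, 7, 8, 9);
        System.out.println(splitByModulo(items, 3));
    }
}
```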
public InstanceList sampleWithReplacement(java.util.Random r, int numSamples)
Overridden to add samples in original order to reduce thrashing.
Overrides: sampleWithReplacement in class InstanceList
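One way to add samples "in original order" is to draw all the indices first and sort them before touching the list, so that consecutive reads tend to hit the same page. The sketch below assumes that approach; it is not taken from the MALLET source, and OrderedSamplingSketch is a hypothetical class working on plain lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical sketch; plain Lists stand in for InstanceList.
class OrderedSamplingSketch {
    /** Draws numSamples indices uniformly with replacement, then sorts them. */
    static int[] sampleIndicesInOrder(Random r, int listSize, int numSamples) {
        int[] indices = new int[numSamples];
        for (int i = 0; i < numSamples; i++) {
            indices[i] = r.nextInt(listSize);
        }
        Arrays.sort(indices);  // ascending order keeps reads on one page at a time
        return indices;
    }

    /** Builds the sample by reading the list in ascending index order. */
    static <T> List<T> sampleWithReplacement(List<T> items, Random r, int numSamples) {
        List<T> sample = new ArrayList<>(numSamples);
        for (int index : sampleIndicesInOrder(r, items.size(), numSamples)) {
            sample.add(items.get(index));
        }
        return sample;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(sampleWithReplacement(items, new Random(42), 8));
    }
}
```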
public InstanceList sampleWithWeights(java.util.Random r, double[] weights)
Returns an InstanceList of the same size, where the instances come from the
random sampling (with replacement) of this list using the given weights.
The length of the weight array must be the same as the length of this list.
The new instances all have their weights set to one.
Overrides: sampleWithWeights in class InstanceList
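Weighted sampling with replacement can be sketched with a running sum over the weights. This is a generic technique shown on plain lists under the assumption that weights are non-negative; it is not claimed to be how PagedInstanceList implements the method, and WeightedSamplingSketch is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical sketch; plain Lists stand in for InstanceList.
class WeightedSamplingSketch {
    /** Returns a sample of the same size as items, drawn in proportion to weights. */
    static <T> List<T> sampleWithWeights(List<T> items, Random r, double[] weights) {
        if (weights.length != items.size()) {
            throw new IllegalArgumentException("need exactly one weight per item");
        }
        double total = 0;
        for (double w : weights) {
            total += w;
        }
        if (total <= 0) {
            throw new IllegalArgumentException("weights must sum to a positive value");
        }
        List<T> sample = new ArrayList<>(items.size());
        for (int s = 0; s < items.size(); s++) {
            double target = r.nextDouble() * total;
            double running = 0;
            int chosen = weights.length - 1;  // fallback guards against rounding
            for (int i = 0; i < weights.length; i++) {
                running += weights[i];
                if (target < running) {
                    chosen = i;
                    break;
                }
            }
            sample.add(items.get(chosen));
        }
        return sample;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b", "c");
        System.out.println(sampleWithWeights(items, new Random(7), new double[] {1, 5, 1}));
    }
}
```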
public void swapOutAll()
Save all instances to disk and set to null to free memory.
public Instance getInstance(int index)
Returns the Instance at the specified index. If
this Instance is not in memory, swap a block of instances back
into memory.
Overrides: getInstance in class InstanceList
public void setInstance(int index, Instance instance)
Replaces the Instance at position index with a new one. Note that
this is the only sanctioned way of changing an Instance.
Overrides: setInstance in class InstanceList
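public boolean add(Instance instance)
Appends the instance to this list.
Overrides: add in class InstanceList
Returns: true if successful
public void add(PipeInputIterator pi)
Adds to this list every instance generated by the iterator, passing each one through this list's pipe.
Overrides: add in class InstanceList
public boolean add(java.lang.Object data, java.lang.Object target, java.lang.Object name, java.lang.Object source, double instanceWeight)
Constructs and appends an instance to this list, passing it through this list's pipe and assigning it the specified weight.
Overrides: add in class InstanceList
Returns: true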
public void setCollectGarbage(boolean b)
public boolean collectGarbage()
public InstanceList shallowClone()
Overrides: shallowClone in class InstanceList
public InstanceList cloneEmpty()
Overrides: cloneEmpty in class InstanceList
public static InstanceList load(java.io.File file)
Constructs a new InstanceList, deserialized from file. If the string
value of file is "-", then deserialize from System.in.
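The "-" convention for load can be sketched with standard Java serialization. LoadSketch and its helper methods are hypothetical and only illustrate choosing between a file and System.in; the real method returns an InstanceList and may differ in its details.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical sketch of the "-" means standard input convention.
class LoadSketch {
    /** Deserializes one object from file, or from System.in if file is "-". */
    static Object load(File file) {
        try {
            InputStream raw = "-".equals(file.toString())
                    ? System.in
                    : new FileInputStream(file);
            try (ObjectInputStream in = new ObjectInputStream(new BufferedInputStream(raw))) {
                return in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Serializes one object to file. */
    static void save(Object value, File file) {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Saves value to a temporary file and loads it back, for demonstration. */
    static Object roundTrip(Object value) {
        try {
            File tmp = File.createTempFile("load-sketch", ".ser");
            try {
                save(value, tmp);
                return load(tmp);
            } finally {
                tmp.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello"));
    }
}
```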