edu.umass.cs.mallet.base.classify
Class MCMaxEntTrainer

java.lang.Object
  extended by edu.umass.cs.mallet.base.classify.ClassifierTrainer
      extended by edu.umass.cs.mallet.base.classify.MCMaxEntTrainer
All Implemented Interfaces:
Boostable, java.io.Serializable

public class MCMaxEntTrainer
extends ClassifierTrainer
implements Boostable, java.io.Serializable

The trainer for a Maximum Entropy classifier.

See Also:
Serialized Form

Field Summary
static java.lang.String EXP_GAIN
           
static java.lang.String GRADIENT_GAIN
           
static java.lang.String INFORMATION_GAIN
           
 
Constructor Summary
MCMaxEntTrainer()
           
MCMaxEntTrainer(boolean useHyperbolicPrior)
           
MCMaxEntTrainer(CommandOption.List col)
           
MCMaxEntTrainer(double gaussianPriorVariance)
          Constructs a trainer with a Gaussian prior of the given variance, a parameter that helps avoid overtraining.
MCMaxEntTrainer(double gaussianPriorVariance, boolean useMultiConditionalTraining)
           
MCMaxEntTrainer(double hyperbolicPriorSlope, double hyperbolicPriorSharpness)
           
 
Method Summary
static CommandOption.List getCommandOptionList()
           
 Maximizable.ByGradient getMaximizableTrainer(InstanceList ilist)
           
 int getValueCalls()
          Counts how many times this trainer has computed the log probability of training labels.
 int getValueGradientCalls()
          Counts how many times this trainer has computed the gradient of the log probability of training labels.
 MCMaxEntTrainer setGaussianPriorVariance(double gaussianPriorVariance)
          Sets the variance of the Gaussian prior, a parameter that helps prevent overtraining.
 MCMaxEntTrainer setHyperbolicPriorSharpness(double hyperbolicPriorSharpness)
           
 MCMaxEntTrainer setHyperbolicPriorSlope(double hyperbolicPriorSlope)
           
 MCMaxEntTrainer setNumIterations(int i)
          Specifies the maximum number of iterations to run during a single call to train or trainWithFeatureInduction. Not currently functional.
 MCMaxEntTrainer setUseHyperbolicPrior(boolean useHyperbolicPrior)
           
 java.lang.String toString()
           
 Classifier train(InstanceList trainingSet, InstanceList validationSet, InstanceList testSet, ClassifierEvaluating evaluator, Classifier initialClassifier)
          Returns a new classifier trained on trainingSet; validationSet, testSet, and initialClassifier may be null.
 Classifier trainWithFeatureInduction(InstanceList trainingData, InstanceList validationData, InstanceList testingData, ClassifierEvaluating evaluator, int totalIterations, int numIterationsBetweenFeatureInductions, int numFeatureInductions, int numFeaturesPerFeatureInduction)
          Trains a maximum entropy model using feature selection and feature induction (adding conjunctions of features as new features).
 Classifier trainWithFeatureInduction(InstanceList trainingData, InstanceList validationData, InstanceList testingData, ClassifierEvaluating evaluator, MCMaxEnt maxent, int totalIterations, int numIterationsBetweenFeatureInductions, int numFeatureInductions, int numFeaturesPerFeatureInduction, java.lang.String gainName)
          Like the other version of trainWithFeatureInduction, but allows some default options to be changed.
 
Methods inherited from class edu.umass.cs.mallet.base.classify.ClassifierTrainer
main, train, train, train, train
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

EXP_GAIN

public static final java.lang.String EXP_GAIN
See Also:
Constant Field Values

GRADIENT_GAIN

public static final java.lang.String GRADIENT_GAIN
See Also:
Constant Field Values

INFORMATION_GAIN

public static final java.lang.String INFORMATION_GAIN
See Also:
Constant Field Values
Constructor Detail

MCMaxEntTrainer

public MCMaxEntTrainer(CommandOption.List col)

MCMaxEntTrainer

public MCMaxEntTrainer()

MCMaxEntTrainer

public MCMaxEntTrainer(boolean useHyperbolicPrior)

MCMaxEntTrainer

public MCMaxEntTrainer(double gaussianPriorVariance)
Constructs a trainer with a parameter to avoid overtraining. 1.0 is usually a reasonable default value.
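As a sketch of what this variance controls, assuming the usual zero-mean Gaussian prior on each feature weight: the prior adds -Σ w²/(2σ²) to the training log-likelihood and contributes -w/σ² to each gradient component. The class GaussianPriorSketch and its methods are hypothetical illustrations, not part of MALLET.

```java
// Hypothetical helper, not part of MALLET: a zero-mean Gaussian prior on weights.
final class GaussianPriorSketch {
    private GaussianPriorSketch() {}

    /** Log-prior up to an additive constant: -sum_i w_i^2 / (2 * variance). */
    static double logPrior(double[] weights, double variance) {
        if (variance <= 0) throw new IllegalArgumentException("variance must be positive");
        double sumSq = 0.0;
        for (double w : weights) sumSq += w * w;
        return -sumSq / (2.0 * variance);
    }

    /** Gradient of logPrior with respect to each weight: -w_i / variance. */
    static double[] logPriorGradient(double[] weights, double variance) {
        if (variance <= 0) throw new IllegalArgumentException("variance must be positive");
        double[] grad = new double[weights.length];
        for (int i = 0; i < weights.length; i++) grad[i] = -weights[i] / variance;
        return grad;
    }

    public static void main(String[] args) {
        double[] w = {1.0, -2.0};
        // The same weights are penalized twice as hard when the variance is halved.
        System.out.println(logPrior(w, 1.0)); // -2.5
        System.out.println(logPrior(w, 0.5)); // -5.0
    }
}
```

Because the penalty scales with 1/σ², halving the variance doubles the pull of every weight toward 0.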


MCMaxEntTrainer

public MCMaxEntTrainer(double gaussianPriorVariance,
                       boolean useMultiConditionalTraining)

MCMaxEntTrainer

public MCMaxEntTrainer(double hyperbolicPriorSlope,
                       double hyperbolicPriorSharpness)
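The hyperbolic prior is not described further on this page. The sketch below assumes a log-cosh penalty of the form (slope / sharpness) · log(cosh(sharpness · w)) per weight, whose derivative is slope · tanh(sharpness · w). For large sharpness it approaches slope · |w|, an L1-like penalty that stays smooth near 0. This form and the class HyperbolicPriorSketch are assumptions for illustration; check the MALLET source for the exact expression this trainer uses.

```java
// Hypothetical helper, not part of MALLET: an assumed log-cosh ("hyperbolic") penalty.
final class HyperbolicPriorSketch {
    private HyperbolicPriorSketch() {}

    /** Penalty subtracted from the log-likelihood: sum_i (slope / sharpness) * log(cosh(sharpness * w_i)). */
    static double penalty(double[] weights, double slope, double sharpness) {
        if (sharpness <= 0) throw new IllegalArgumentException("sharpness must be positive");
        double total = 0.0;
        for (double w : weights) total += (slope / sharpness) * logCosh(sharpness * w);
        return total;
    }

    /** Derivative of the penalty with respect to each weight: slope * tanh(sharpness * w_i). */
    static double[] penaltyGradient(double[] weights, double slope, double sharpness) {
        double[] grad = new double[weights.length];
        for (int i = 0; i < weights.length; i++) grad[i] = slope * Math.tanh(sharpness * weights[i]);
        return grad;
    }

    /** Overflow-safe log(cosh(x)) = |x| + log1p(exp(-2|x|)) - log(2). */
    static double logCosh(double x) {
        double a = Math.abs(x);
        return a + Math.log1p(Math.exp(-2.0 * a)) - Math.log(2.0);
    }

    public static void main(String[] args) {
        double[] w = {0.5, -0.5};
        // With a large sharpness the penalty is close to slope * sum |w_i|.
        System.out.println(penalty(w, 1.0, 100.0));
        System.out.println(java.util.Arrays.toString(penaltyGradient(w, 1.0, 100.0)));
    }
}
```

Under this assumed form, slope sets the strength of the L1-like pull, while sharpness sets how quickly the penalty's curvature near 0 gives way to a straight |w| shape.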
Method Detail

getCommandOptionList

public static CommandOption.List getCommandOptionList()

getMaximizableTrainer

public Maximizable.ByGradient getMaximizableTrainer(InstanceList ilist)

setNumIterations

public MCMaxEntTrainer setNumIterations(int i)
Specifies the maximum number of iterations to run during a single call to train or trainWithFeatureInduction. Not currently functional.

Returns:
This trainer

setUseHyperbolicPrior

public MCMaxEntTrainer setUseHyperbolicPrior(boolean useHyperbolicPrior)

setGaussianPriorVariance

public MCMaxEntTrainer setGaussianPriorVariance(double gaussianPriorVariance)
Sets a parameter to prevent overtraining: the variance of the Gaussian prior on feature weights. A smaller variance means that feature weights are expected to stay closer to 0, so more evidence is required to give a feature a weight of large magnitude.

Returns:
This trainer
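To make "more evidence is required" concrete, here is a sketch on a deliberately tiny model assumed for illustration: one binary feature that fires on all n training examples, a fraction p of which are positive, with a single logistic weight w (a two-class MaxEnt model reduced to one parameter). It maximizes n·[p·log σ(w) + (1-p)·log(1-σ(w))] - w²/(2·variance) by gradient ascent. GaussianShrinkageSketch and mapWeight are hypothetical names.

```java
// Hypothetical illustration, not MALLET code: MAP estimate of one logistic weight under a Gaussian prior.
final class GaussianShrinkageSketch {
    private GaussianShrinkageSketch() {}

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /**
     * Maximizes n * [p * log(sigmoid(w)) + (1 - p) * log(1 - sigmoid(w))] - w^2 / (2 * variance)
     * by gradient ascent. The step 1/L uses L = n/4 + 1/variance, an upper bound on the
     * objective's curvature, so each step moves monotonically toward the optimum.
     */
    static double mapWeight(int n, double p, double variance) {
        if (n <= 0 || variance <= 0) throw new IllegalArgumentException("n and variance must be positive");
        double step = 1.0 / (n / 4.0 + 1.0 / variance);
        double w = 0.0;
        for (int iter = 0; iter < 10_000; iter++) {
            // d/dw of the log-likelihood is n * (p - sigmoid(w)); the prior adds -w / variance.
            double grad = n * (p - sigmoid(w)) - w / variance;
            w += step * grad;
        }
        return w;
    }

    public static void main(String[] args) {
        System.out.println("n=10,   variance=0.1: " + mapWeight(10, 0.9, 0.1));
        System.out.println("n=10,   variance=1.0: " + mapWeight(10, 0.9, 1.0));
        System.out.println("n=1000, variance=1.0: " + mapWeight(1000, 0.9, 1.0));
        System.out.println("unregularized ln 9:   " + Math.log(9.0));
    }
}
```

With p = 0.9 the unregularized optimum is ln 9 ≈ 2.197. For n = 10, shrinking the variance from 1.0 to 0.1 moves the estimate from about 1.24 to about 0.32, while raising n to 1000 at variance 1.0 brings it back close to ln 9.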

setHyperbolicPriorSlope

public MCMaxEntTrainer setHyperbolicPriorSlope(double hyperbolicPriorSlope)

setHyperbolicPriorSharpness

public MCMaxEntTrainer setHyperbolicPriorSharpness(double hyperbolicPriorSharpness)

train

public Classifier train(InstanceList trainingSet,
                        InstanceList validationSet,
                        InstanceList testSet,
                        ClassifierEvaluating evaluator,
                        Classifier initialClassifier)
Description copied from class: ClassifierTrainer
Returns a new classifier trained on trainingSet; validationSet, testSet, and initialClassifier may be null.

Specified by:
train in class ClassifierTrainer
Parameters:
trainingSet - examples used to set parameters.
validationSet - examples used to tune meta-parameters. May be null.
testSet - examples not examined at all for training, but passed on to diagnostic routines. May be null.
initialClassifier - training process may start from here. The parameters of the initialClassifier are not modified. May be null.

trainWithFeatureInduction

public Classifier trainWithFeatureInduction(InstanceList trainingData,
                                            InstanceList validationData,
                                            InstanceList testingData,
                                            ClassifierEvaluating evaluator,
                                            int totalIterations,
                                            int numIterationsBetweenFeatureInductions,
                                            int numFeatureInductions,
                                            int numFeaturesPerFeatureInduction)

Trains a maximum entropy model using feature selection and feature induction (adding conjunctions of features as new features).

Parameters:
trainingData - A list of Instances whose data fields are binary, augmentable FeatureVectors and whose target fields are Labels.
validationData - [not currently used] As trainingData, or null.
testingData - As trainingData, or null.
evaluator - The evaluator to track training progress and decide whether to continue, or null.
totalIterations - The maximum total number of training iterations, including those taken during feature induction.
numIterationsBetweenFeatureInductions - How many iterations to train between one round of feature induction and the next; this should usually be fairly small, like 5 or 10, to avoid overfitting with current features.
numFeatureInductions - How many rounds of feature induction to run before beginning normal training.
numFeaturesPerFeatureInduction - The maximum number of features to choose during each round of feature induction.
Returns:
The trained MaxEnt classifier
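The interplay of totalIterations, numIterationsBetweenFeatureInductions, and numFeatureInductions can be sketched as an iteration budget. This assumes that each induction round is followed by a short block of training capped by the remaining budget, and that all leftover iterations go to normal training; MALLET's actual loop may order these steps differently or stop early based on the evaluator. InductionScheduleSketch is hypothetical.

```java
import java.util.Arrays;

// Hypothetical planner, not MALLET code: splits totalIterations across induction rounds.
final class InductionScheduleSketch {
    private InductionScheduleSketch() {}

    /**
     * Returns an array of length numInductions + 1: element i is the number of training
     * iterations run after induction round i; the last element is the leftover budget
     * for normal training with the final feature set.
     */
    static int[] iterationBudget(int totalIterations, int between, int numInductions) {
        if (totalIterations < 0 || between < 0 || numInductions < 0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
        int[] budget = new int[numInductions + 1];
        int remaining = totalIterations;
        for (int round = 0; round < numInductions; round++) {
            // Iterations spent during induction count against the total budget.
            budget[round] = Math.min(between, remaining);
            remaining -= budget[round];
        }
        budget[numInductions] = remaining;
        return budget;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(iterationBudget(100, 10, 5))); // [10, 10, 10, 10, 10, 50]
    }
}
```

Keeping the per-round block small, as the parameter description suggests, leaves most of totalIterations for the final training phase.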

trainWithFeatureInduction

public Classifier trainWithFeatureInduction(InstanceList trainingData,
                                            InstanceList validationData,
                                            InstanceList testingData,
                                            ClassifierEvaluating evaluator,
                                            MCMaxEnt maxent,
                                            int totalIterations,
                                            int numIterationsBetweenFeatureInductions,
                                            int numFeatureInductions,
                                            int numFeaturesPerFeatureInduction,
                                            java.lang.String gainName)

Like the other version of trainWithFeatureInduction, but allows some default options to be changed.

Parameters:
maxent - An initial partially-trained classifier (default null). This classifier may be modified during training.
gainName - The estimate of gain (log-likelihood increase) that the chosen features should maximize. Should be one of MCMaxEntTrainer.EXP_GAIN, MCMaxEntTrainer.GRADIENT_GAIN, or MCMaxEntTrainer.INFORMATION_GAIN (default EXP_GAIN).
Returns:
The trained MaxEnt classifier
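Only the INFORMATION_GAIN option is sketched here, in its textbook form for a binary feature: IG = H(Y) - [P(f=1)·H(Y|f=1) + P(f=0)·H(Y|f=0)], measured in bits. EXP_GAIN and GRADIENT_GAIN are not documented on this page and are not sketched. MALLET's own gain computation during induction may differ (for example in what counts it uses); InformationGainSketch is a hypothetical illustration.

```java
// Hypothetical helper, not MALLET code: textbook information gain of a binary feature.
final class InformationGainSketch {
    private InformationGainSketch() {}

    /** Entropy in bits of a label distribution given as counts. */
    static double entropy(int[] counts) {
        int total = 0;
        for (int c : counts) total += c;
        if (total == 0) return 0.0;
        double h = 0.0;
        for (int c : counts) {
            if (c == 0) continue; // 0 * log(0) is taken as 0
            double p = (double) c / total;
            h -= p * Math.log(p) / Math.log(2.0);
        }
        return h;
    }

    /**
     * countsWithFeature[y] and countsWithoutFeature[y] are label counts among instances
     * where the feature is present and absent, respectively.
     */
    static double informationGain(int[] countsWithFeature, int[] countsWithoutFeature) {
        if (countsWithFeature.length != countsWithoutFeature.length) {
            throw new IllegalArgumentException("label arrays must have the same length");
        }
        int numLabels = countsWithFeature.length;
        int[] all = new int[numLabels];
        int nWith = 0, nWithout = 0;
        for (int y = 0; y < numLabels; y++) {
            all[y] = countsWithFeature[y] + countsWithoutFeature[y];
            nWith += countsWithFeature[y];
            nWithout += countsWithoutFeature[y];
        }
        int n = nWith + nWithout;
        if (n == 0) return 0.0;
        double conditional = ((double) nWith / n) * entropy(countsWithFeature)
                + ((double) nWithout / n) * entropy(countsWithoutFeature);
        return entropy(all) - conditional;
    }

    public static void main(String[] args) {
        // A feature that perfectly separates two balanced labels gains about one bit.
        System.out.println(informationGain(new int[]{5, 0}, new int[]{0, 5}));
        // A feature independent of the label gains about zero.
        System.out.println(informationGain(new int[]{3, 3}, new int[]{2, 2}));
    }
}
```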

getValueGradientCalls

public int getValueGradientCalls()
Counts how many times this trainer has computed the gradient of the log probability of training labels.


getValueCalls

public int getValueCalls()
Counts how many times this trainer has computed the log probability of training labels.
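These two counters are a cheap way to measure how much work the optimizer did. The sketch below shows the idea with a hypothetical scalar objective wrapper that increments a counter on every value or gradient evaluation; it is not MALLET's Maximizable.ByGradient interface, and CountingObjectiveSketch is an invented name.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical wrapper, not MALLET code: counts objective and gradient evaluations.
final class CountingObjectiveSketch {
    private final DoubleUnaryOperator value;
    private final DoubleUnaryOperator gradient;
    private int valueCalls = 0;
    private int valueGradientCalls = 0;

    CountingObjectiveSketch(DoubleUnaryOperator value, DoubleUnaryOperator gradient) {
        this.value = value;
        this.gradient = gradient;
    }

    double getValue(double x) {
        valueCalls++;
        return value.applyAsDouble(x);
    }

    double getValueGradient(double x) {
        valueGradientCalls++;
        return gradient.applyAsDouble(x);
    }

    int getValueCalls() { return valueCalls; }

    int getValueGradientCalls() { return valueGradientCalls; }

    public static void main(String[] args) {
        // Maximize -(x - 3)^2 with 50 fixed-size gradient steps, then evaluate once.
        CountingObjectiveSketch f = new CountingObjectiveSketch(x -> -(x - 3) * (x - 3), x -> -2 * (x - 3));
        double x = 0.0;
        for (int i = 0; i < 50; i++) x += 0.1 * f.getValueGradient(x);
        System.out.println("x=" + x + " value=" + f.getValue(x));
        System.out.println("valueCalls=" + f.getValueCalls() + " gradientCalls=" + f.getValueGradientCalls());
    }
}
```

Comparing the two counts after training shows whether an optimizer leans mainly on gradients or also performs many extra value evaluations, for example during line searches.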


toString

public java.lang.String toString()
Overrides:
toString in class ClassifierTrainer