edu.umass.cs.mallet.base.classify
Class NaiveBayesTrainer

java.lang.Object
  extended by edu.umass.cs.mallet.base.classify.ClassifierTrainer
      extended by edu.umass.cs.mallet.base.classify.IncrementalClassifierTrainer
          extended by edu.umass.cs.mallet.base.classify.NaiveBayesTrainer
All Implemented Interfaces:
Boostable, java.io.Serializable

public class NaiveBayesTrainer
extends IncrementalClassifierTrainer
implements Boostable, java.io.Serializable

Class used to generate a NaiveBayes classifier from a set of training data. In a Bayes classifier, p(Classification|Data) = p(Data|Classification) p(Classification) / p(Data)

To compute the likelihood:
p(Data|Classification) = p(d1,d2,...,dn | Classification)
Naive Bayes makes the assumption that all of the data are conditionally independent given the Classification:
p(d1,d2,...,dn | Classification) = p(d1|Classification) p(d2|Classification) ... p(dn|Classification)
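The factorization above can be sketched in plain Java. This is an illustrative sketch only, not Mallet code: the class name NaiveBayesPosteriorSketch and the method posterior are hypothetical, and the probabilities in main are made-up inputs. It multiplies the prior by the per-item likelihoods (in log space, to avoid underflow) and normalizes by p(Data).

```java
import java.util.Arrays;

public class NaiveBayesPosteriorSketch {

    /**
     * Hypothetical helper, not part of Mallet.
     * priors[c] is p(c); likelihoods[c][i] is p(d_i | c) for each observed item d_i.
     * Returns p(c | d1,...,dn) for every class c.
     */
    static double[] posterior(double[] priors, double[][] likelihoods) {
        double[] logJoint = new double[priors.length];
        for (int c = 0; c < priors.length; c++) {
            // log p(c) + sum_i log p(d_i | c): the naive independence assumption
            logJoint[c] = Math.log(priors[c]);
            for (double p : likelihoods[c]) {
                logJoint[c] += Math.log(p);
            }
        }
        // Subtract the maximum before exponentiating for numerical stability.
        double max = Arrays.stream(logJoint).max().getAsDouble();
        double[] post = new double[priors.length];
        double evidence = 0.0;
        for (int c = 0; c < post.length; c++) {
            post[c] = Math.exp(logJoint[c] - max);
            evidence += post[c];
        }
        // Dividing by the sum plays the role of p(Data) in Bayes' rule.
        for (int c = 0; c < post.length; c++) {
            post[c] /= evidence;
        }
        return post;
    }

    public static void main(String[] args) {
        double[] priors = {0.5, 0.5};                        // two classes, equal priors
        double[][] likelihoods = {{0.8, 0.5}, {0.2, 0.5}};   // p(d1|c), p(d2|c) per class
        System.out.println(Arrays.toString(posterior(priors, likelihoods)));
    }
}
```

With these inputs the unnormalized scores are 0.5 x 0.8 x 0.5 = 0.2 and 0.5 x 0.2 x 0.5 = 0.05, so the posterior is approximately 0.8 for the first class and 0.2 for the second.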

As with other classifiers in Mallet, NaiveBayes is implemented as two classes: a trainer and a classifier. The NaiveBayesTrainer produces estimates of the various p(dn|Classification) and constructs a NaiveBayes classifier with those estimates.

A call to train() or incrementalTrain() produces a NaiveBayes classifier that can be used to classify instances. A call to incrementalTrain() does not throw away the internal state of the trainer; subsequent calls to incrementalTrain() train by extending the previous training set.
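The difference between the two calls can be illustrated with a toy count-based trainer. This is a sketch under stated assumptions, not the Mallet implementation: ToyCountTrainer and its methods are hypothetical, it tracks only label counts, and it returns unsmoothed relative frequencies rather than a classifier.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ToyCountTrainer {
    // Internal state: how often each label has been seen so far.
    private final Map<String, Integer> labelCounts = new HashMap<>();

    /** Clears the accumulated counts. */
    void reset() {
        labelCounts.clear();
    }

    /** Adds the new labels to the existing counts and returns updated priors. */
    Map<String, Double> incrementalTrain(List<String> labels) {
        for (String label : labels) {
            labelCounts.merge(label, 1, Integer::sum);
        }
        return estimate();
    }

    /** Trains from scratch; no state survives the call. */
    Map<String, Double> train(List<String> labels) {
        reset();
        Map<String, Double> result = incrementalTrain(labels);
        reset();
        return result;
    }

    /** Relative frequency of each label (no smoothing in this toy version). */
    private Map<String, Double> estimate() {
        int total = 0;
        for (int n : labelCounts.values()) {
            total += n;
        }
        Map<String, Double> priors = new HashMap<>();
        for (Map.Entry<String, Integer> e : labelCounts.entrySet()) {
            priors.put(e.getKey(), e.getValue() / (double) total);
        }
        return priors;
    }

    public static void main(String[] args) {
        ToyCountTrainer trainer = new ToyCountTrainer();
        trainer.incrementalTrain(Arrays.asList("spam", "ham"));
        // The second call sees 3 spam and 1 ham in total.
        System.out.println(trainer.incrementalTrain(Arrays.asList("spam", "spam")));
        // train() ignores everything seen before.
        System.out.println(trainer.train(Arrays.asList("ham")));
    }
}
```

In this sketch train() wipes the accumulated counts, which mirrors why the two kinds of call should not be mixed within one incremental session.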

A NaiveBayesTrainer can be persisted using serialization.

See Also:
NaiveBayes, Serialized Form

Constructor Summary
NaiveBayesTrainer()
           
 
Method Summary
 Multinomial.Estimator getFeatureMultinomialEstimator()
          Get the MultinomialEstimator instance used to specify the type of estimator for features.
 Multinomial.Estimator getPriorMultinomialEstimator()
          Get the MultinomialEstimator instance used to specify the type of estimator for priors.
 Classifier incrementalTrain(InstanceList trainingList, InstanceList validationList, InstanceList testSet, ClassifierEvaluating evaluator, Classifier initialClassifier)
          Create a NaiveBayes classifier from a set of training data and the previous state of the trainer.
 void reset()
          Clears the internal state of the trainer.
 void setFeatureMultinomialEstimator(Multinomial.Estimator me)
          Set the Multinomial Estimator used for features.
 void setPriorMultinomialEstimator(Multinomial.Estimator me)
          Set the Multinomial Estimator used for priors.
 java.lang.String toString()
           
 Classifier train(InstanceList trainingList, InstanceList validationList, InstanceList testSet, ClassifierEvaluating evaluator, Classifier initialClassifier)
          Create a NaiveBayes classifier from a set of training data.
 
Methods inherited from class edu.umass.cs.mallet.base.classify.IncrementalClassifierTrainer
incrementalTrain, incrementalTrain, incrementalTrain, incrementalTrain
 
Methods inherited from class edu.umass.cs.mallet.base.classify.ClassifierTrainer
main, train, train, train, train
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

NaiveBayesTrainer

public NaiveBayesTrainer()
Method Detail

getFeatureMultinomialEstimator

public Multinomial.Estimator getFeatureMultinomialEstimator()
Get the MultinomialEstimator instance used to specify the type of estimator for features.

Returns:
estimator to be cloned on next call to train() or first call to incrementalTrain()

setFeatureMultinomialEstimator

public void setFeatureMultinomialEstimator(Multinomial.Estimator me)
Set the Multinomial.Estimator used for features. The Multinomial.Estimator is internally cloned, and the clone is used to maintain the counts that will be used to generate probability estimates the next time train() or an initial incrementalTrain() is run. Defaults to a Multinomial.LaplaceEstimator().

Parameters:
me - to be cloned on next call to train() or first call to incrementalTrain()
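For intuition about the default estimator, the sketch below shows standard Laplace (add-one) smoothing of a count vector. It assumes that Multinomial.LaplaceEstimator follows this textbook rule, which this page does not spell out; LaplaceSketch and laplaceEstimate are hypothetical names and do not exist in Mallet.

```java
public class LaplaceSketch {

    /**
     * Hypothetical helper: add-one smoothed multinomial estimate.
     * p_i = (count_i + 1) / (total + V), where V is the number of outcomes.
     */
    static double[] laplaceEstimate(int[] counts) {
        long total = 0;
        for (int c : counts) {
            total += c;
        }
        double[] probs = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            // Every outcome gets one pseudo-count, so unseen features keep nonzero mass.
            probs[i] = (counts[i] + 1.0) / (total + counts.length);
        }
        return probs;
    }

    public static void main(String[] args) {
        int[] counts = {3, 0, 1};   // made-up feature counts for one class
        for (double p : laplaceEstimate(counts)) {
            System.out.println(p);
        }
    }
}
```

For the counts {3, 0, 1} this gives 4/7, 1/7 and 2/7, so the feature that was never observed still receives a nonzero probability and cannot zero out the product of likelihoods.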

getPriorMultinomialEstimator

public Multinomial.Estimator getPriorMultinomialEstimator()
Get the MultinomialEstimator instance used to specify the type of estimator for priors.

Returns:
estimator to be cloned on next call to train() or first call to incrementalTrain()

setPriorMultinomialEstimator

public void setPriorMultinomialEstimator(Multinomial.Estimator me)
Set the Multinomial.Estimator used for priors. The Multinomial.Estimator is internally cloned, and the clone is used to maintain the counts that will be used to generate probability estimates the next time train() or an initial incrementalTrain() is run. Defaults to a Multinomial.LaplaceEstimator().

Parameters:
me - to be cloned on next call to train() or first call to incrementalTrain()

reset

public void reset()
Clears the internal state of the trainer. Called automatically at the end of train().

Specified by:
reset in class IncrementalClassifierTrainer

train

public Classifier train(InstanceList trainingList,
                        InstanceList validationList,
                        InstanceList testSet,
                        ClassifierEvaluating evaluator,
                        Classifier initialClassifier)
Create a NaiveBayes classifier from a set of training data. The trainer uses counts of each feature in an instance's feature vector to provide an estimate of p(feature|Labeling). The internal state of the trainer is thrown away (by a call to reset()) when train() returns. Each call to train() is completely independent of any other.

Specified by:
train in class ClassifierTrainer
Parameters:
trainingList - The InstanceList to be used to train the classifier. Within each instance the data slot is an instance of FeatureVector and the target slot is an instance of Labeling
validationList - Currently unused
testSet - Currently unused
evaluator - Currently unused
initialClassifier - Currently unused
Returns:
The NaiveBayes classifier as trained on the trainingList

incrementalTrain

public Classifier incrementalTrain(InstanceList trainingList,
                                   InstanceList validationList,
                                   InstanceList testSet,
                                   ClassifierEvaluating evaluator,
                                   Classifier initialClassifier)
Create a NaiveBayes classifier from a set of training data and the previous state of the trainer. Subsequent calls to incrementalTrain() add to the state of the trainer. An incremental training session should consist only of calls to incrementalTrain() and have no calls to train().

Specified by:
incrementalTrain in class IncrementalClassifierTrainer
Parameters:
trainingList - The InstanceList to be used to train the classifier. Within each instance the data slot is an instance of FeatureVector and the target slot is an instance of Labeling
validationList - Currently unused
testSet - Currently unused
evaluator - Currently unused
initialClassifier - Currently unused
Returns:
The NaiveBayes classifier as trained on the trainingList and the previous trainingLists passed to incrementalTrain()

toString

public java.lang.String toString()
Overrides:
toString in class ClassifierTrainer