Given an input file "sample" as follows:

CAPITAL Bill noun
slept non-noun
here non-noun

where all but the last token on each line is a binary feature, and the last token on the line is the label name, a CRF can be created with SimpleTagger as follows (on one line):

hough@gobur:~/tagger-test$ java -cp "/home/hough/mallet/class:/home/hough/mallet/lib/mallet-deps.jar" edu.umass.cs.mallet.base.fst.SimpleTagger --train true --model-file nouncrf sample

This assumes that MALLET has been installed and built in /home/hough/mallet. Note that the classpath names both the MALLET build directory (/home/hough/mallet/class) and the necessary MALLET jar file (/home/hough/mallet/lib/mallet-deps.jar). The --train true option specifies that we are training, and --model-file nouncrf specifies the file the CRF is written to.
This produces a trained CRF in the file "nouncrf".
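
To make the training-file layout concrete, here is a minimal Python sketch of how one line of "sample" breaks down into features and a label. It is not part of MALLET; the function name parse_training_line is hypothetical, and the sketch assumes tokens are separated by whitespace, as in the example file.

```python
def parse_training_line(line):
    """Split one training line into (features, label).

    Assumes whitespace-separated tokens where every token except the
    last is a binary feature and the last token is the label name.
    """
    tokens = line.split()
    if not tokens:
        return None  # blank line; this sketch simply skips it
    return tokens[:-1], tokens[-1]


# The contents of the example file "sample".
sample = """CAPITAL Bill noun
slept non-noun
here non-noun
"""

parsed = [p for p in (parse_training_line(l) for l in sample.splitlines()) if p]
for features, label in parsed:
    print(label, features)
```

Here the first line yields the features CAPITAL and Bill with the label noun, while the other two lines each carry a single feature and the label non-noun.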
If we have a file "stest" we would like labelled:

CAPITAL Al
slept
here

we can do this with the CRF in file nouncrf by typing:

hough@gobur:~/tagger-test$ java -cp "/home/hough/mallet/class:/home/hough/mallet/lib/mallet-deps.jar" edu.umass.cs.mallet.base.fst.SimpleTagger --model-file nouncrf stest

which produces the following output:

Number of predicates: 5
noun CAPITAL Al
non-noun slept
non-noun here

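
If the labels need to be consumed by another program, the output above can be read back with a few lines of Python. This is only a sketch based on the sample output shown here: it assumes that each result line starts with the predicted label followed by the input features, and that the status line begins with "Number of predicates:". The function name parse_tagger_output is hypothetical.

```python
def parse_tagger_output(text):
    """Return (label, features) pairs from output shaped like the sample.

    Assumption: the first token on each result line is the predicted
    label and the remaining tokens are the features that were supplied.
    """
    results = []
    for line in text.splitlines():
        tokens = line.split()
        # Skip blank lines and the "Number of predicates: N" status line.
        if not tokens or line.startswith("Number of predicates:"):
            continue
        results.append((tokens[0], tokens[1:]))
    return results


# The output shown for the file "stest".
output = """Number of predicates: 5
noun CAPITAL Al
non-noun slept
non-noun here
"""

tagged = parse_tagger_output(output)
for label, features in tagged:
    print(label, features)
```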
A list of all the options available with SimpleTagger can be obtained by specifying the --help option:
hough@gobur:~/tagger-test$ java -cp "/home/hough/mallet/class:/home/hough/mallet/lib/mallet-deps.jar" edu.umass.cs.mallet.base.fst.SimpleTagger --help