SimpleTagger Example

SimpleTagger is a command-line interface to the MALLET Conditional Random Field (CRF) class. Here we present an extremely simple example showing how to use SimpleTagger to label a sequence of text.

Given an input file "sample" as follows:

CAPITAL Bill  noun
        slept non-noun
        here non-noun
where every token on a line except the last is a binary feature, and the last token on the line is the label name, a CRF can be trained with SimpleTagger as follows (on one line):
hough@gobur:~/tagger-test$ java -cp 
 "/home/hough/mallet/class:/home/hough/mallet/lib/mallet-deps.jar"
 edu.umass.cs.mallet.base.fst.SimpleTagger
  --train true --model-file nouncrf  sample
This assumes that MALLET has been installed and built in /home/hough/mallet. Note that the classpath names both the MALLET build directory (/home/hough/mallet/class) and the necessary MALLET jar file (/home/hough/mallet/lib/mallet-deps.jar). The --train true option specifies that we are training, and --model-file nouncrf specifies the file the CRF is written to.
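
Since the same classpath is repeated in every command in this example, it can be convenient to keep it in a shell variable. The following is a minimal sketch, not part of MALLET: the variable names MALLET_HOME and MALLET_CP and the function run_tagger are invented for this illustration, and the default path is simply the install location assumed above.

```shell
# MALLET_HOME is a hypothetical variable name chosen for this sketch;
# its default is the install location assumed in the example.
MALLET_HOME="${MALLET_HOME:-/home/hough/mallet}"

# Build directory plus the dependency jar, as in the classpath above.
MALLET_CP="$MALLET_HOME/class:$MALLET_HOME/lib/mallet-deps.jar"

# run_tagger is a helper defined here, not a MALLET command.
# It passes all of its arguments through to SimpleTagger.
run_tagger() {
    java -cp "$MALLET_CP" edu.umass.cs.mallet.base.fst.SimpleTagger "$@"
}

# Train only if the build directory is actually present.
if [ -d "$MALLET_HOME/class" ]; then
    run_tagger --train true --model-file nouncrf sample
else
    echo "MALLET build not found under $MALLET_HOME" >&2
fi
```

With this in place, any other SimpleTagger invocation only needs its own options, for example run_tagger --help.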

This produces a trained CRF in the file "nouncrf".
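
Because the last token of every training line is taken as the label, it can help to sanity-check a training file before handing it to SimpleTagger. The sketch below recreates the three-line "sample" file and uses awk to report the label and the number of features on each line. It relies only on standard sh and awk; describe_lines is a helper name invented here, and awk's default whitespace splitting is an assumption about how the file is meant to be read.

```shell
# Recreate the three-line training file from the example
# (this overwrites any file named "sample" in the current directory).
cat > sample <<'EOF'
CAPITAL Bill noun
slept non-noun
here non-noun
EOF

# describe_lines is a helper defined for this sketch, not part of MALLET.
# For each non-empty line, the last token is the label and every token
# before it counts as one binary feature.
describe_lines() {
    awk 'NF > 0 { printf "label=%s features=%d\n", $NF, NF - 1 }' "$1"
}

describe_lines sample
```

A line that reports features=0 has a label but no features, which is usually a sign that a token is missing.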

If we have a file "stest" that we would like labelled:

CAPITAL Al
        slept
        here
we can do this with the CRF in file nouncrf by typing:
hough@gobur:~/tagger-test$ java -cp
"/home/hough/mallet/class:/home/hough/mallet/lib/mallet-deps.jar"
 edu.umass.cs.mallet.base.fst.SimpleTagger
--model-file nouncrf  stest
which produces the following output:
Number of predicates: 5
noun CAPITAL Al
non-noun  slept
non-noun  here
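
Judging from the output above, each result line starts with the predicted label, followed by the features of the corresponding input line. If only the labels are needed, they can be pulled out with awk. This is a sketch under that assumption: the file name tagged.txt and the function predicted_labels are invented for the illustration, and the file contents are copied from the output shown above rather than produced by running the tagger.

```shell
# Save the output shown above to a file (tagged.txt is a hypothetical name).
cat > tagged.txt <<'EOF'
Number of predicates: 5
noun CAPITAL Al
non-noun  slept
non-noun  here
EOF

# predicted_labels is a helper defined for this sketch, not part of MALLET.
# It skips the "Number of predicates" line and prints the first token of
# every other non-empty line, which is assumed to be the predicted label.
predicted_labels() {
    awk '!/^Number of predicates:/ && NF > 0 { print $1 }' "$1"
}

predicted_labels tagged.txt
```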

A list of all the options available with SimpleTagger can be obtained by specifying the --help option:

hough@gobur:~/tagger-test$ java -cp
"/home/hough/mallet/class:/home/hough/mallet/lib/mallet-deps.jar"
 edu.umass.cs.mallet.base.fst.SimpleTagger
--help