http://www.chokkan.org/software/crfsuite/manual.html
This section describes the data format that CRFsuite uses for training and tagging. A data set consists of a series of item sequences; each item sequence is represented by consecutive lines and terminated by an empty line. An item sequence consists of a series of items, each of whose characteristics (label and attributes) is described on one line. An item line begins with the label, followed by the attributes, separated by TAB (\t) characters.
This is an example of training data (taken from the CoNLL 2000 chunking shared task).
http://www.chokkan.org/software/crfsuite/data_sample.png Figure 1. Sample data for CRFsuite
This example contains four item sequences (the last one is shown only partially). The first item of the first sequence is annotated with the label B-NP and has the attributes w[0]=An, w[1]=AP, pos[0]=DT, pos[1]=NNP, and __BOS__. The labels and attributes in this example follow a particular naming convention (feature design): B-NP indicates that the current token begins a noun phrase, w[0]=An indicates that the surface form of the current item is "An", pos[1]=NNP indicates that the next token is a proper noun, and __BOS__ indicates that the current item is the first item of the sequence. However, CRFsuite does not care about the naming convention or feature design of labels and attributes; it treats them as mere strings. CRFsuite learns the associations between attributes and labels (feature weights) without knowing what the labels and attributes mean (for example, if the current item has the attribute pos[0]=DT, it is likely to have the label B-NP). In other words, you can design and use arbitrary features simply by writing label and attribute names in the data set.
An attribute can have a scaling value, separated from the attribute name by a colon character (:). Formally, the influence of a feature is determined by multiplying the scaling value of the corresponding attribute by the feature weight. Roughly speaking, the scaling value of an attribute has an effect similar to the frequency of occurrence of the attribute, but it can be a decimal or negative number. Note that a large scaling value may cause an overflow (range error) in training. Because the colon character has a special role in a data set, CRFsuite uses escape sequences: "\:" and "\\" represent ":" and "\", respectively, in an attribute name. If the scaling value is omitted (no colon character), CRFsuite assumes a scaling value of 1. For example, these three items are identical in terms of attributes and scaling values.
B-NP w[1..4]=a:2 w[1..4]=man w[1..4]=eats
B-NP w[1..4]=a w[1..4]=a w[1..4]=man w[1..4]=eats
B-NP w[1..4]=a:2.0 w[1..4]=man:1.0 w[1..4]=eats:1.0
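The equivalence of the three items above can be checked with a small parser. This is a minimal sketch, not CRFsuite's own reader: `parse_item` is a hypothetical helper, it assumes fields are separated by TAB characters as the format requires, and it assumes that repeated attributes accumulate their scaling values (which is what the equivalence of the first two lines implies).

```python
import re

# An attribute name is a run of ordinary characters or backslash escapes,
# optionally followed by ":" and a scaling value.
ATTRIBUTE = re.compile(r"((?:[^\\:]|\\.)+)(?::(.+))?")

def parse_item(line):
    """Parse one item line into (label, {attribute_name: scaling_value})."""
    label, *fields = line.rstrip("\n").split("\t")
    attributes = {}
    for field in fields:
        match = ATTRIBUTE.fullmatch(field)
        if match is None:
            raise ValueError(f"malformed attribute: {field!r}")
        # Undo the escape sequences "\:" and "\\" in the attribute name.
        name = re.sub(r"\\(.)", r"\1", match.group(1))
        # An omitted scaling value counts as 1.
        scale = float(match.group(2)) if match.group(2) else 1.0
        # Assumption: repeated attributes add up their scaling values.
        attributes[name] = attributes.get(name, 0.0) + scale
    return label, attributes

EXAMPLES = [
    "B-NP\tw[1..4]=a:2\tw[1..4]=man\tw[1..4]=eats",
    "B-NP\tw[1..4]=a\tw[1..4]=a\tw[1..4]=man\tw[1..4]=eats",
    "B-NP\tw[1..4]=a:2.0\tw[1..4]=man:1.0\tw[1..4]=eats:1.0",
]
parsed = [parse_item(line) for line in EXAMPLES]
print(parsed[0] == parsed[1] == parsed[2])
```

All three lines parse to the same label and the same attribute-to-scaling mapping.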
The data format for tagging is exactly the same as the one for training, except that labels in tagging data may be empty (but cannot be omitted). When tagging, CRFsuite either ignores the labels in the input data or uses them to measure the performance of its predictions.
This is the BNF notation for the data format.
<line> ::= <item> | <eos>
<item> ::= <label> ('\t' <attribute>)+ <br>
<eos> ::= <br>
<label> ::= <string>
<attribute> ::= <name> | <name> ':' <scaling>
<name> ::= (<letter> | "\:" | "\\")+
<scaling> ::= <numeric>
<br> ::= '\n'
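A data set in this format can be produced with a few lines of code. The following is a minimal sketch under stated assumptions: `escape_name`, `format_item`, and `format_sequence` are hypothetical helper names, not part of CRFsuite, and a scaling value of `None` is used here to mean "omit the scaling value".

```python
def escape_name(name):
    """Escape an attribute name: backslashes first, then colons."""
    return name.replace("\\", "\\\\").replace(":", "\\:")

def format_item(label, attributes):
    """Build one item line: the label, then TAB-separated attributes.

    `attributes` maps attribute names to scaling values; None means the
    scaling value is omitted (it is then taken to be 1).
    """
    fields = [label]
    for name, scale in attributes.items():
        field = escape_name(name)
        if scale is not None:
            field = f"{field}:{scale}"
        fields.append(field)
    return "\t".join(fields)

def format_sequence(items):
    """Write each item on its own line and end the sequence with an empty line."""
    return "".join(format_item(label, attrs) + "\n" for label, attrs in items) + "\n"

sequence = [
    ("B-NP", {"w[0]=An": None, "pos[0]=DT": None, "__BOS__": None}),
    ("I-NP", {"w[0]=AP": None, "pos[0]=NNP": 2}),
]
print(format_sequence(sequence), end="")
```

Escaping the backslash before the colon matters: doing it in the other order would escape the backslash that was just inserted for the colon.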
The easiest way to install CRFsuite is to use a binary distribution. Currently, binaries are distributed for Win32 and Linux (Intel 32-bit and 64-bit architectures).
Since CRFsuite 0.5, the source package no longer includes libLBFGS. To build CRFsuite, you must first download and build libLBFGS.
In a Windows environment, open the Visual Studio solution file of libLBFGS (lbfgs.sln) and build it. The solution file builds a static-link library, lbfgs.lib (release build) or lbfgs_debug.lib (debug build), in the Release or Debug directory. The solution file of CRFsuite (crfsuite.sln) assumes that the header and library files of libLBFGS exist in the win32/lbfgs directory, so create this directory and copy lbfgs.h, lbfgs.lib, and/or lbfgs_debug.lib into it. Then open the solution file (crfsuite.sln) and build it.
In a Linux environment, download the source package of libLBFGS and build it. If you do not want to install libLBFGS into your operating system, specify the "--prefix" option to the configure script. This example installs libLBFGS into the local directory under the home directory ($HOME).
$ ./configure --prefix=$HOME/local
$ make
$ make install
You are now ready to build CRFsuite. If you installed libLBFGS in a separate directory, specify that directory as the argument of the "--with-liblbfgs" option.
$ ./configure --prefix=$HOME/local --with-liblbfgs=$HOME/local
$ make
$ make install
The CRFsuite utility expects the first command-line argument to be a command name.
- learn: Train a CRF model from a training set.
- tag: Tag item sequences using a CRF model.
- dump: Dump a CRF model in plain-text format.
To see the command-line syntax, use the -h (--help) option.
$ crfsuite -h
CRFSuite 0.12 Copyright (c) 2007-2011 Naoaki Okazaki
USAGE: crfsuite <COMMAND> [OPTIONS]
COMMAND Command name to specify the processing
OPTIONS Arguments for the command (optional; command-specific)
COMMAND:
learn Obtain a model from a training set of instances
tag Assign suitable labels to given instances by using a model
dump Output a model in a plain-text format
For the usage of each command, specify -h option in the command argument.
To train a CRF model from a training set, enter the following command.
$ crfsuite learn [OPTIONS] [DATA]
If the argument DATA is omitted or is '-', the utility reads the training data from STDIN. To see the usage of the learn command, specify the -h (--help) option.
$ crfsuite learn -h
CRFSuite 0.12 Copyright (c) 2007-2011 Naoaki Okazaki
USAGE: crfsuite learn [OPTIONS] [DATA1] [DATA2] ...
Trains a model using training data set(s).
DATA file(s) corresponding to data set(s) for training; if multiple N files
are specified, this utility assigns a group number (1...N) to the
instances in each file; if a file name is '-', the utility reads a
data set from STDIN
OPTIONS:
-t, --type=TYPE specify a graphical model (DEFAULT='1d'):
(this option is reserved for the future use)
1d 1st-order Markov CRF with state and transition
features; transition features are not conditioned
on observations
-a, --algorithm=NAME specify a training algorithm (DEFAULT='lbfgs')
lbfgs L-BFGS with L1/L2 regularization
l2sgd SGD with L2-regularization
ap Averaged Perceptron
pa Passive Aggressive
arow Adaptive Regularization of Weights (AROW)
-p, --set=NAME=VALUE set the algorithm-specific parameter NAME to VALUE;
use '-H' or '--help-parameters' with the algorithm name
specified by '-a' or '--algorithm' and the graphical
model specified by '-t' or '--type' to see the list of
algorithm-specific parameters
-m, --model=FILE store the model to FILE (DEFAULT=''); if the value is
empty, this utility does not store the model
-g, --split=N split the instances into N groups; this option is
useful for holdout evaluation and cross validation
-e, --holdout=M use the M-th data for holdout evaluation and the rest
for training
-x, --cross-validate repeat holdout evaluations for #i in {1, ..., N} groups
(N-fold cross validation)
-l, --log-to-file write the training log to a file instead of to STDOUT;
The filename is determined automatically by the training
algorithm, parameters, and source files
-L, --logbase=BASE set the base name for a log file (used with -l option)
-h, --help show the usage of this command and exit
-H, --help-parameters show the help message of algorithm-specific parameters;
specify an algorithm with '-a' or '--algorithm' option,
and specify a graphical model with '-t' or '--type' option
The following options are available for training.
-t, --type=TYPE: Specify the graphical model used for feature generation. The default value is "1d".
- 1d: First-order Markov CRF with state and transition features (dyad features). State features are conditioned on combinations of attributes and labels, and transition features are conditioned on label bigrams.
-a, --algorithm=NAME: Specify the training algorithm. The default value is "lbfgs".
- lbfgs: Gradient descent using the L-BFGS method
- l2sgd: Stochastic Gradient Descent with an L2 regularization term
- ap: Averaged Perceptron
- pa: Passive Aggressive (PA)
- arow: Adaptive Regularization Of Weight Vector (AROW)
-p, --set=NAME=VALUE: Configure a parameter for training. CRFsuite sets the parameter NAME to VALUE. The available parameters depend on the selected graphical model and training algorithm. To see the help message for the available parameters, use '-H' or '--help-parameters' together with the algorithm name specified by '-a' or '--algorithm' and the graphical model specified by '-t' or '--type'.
-m, --model=MODEL: Store the trained model in the file MODEL. The default value is "" (empty). If MODEL is empty, CRFsuite does not store the model in a file.
-g, --split=N: Split the instances into N groups and assign a number in {1, ..., N} to each group. This option is mainly used to perform N-fold cross validation (with the -x option). By default, CRFsuite does not split the input data into groups.
-e, --holdout=M: Use the instances with group number M for holdout evaluation. CRFsuite does not use the instances with group number M for training. By default, CRFsuite does not perform holdout evaluation.
-x, --cross-validate: Perform N-fold cross validation. Use the -g option to specify the number of splits. By default, CRFsuite does not perform cross validation.
-l, --log-to-file: Write the training log messages to a file. The file name is determined automatically from the command-line arguments (training algorithm, graphical model, parameters, source files, and so on). By default, CRFsuite writes log messages to STDOUT.
-L, --logbase=BASE: Specify the base name of the log file (used together with the -l option). By default, the base name is "log.crfsuite".
-h, --help: Show the usage of this command and exit.
-H, --help-parameters: Show the list of parameters and their descriptions. Use the -t and -a options to specify the graphical model and the training algorithm, respectively.
Here are some examples of CRFsuite command lines for training.
Train a CRF model from train.txt with the default parameters, and store the model in CRF.model.
$ crfsuite learn -m CRF.model train.txt
Train a CRF model from STDIN with the default parameters.
$ cat train.txt | crfsuite learn -
Train a CRF model from train.txt (group #1). During training, test the model on the holdout data test.txt (group #2).
$ crfsuite learn -e2 train.txt test.txt
Perform 10-fold cross validation on the training data train.txt. The log output is stored in log.crfsuite_lbfgs (the name of the log file may differ depending on the training parameters).
$ crfsuite learn -g10 -x -l train.txt
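When training is driven from a script, it can help to assemble the argument list programmatically. The sketch below only builds the argument vector from options listed in the help message above (-m, -a, -p, -g, -e, -x, -l); `learn_command` is a hypothetical helper, and actually running the result (for example with `subprocess.run(argv, check=True)`) assumes that the crfsuite executable is on the PATH.

```python
import shlex

def learn_command(data_files, model=None, algorithm=None, params=None,
                  split=None, holdout=None, cross_validate=False,
                  log_to_file=False):
    """Build the argument vector for `crfsuite learn` (hypothetical helper)."""
    argv = ["crfsuite", "learn"]
    if model:
        argv += ["-m", model]
    if algorithm:
        argv += ["-a", algorithm]
    for name, value in (params or {}).items():
        # NAME=VALUE must be a single argument, with no spaces around "=".
        argv += ["-p", f"{name}={value}"]
    if split is not None:
        argv.append(f"-g{split}")
    if holdout is not None:
        argv.append(f"-e{holdout}")
    if cross_validate:
        argv.append("-x")
    if log_to_file:
        argv.append("-l")
    return argv + list(data_files)

# Reproduces the 10-fold cross validation example as a shell command line.
print(shlex.join(learn_command(["train.txt"], split=10,
                               cross_validate=True, log_to_file=True)))
```

Passing a list to the process launcher instead of a shell string avoids quoting problems with file names that contain spaces.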
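The graphical model 1d is a first-order Markov CRF with state and transition features (dyad features). State features are conditioned on combinations of attributes and labels, and transition features are conditioned on label bigrams. It has the following parameters.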
feature.minfreq=VALUE: The cut-off threshold for the occurrence frequency of a feature. CRFsuite ignores features whose frequency of occurrence in the training data is no greater than VALUE. The default value is 0 (that is, no cut-off).
feature.possible_states=BOOL: Specify whether CRFsuite generates state features that do not occur in the training data (that is, negative state features). When BOOL is set to 1, CRFsuite generates state features that associate every possible combination of attribute and label. If the numbers of attributes and labels are A and L, respectively, this function generates (A * L) features. Enabling this function may improve labeling accuracy, because the CRF model can learn the conditions under which an item is not predicted to have its reference label. However, it may also increase the number of features and slow down the training process drastically. This function is disabled by default.
feature.possible_transitions=BOOL: Specify whether CRFsuite generates transition features that do not occur in the training data (that is, negative transition features). When BOOL is set to 1, CRFsuite generates transition features that associate every possible pair of labels. If the number of labels in the training data is L, this function generates (L * L) transition features. This function is disabled by default.
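To get a feeling for how quickly the dense feature set grows, the (A * L) and (L * L) counts can be computed directly. This is only an arithmetic sketch; `dense_feature_counts` is a hypothetical helper and the sizes below are made-up numbers, not measurements from any data set.

```python
def dense_feature_counts(num_attributes, num_labels):
    """Feature counts when both feature.possible_* options are set to 1."""
    state_features = num_attributes * num_labels    # A * L
    transition_features = num_labels * num_labels   # L * L
    return state_features, transition_features

# Hypothetical sizes, chosen only to show the scale of the effect.
state, transition = dense_feature_counts(num_attributes=100_000, num_labels=23)
print(state, transition)
```

The state features dominate: the count grows with the number of distinct attributes, which is usually far larger than the number of labels.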
Here are some examples of CRFsuite command lines.
Features that occur fewer than two times are not used for training.
$ crfsuite learn -m CRF.model -p feature.minfreq=2 train.txt
Generate negative state and transition features (also known as a dense feature set).
$ crfsuite learn -m CRF.model -p feature.possible_states=1 -p feature.possible_transitions=1 train.txt
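The lbfgs algorithm maximizes the logarithm of the likelihood of the training data with L1 and/or L2 regularization terms, using the Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method. When a non-zero coefficient for the L1 regularization term is specified, the algorithm switches to the Orthant-Wise Limited-memory Quasi-Newton (OWL-QN) method. In practice, this algorithm improves the feature weights very slowly at the beginning of the training process, but converges quickly to the optimal feature weights in the end.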
c1=VALUE: The coefficient for L1 regularization. If a non-zero value is specified, CRFsuite switches to the Orthant-Wise Limited-memory Quasi-Newton (OWL-QN) method. The default value is zero (no L1 regularization).
c2=VALUE: The coefficient for L2 regularization. The default value is 1.
max_iterations=NUM: The maximum number of iterations for L-BFGS optimization. The L-BFGS routine terminates when the iteration count exceeds this value. The default value is the maximum integer value on the machine (INT_MAX).
num_memories=NUM: The number of limited memories that L-BFGS uses to approximate the inverse Hessian matrix. The default value is 6.
epsilon=VALUE: The epsilon parameter that determines the condition of convergence. The default value is 1e-5.
stop=NUM: The duration, in iterations, over which the stopping criterion is tested. The default value is 10.
delta=VALUE: The threshold for the stopping criterion. The L-BFGS iterations stop when the improvement of the log likelihood over the last ${stop} iterations is no greater than this threshold. The default value is 1e-5.
linesearch=STRING: The line search method used by the L-BFGS algorithm. The available methods are "MoreThuente" (the method proposed by More and Thuente), "Backtracking" (the backtracking method with the regular Wolfe condition), and "StrongBacktracking" (the backtracking method with the strong Wolfe condition). The default method is "MoreThuente".
max_linesearch=NUM: The maximum number of trials for the line search algorithm. The default value is 20.
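The interplay of the stop and delta parameters can be illustrated with a small function. This is a sketch of the criterion as described above, not CRFsuite's or libLBFGS's actual code: `should_stop` is a hypothetical name, and measuring the improvement relative to the current value (rather than as an absolute difference) is an assumption made here.

```python
def should_stop(log_likelihoods, stop=10, delta=1e-5):
    """Return True when the improvement over the last `stop` iterations
    is no greater than `delta`.

    `log_likelihoods` holds one value per completed iteration, oldest first.
    """
    if len(log_likelihoods) <= stop:
        return False  # not enough history to look `stop` iterations back
    previous = log_likelihoods[-1 - stop]
    current = log_likelihoods[-1]
    # Assumption: the improvement is measured relative to the current value.
    scale = abs(current) or 1.0
    return (current - previous) / scale <= delta

# A run that is still improving, and one that has gone flat.
improving = [-100.0, -50.0, -40.0]
flat = [-40.0] * 12
print(should_stop(improving, stop=2), should_stop(flat, stop=10))
```

A larger stop value makes the test more patient, because a short plateau between two improvements no longer ends the optimization.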
Here are some examples of command lines for L-BFGS training.
Train a model with L2 regularization (c1=0, c2=1.0).
$ crfsuite learn -m CRF.model -a lbfgs -p c2=1 train.txt
Train a model with L1 regularization (c1=1.0, c2=0).
$ crfsuite learn -m CRF.model -a lbfgs -p c1=1 -p c2=0 train.txt
Train a model with L1 and L2 regularization (c1=1.0, c2=1.0).
$ crfsuite learn -m CRF.model -a lbfgs -p c1=1 -p c2=1 train.txt
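The l2sgd algorithm maximizes the logarithm of the likelihood of the training data with an L2 regularization term, using Stochastic Gradient Descent (SGD) with batch size 1. This algorithm usually approaches the optimal feature weights very quickly, but shows slow convergence at the end.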
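c2=VALUE: The coefficient for L2 regularization. The default value is 1.
max_iterations=NUM: The maximum number of iterations (epochs) for SGD optimization. The optimization routine terminates when the iteration count exceeds this value. The default value is 1000.
period=NUM: The duration, in iterations, over which the stopping criterion is tested. The default value is 10.
delta=VALUE: The threshold for the stopping criterion. The optimization process stops when the improvement of the log likelihood over the last ${period} iterations is no greater than this threshold. The default value is 1e-5.
calibration.eta=VALUE: The initial value of the learning rate (eta) used for calibration. The default value is 0.1.
calibration.rate=VALUE: The rate of increase or decrease of the learning rate for calibration. The default value is 2.
calibration.samples=NUM: The number of instances used for calibration. The calibration routine randomly chooses no more than NUM instances. The default value is 1000.
calibration.candidates=NUM: The number of candidate learning rates. The calibration routine terminates after finding NUM candidate learning rates that can increase the log likelihood. The default value is 10.
calibration.max_trials=NUM: The maximum number of learning-rate trials for calibration. The calibration routine terminates after trying NUM candidate values of the learning rate. The default value is 20.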
次ã«ãSGDãã¬ãŒãã³ã°ã®ã³ãã³ãã©ã€ã³ã®äŸã瀺ããŸãã
Train a model with L2 regularization (c2=1.0).
$ crfsuite learn -m CRF.model -a l2sgd -p c2=1 train.txt
The ap (Averaged Perceptron) algorithm applies a perceptron update to the model whenever the current model parameters fail to predict an item sequence correctly. It takes the average of the feature weights over all updates in the training process. The algorithm is the fastest in terms of training speed. Although it is very simple, it shows high prediction performance. In practice, the training process has to be stopped by specifying the maximum number of iterations, which should be tuned on a development set.
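max_iterations=NUM: The maximum number of iterations (epochs). The optimization routine terminates when the iteration count exceeds this value. The default value is 100.
epsilon=VALUE: The epsilon parameter that determines the condition of convergence. The optimization routine terminates when the ratio of incorrect labels predicted by the model is no greater than VALUE. The default value is 1e-5.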
Here is an example of a command line for the Averaged Perceptron.
Train a model with 10 iterations.
$ crfsuite learn -m CRF.model -a ap -p max_iterations=10 train.txt
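The pa (Passive Aggressive) algorithm works as follows. Given an item sequence (x, y) in the training data, the algorithm computes the loss s(x, y') - s(x, y) + sqrt(d(y', y)), where s(x, y') is the score of the Viterbi label sequence, s(x, y) is the score of the label sequence in the training data, and d(y', y) measures the distance between the Viterbi label sequence (y') and the reference label sequence (y). If the item suffers a non-negative loss, the algorithm updates the model based on the loss.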
type=NUM: The strategy for updating feature weights: PA without slack variables (0), PA type I (1), or PA type II (2). The default value is 1.
c=VALUE: The aggressiveness parameter (used only for PA-I and PA-II). This parameter controls the influence of the slack term on the objective function. The default value is 1.
error_sensitive=BOOL: If this parameter is true (non-zero), the optimization routine includes in the objective function the square root of the number of incorrect labels predicted by the model. The default value is 1 (true).
averaging=BOOL: If this parameter is true (non-zero), the optimization routine computes the average of the feature weights over all updates in the training process (as the Averaged Perceptron does). The default value is 1 (true).
max_iterations=NUM: The maximum number of iterations (epochs). The optimization routine terminates when the iteration count exceeds this value. The default value is 100.
epsilon=VALUE: The epsilon parameter that determines the condition of convergence. The optimization routine terminates when the mean loss is no greater than VALUE. The default value is 1e-5.
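The three values of the type parameter correspond to the PA, PA-I, and PA-II variants of Crammer et al. (2006), which differ only in how large a step they take for a given loss. The sketch below uses the step-size formulas from that paper; whether CRFsuite implements them in exactly this form is an assumption. `pa_loss` and `pa_step_size` are hypothetical names, and `squared_norm` stands for the squared norm of the difference between the feature vectors of the reference and Viterbi label sequences.

```python
import math

def pa_loss(score_viterbi, score_reference, num_incorrect):
    """Error-sensitive loss: score margin plus sqrt of the number of wrong labels."""
    return score_viterbi - score_reference + math.sqrt(num_incorrect)

def pa_step_size(loss, squared_norm, pa_type=1, c=1.0):
    """Step size tau for the update, following Crammer et al. (2006)."""
    if pa_type == 0:   # PA without slack variables
        return loss / squared_norm
    if pa_type == 1:   # PA-I: the step is capped by the aggressiveness c
        return min(c, loss / squared_norm)
    if pa_type == 2:   # PA-II: the step is damped by 1 / (2c)
        return loss / (squared_norm + 1.0 / (2.0 * c))
    raise ValueError("pa_type must be 0, 1, or 2")

# Made-up numbers: the Viterbi path scores 3.0, the reference 2.0, 4 labels differ.
loss = pa_loss(score_viterbi=3.0, score_reference=2.0, num_incorrect=4)
for pa_type in (0, 1, 2):
    print(pa_type, pa_step_size(loss, squared_norm=2.0, pa_type=pa_type, c=1.0))
```

A small c makes PA-I and PA-II more conservative, which tends to help when the training data contains labeling noise.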
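The arow (Adaptive Regularization Of Weight Vector) algorithm works as follows. Given an item sequence (x, y) in the training data, the algorithm computes the loss s(x, y') - s(x, y), where s(x, y') is the score of the Viterbi label sequence and s(x, y) is the score of the label sequence in the training data.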
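variance=VALUE: The initial variance of every feature weight. The algorithm initializes the vector of feature weights as a multivariate Gaussian distribution with mean 0 and variance VALUE. The default value is 1.
gamma=VALUE: The trade-off between the loss function and changes in the feature weights. The default value is 1.
max_iterations=NUM: The maximum number of iterations (epochs). The optimization routine terminates when the iteration count exceeds this value. The default value is 100.
epsilon=VALUE: The epsilon parameter that determines the condition of convergence. The optimization routine terminates when the mean loss is no greater than VALUE. The default value is 1e-5.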
To tag data using a CRF model, enter the following command.
$ crfsuite tag [OPTIONS] [DATA]
If the argument DATA is omitted or is '-', CRFsuite reads the data from STDIN. To see the usage of the tag command, specify the -h (--help) option.
$ crfsuite tag -h
CRFSuite 0.12 Copyright (c) 2007-2011 Naoaki Okazaki
USAGE: crfsuite tag [OPTIONS] [DATA]
Assign suitable labels to the instances in the data set given by a file (DATA).
If the argument DATA is omitted or '-', this utility reads a data from STDIN.
Evaluate the performance of the model on labeled instances (with -t option).
OPTIONS:
-m, --model=MODEL Read a model from a file (MODEL)
-t, --test Report the performance of the model on the data
-r, --reference Output the reference labels in the input data
-p, --probability Output the probability of the label sequences
-i, --marginal Output the marginal probabilities of items
-q, --quiet Suppress tagging results (useful for test mode)
-h, --help Show the usage of this command and exit
The following options are available for tagging.
-m, --model=MODEL: The name of the file from which CRFsuite reads the CRF model.
-t, --test: Evaluate the performance of the CRF model (accuracy, precision, recall, and F1 measure), assuming that the input data is labeled. This function is disabled by default.
-r, --reference: Output the reference labels in parallel with the predicted labels, assuming that the input data is labeled. This function is disabled by default.
-p, --probability: Output the probability of the label sequence predicted by the model. When this function is enabled, each label sequence begins with a line "@probability\tx.xxxx", where "x.xxxx" is the probability of the sequence and "\t" denotes a TAB character. This function is disabled by default.
-i, --marginal: Output the marginal probabilities of labels. When this function is enabled, each predicted label is followed by ":x.xxxx", where "x.xxxx" is the probability of the label. This function is disabled by default.
-q, --quiet: Suppress the output of tagged labels. This function is useful when evaluating a CRF model with the -t option.
-h, --help: Show the usage of this command and exit.
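The output produced with the -p and -i options can be read back with a short parser. This is a minimal sketch: `parse_tag_output` is a hypothetical helper, the sample text is made up to match the format described above, and the sketch assumes one predicted label per line with sequences separated by an empty line (it does not handle the extra column added by -r).

```python
def parse_tag_output(text):
    """Return [(sequence_probability, [(label, marginal), ...]), ...].

    Probabilities are None when the corresponding option was not used.
    """
    sequences, probability, items = [], None, []
    for line in text.splitlines():
        if not line:  # an empty line ends the current sequence
            if items or probability is not None:
                sequences.append((probability, items))
            probability, items = None, []
        elif line.startswith("@probability\t"):
            probability = float(line.split("\t", 1)[1])
        else:
            # With -i, the label is followed by ":x.xxxx".
            label, colon, tail = line.rpartition(":")
            try:
                items.append((label, float(tail)) if colon else (line, None))
            except ValueError:
                items.append((line, None))  # the colon belonged to the label
    if items or probability is not None:
        sequences.append((probability, items))
    return sequences

# Made-up output in the format of `crfsuite tag -p -i`.
SAMPLE = "@probability\t0.8200\nB-NP:0.9500\nI-NP:0.9000\n\n@probability\t0.5000\nO:0.7000\n\n"
for probability, items in parse_tag_output(SAMPLE):
    print(probability, items)
```

Splitting on the last colon keeps labels that themselves contain a colon intact when marginals are present.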
Here are some examples of CRFsuite command lines for tagging.
Tag the data test.txt using the CRF model CRF.model.
$ crfsuite tag -m CRF.model test.txt
Evaluate the CRF model CRF.model on the labeled data test.txt.
$ crfsuite tag -m CRF.model -qt test.txt
To dump a CRF model in plain-text format, enter the following command.
$ crfsuite dump <MODEL>