I tried passing the properties file in the call:
java -cp stanford-corenlp-3.5.2.jar -mx1g edu.stanford.nlp.ie.NERClassifierCombiner -loadClassifier c2is2.serialized.ncc.ncc.ser.gz -map word=0,answer=1 -props c2is2nlp.props -textFile c2is2r3.txt
This is the error output:
NERClassifierCombiner invoked on Mon Jul 20 13:08:20 EDT 2015 with arguments:
-loadClassifier c2is2.serialized.ncc.ncc.ser.gz -map word=0,answer=1 -props c2is2nlp.props -textFile c2is2r3.txt
loadClassifier=c2is2.serialized.ncc.ncc.ser.gz
regexner.mapping=c2is2Mapping.tab
Unknown property: |regexner.mapping|
textFile=c2is2r3.txt
map=word=0,answer=1
annotators=tokenize, ssplit, pos, lemma, ner, parse, dcoref
Unknown property: |annotators|
map=word=0,answer=1
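To see at a glance which properties were rejected, the log can be scanned for its "Unknown property" lines. This is only a rough sketch that assumes the log format shown above; `rejected_properties` is a hypothetical helper name, not part of CoreNLP.

```python
import re

# Matches log lines such as: Unknown property: |annotators|
UNKNOWN_PROPERTY = re.compile(r"^Unknown property: \|(.+)\|$")


def rejected_properties(log_text):
    """Return the property names reported as unknown, in order of appearance."""
    names = []
    for line in log_text.splitlines():
        match = UNKNOWN_PROPERTY.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


# Excerpt of the log output shown above.
log = """\
loadClassifier=c2is2.serialized.ncc.ncc.ser.gz
regexner.mapping=c2is2Mapping.tab
Unknown property: |regexner.mapping|
textFile=c2is2r3.txt
annotators=tokenize, ssplit, pos, lemma, ner, parse, dcoref
Unknown property: |annotators|
"""

print(rejected_properties(log))
```

Here both rejected names, `regexner.mapping` and `annotators`, come from c2is2nlp.props, while the options given on the command line are echoed without complaint.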
Do I need to add the following:
edu.stanford.nlp.pipeline.StanfordCoreNLP [ -props <YOUR CONFIGURATION FILE> ]
to the command below:
java -cp stanford-corenlp-3.5.2.jar -mx1g edu.stanford.nlp.ie.NERClassifierCombiner -loadClassifier c2is2.serialized.ncc.ncc.ser.gz -map word=0,answer=1 -textFile c2is2r3.txt
Here are the steps that work:
Token file:
more c2is2r3.tsv
The O
fate O
of O
Lehman ORGANIZATION
Brothers ORGANIZATION
. . .
New ORGANIZATION
York ORGANIZATION
Fed ORGANIZATION
, O
and O
Treasury TITLE
Secretary TITLE
Henry PERSON
M. PERSON
Paulson PERSON
Jr. PERSON
. O
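Before training, it may help to check that every line of the token file really has two columns, matching `map = word=0,answer=1`. The sketch below assumes whitespace- or tab-separated columns; `check_token_file` is a hypothetical helper, and the sample is the first few lines of the file above, including the elided `. . .` line, which it flags as malformed.

```python
from collections import Counter


def check_token_file(lines):
    """Count labels and collect lines that do not have exactly two columns."""
    labels = Counter()
    malformed = []
    for number, line in enumerate(lines, start=1):
        columns = line.split()
        if not columns:
            continue  # blank lines are skipped in this sketch
        if len(columns) != 2:
            malformed.append((number, line.rstrip("\n")))
            continue
        _word, label = columns  # column 0 is the word, column 1 the answer
        labels[label] += 1
    return labels, malformed


sample = [
    "The O",
    "fate O",
    "of O",
    "Lehman ORGANIZATION",
    "Brothers ORGANIZATION",
    ". . .",
]

labels, malformed = check_token_file(sample)
print(dict(labels))
print(malformed)
```

To check the real file, pass an open file handle instead of the sample list, for example `check_token_file(open("c2is2r3.tsv", encoding="utf-8"))`.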
Properties file:
more c2is2r3.prop
trainFile = c2is2r3.tsv
serializeTo = c2is2r3-ner-model.ser.gz
map = word=0,answer=1
useClassFeature=true
useWord=true
useNGrams=true
noMidNGrams=true
maxNGramLeng=6
usePrev=true
useNext=true
useSequences=true
usePrevSequences=true
maxLeft=1
useTypeSeqs=true
useTypeSeqs2=true
useTypeySequences=true
wordShape=chris2useLC
useDisjunctive=true
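A quick sanity check of the properties file can be sketched as follows. It assumes simple `key = value` lines, and it treats `trainFile`, `serializeTo`, and `map` as the keys to verify only because those are the file-specific entries in the listing above; `parse_properties` is a hypothetical helper, not a CoreNLP API.

```python
def parse_properties(text):
    """Parse simple key=value lines, splitting on the first '=' only."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


# First lines of the properties file shown above.
text = """\
trainFile = c2is2r3.tsv
serializeTo = c2is2r3-ner-model.ser.gz
map = word=0,answer=1
useClassFeature=true
"""

props = parse_properties(text)
missing = [key for key in ("trainFile", "serializeTo", "map") if key not in props]
print(props["map"])
print(missing)
```

Splitting on the first `=` only is what keeps the value of `map` intact, since that value itself contains `=` signs.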
This trains the custom classifier:
java -cp stanford-corenlp-3.5.2.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop c2is2r3.prop
Combine the models:
java -cp stanford-corenlp-3.5.2.jar -mx2g edu.stanford.nlp.ie.NERClassifierCombiner -ner.model c2is2r3-ner-model.ser.gz,classifiers/english.muc.7class.distsim.crf.ser.gz -ner.useSUTime false -ner.combinationMode HIGH_RECALL -serializeTo c2is2.serialized.ncc.ncc.ser.gz
Test:
java -cp stanford-corenlp-3.5.2.jar -mx1g edu.stanford.nlp.ie.NERClassifierCombiner -loadClassifier c2is2.serialized.ncc.ncc.ser.gz -map word=0,answer=1 -textFile c2is2r3.txt
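For repeated runs, the three working commands above could be assembled from a script. The sketch below only builds and prints the argument lists and does not run Java; the helper `java` is a hypothetical name, and every flag and file name is copied from the commands above.

```python
import shlex

JAR = "stanford-corenlp-3.5.2.jar"
COMBINER = "edu.stanford.nlp.ie.NERClassifierCombiner"


def java(main_class, *args, heap=None):
    """Build a java command line as a list of arguments."""
    cmd = ["java", "-cp", JAR]
    if heap:
        cmd.append("-mx" + heap)  # heap size flag, e.g. -mx1g
    cmd.append(main_class)
    cmd.extend(args)
    return cmd


# Step 1: train the custom classifier.
train_cmd = java("edu.stanford.nlp.ie.crf.CRFClassifier", "-prop", "c2is2r3.prop")

# Step 2: combine the custom model with the 7-class model and serialize it.
combine_cmd = java(
    COMBINER,
    "-ner.model",
    "c2is2r3-ner-model.ser.gz,classifiers/english.muc.7class.distsim.crf.ser.gz",
    "-ner.useSUTime", "false",
    "-ner.combinationMode", "HIGH_RECALL",
    "-serializeTo", "c2is2.serialized.ncc.ncc.ser.gz",
    heap="2g",
)

# Step 3: test the combined classifier on a text file.
test_cmd = java(
    COMBINER,
    "-loadClassifier", "c2is2.serialized.ncc.ncc.ser.gz",
    "-map", "word=0,answer=1",
    "-textFile", "c2is2r3.txt",
    heap="1g",
)

for cmd in (train_cmd, combine_cmd, test_cmd):
    print(shlex.join(cmd))
```

Each list could be handed to `subprocess.run(cmd, check=True)` once the jar and model files are in place.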