NLP NER RegexNER: can I include annotators when using NERClassifierCombiner?

Time: 2015-07-20 14:13:59

Tags: stanford-nlp

Can I add a properties file to the call, as follows?

java -cp stanford-corenlp-3.5.2.jar -mx1g edu.stanford.nlp.ie.NERClassifierCombiner -loadClassifier c2is2.serialized.ncc.ncc.ser.gz -map word=0,answer=1 -props c2is2nlp.props -textFile c2is2r3.txt

Here is the error output:

NERClassifierCombiner invoked on Mon Jul 20 13:08:20 EDT 2015 with arguments:
   -loadClassifier c2is2.serialized.ncc.ncc.ser.gz -map word=0,answer=1 -props c2is2nlp.props -textFile c2is2r3.txt
loadClassifier=c2is2.serialized.ncc.ncc.ser.gz
regexner.mapping=c2is2Mapping.tab
Unknown property: |regexner.mapping|
textFile=c2is2r3.txt
map=word=0,answer=1
annotators=tokenize, ssplit, pos, lemma, ner, parse, dcoref
Unknown property: |annotators|
map=word=0,answer=1

Do I need to add the following:

edu.stanford.nlp.pipeline.StanfordCoreNLP [ -props <YOUR CONFIGURATION FILE> ]

to the command below:

java -cp stanford-corenlp-3.5.2.jar -mx1g edu.stanford.nlp.ie.NERClassifierCombiner -loadClassifier c2is2.serialized.ncc.ncc.ser.gz -map word=0,answer=1 -textFile c2is2r3.txt 
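
The "Unknown property" lines appear to come from the classifier's own flag parser, which suggests that annotators and regexner.mapping are pipeline-level settings that NERClassifierCombiner does not read. One possible approach, sketched here under assumptions, is to run the StanfordCoreNLP pipeline instead and let its ner annotator load both models, with regexner running afterwards. The file name c2is2pipeline.props is hypothetical, the models jar on the classpath is an assumption (the pos annotator needs a tagger model), and whether the pipeline in 3.5.2 honours ner.combinationMode is not confirmed by this post.

```shell
# Hypothetical pipeline properties file; the name is an assumption.
PROPS=c2is2pipeline.props

cat > "$PROPS" <<'EOF'
# regexner is listed after ner so its mapping is applied to the NER output
annotators = tokenize, ssplit, pos, lemma, ner, regexner
ner.model = c2is2r3-ner-model.ser.gz,classifiers/english.muc.7class.distsim.crf.ser.gz
ner.useSUTime = false
ner.combinationMode = HIGH_RECALL
regexner.mapping = c2is2Mapping.tab
EOF

# Assumption: the models jar sits next to the code jar.
CP="stanford-corenlp-3.5.2.jar:stanford-corenlp-3.5.2-models.jar"

if [ -f stanford-corenlp-3.5.2.jar ]; then
  # Run the full pipeline on the same text file used in the question.
  java -cp "$CP" -mx2g edu.stanford.nlp.pipeline.StanfordCoreNLP \
    -props "$PROPS" -file c2is2r3.txt
else
  echo "stanford-corenlp-3.5.2.jar not found; wrote $PROPS only"
fi
```

This keeps the NER model list and the RegexNER mapping in one properties file rather than passing pipeline options to a standalone classifier.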

Here are the steps that work:

Token file

more c2is2r3.tsv
The	O
fate	O
of	O
Lehman	ORGANIZATION
Brothers	ORGANIZATION
. . .
New	ORGANIZATION
York	ORGANIZATION
Fed	ORGANIZATION
,	O
and	O
Treasury	TITLE
Secretary	TITLE
Henry	PERSON
M.	PERSON
Paulson	PERSON
Jr.	PERSON
.	O
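
With map set to word=0,answer=1, column 0 is read as the token and column 1 as the label, so every non-blank line of the training file should carry exactly two tab-separated fields. A quick sanity check might look like the sketch below. check_tsv is a hypothetical helper, not part of CoreNLP, and it assumes blank lines are legitimate separators.

```shell
# check_tsv FILE: report lines that do not have exactly two non-empty,
# tab-separated fields. Blank lines are skipped (assumed to be separators).
check_tsv() {
  awk -F'\t' '
    NF != 0 && (NF != 2 || $1 == "" || $2 == "") {
      printf "line %d: expected 2 non-empty fields, found %d\n", NR, NF
      bad = 1
    }
    END { exit bad + 0 }   # exit status 1 if any line was reported
  ' "$1"
}

# Usage: check_tsv c2is2r3.tsv
```

A line with a trailing tab or with spaces instead of a tab is reported with its line number, which makes stray whitespace in the training data easier to spot before training.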

Properties file:

more c2is2r3.prop

trainFile = c2is2r3.tsv
serializeTo = c2is2r3-ner-model.ser.gz
map = word=0,answer=1

useClassFeature=true
useWord=true
useNGrams=true
noMidNGrams=true
maxNGramLeng=6
usePrev=true
useNext=true
useSequences=true
usePrevSequences=true
maxLeft=1
useTypeSeqs=true
useTypeSeqs2=true
useTypeySequences=true
wordShape=chris2useLC
useDisjunctive=true

This builds the custom classifier:

java -cp stanford-corenlp-3.5.2.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop c2is2r3.prop

Combined model

java -cp stanford-corenlp-3.5.2.jar -mx2g edu.stanford.nlp.ie.NERClassifierCombiner -ner.model c2is2r3-ner-model.ser.gz,classifiers/english.muc.7class.distsim.crf.ser.gz -ner.useSUTime false -ner.combinationMode HIGH_RECALL -serializeTo c2is2.serialized.ncc.ncc.ser.gz

Test

java -cp stanford-corenlp-3.5.2.jar -mx1g edu.stanford.nlp.ie.NERClassifierCombiner -loadClassifier c2is2.serialized.ncc.ncc.ser.gz -map word=0,answer=1 -textFile c2is2r3.txt
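
The three working steps (train, combine, test) can be chained in a small script, sketched here with the commands and file names copied from this post. The run and ner_workflow helpers and the DRY_RUN variable are hypothetical. The script defaults to printing the commands, and DRY_RUN=0 is assumed to be set only where the CoreNLP jar is present.

```shell
JAR=stanford-corenlp-3.5.2.jar
NCC=edu.stanford.nlp.ie.NERClassifierCombiner

# Print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Each step runs only if the previous one succeeded.
ner_workflow() {
  # 1. Train the custom CRF model from the properties file.
  run java -cp "$JAR" edu.stanford.nlp.ie.crf.CRFClassifier -prop c2is2r3.prop &&
  # 2. Combine it with the 7-class model and serialize the combiner.
  run java -cp "$JAR" -mx2g "$NCC" \
    -ner.model c2is2r3-ner-model.ser.gz,classifiers/english.muc.7class.distsim.crf.ser.gz \
    -ner.useSUTime false -ner.combinationMode HIGH_RECALL \
    -serializeTo c2is2.serialized.ncc.ncc.ser.gz &&
  # 3. Test the serialized combiner on the text file.
  run java -cp "$JAR" -mx1g "$NCC" \
    -loadClassifier c2is2.serialized.ncc.ncc.ser.gz \
    -map word=0,answer=1 -textFile c2is2r3.txt
}

ner_workflow
```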

0 answers:

No answers yet.