How to generate custom training data for Stanford relation extraction

Time: 2017-05-07 07:19:22

Tags: stanford-nlp

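I have trained a custom classifier to recognize named entities in the finance domain. I want to generate custom training data in the format shown at the following link: http://cogcomp.cs.illinois.edu/Data/ER/conll04.corp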

I can label the custom relations by hand, but I want to generate data in a CoNLL-like format using my custom named entities.

I have also tried the parser as shown below, but it does not generate relation training data like the Roth and Yih data mentioned at https://nlp.stanford.edu/software/relationExtractor.html#training.

java -mx150m -cp "stanford-parser-full-2013-06-20/*:" edu.stanford.nlp.parser.lexparser.LexicalizedParser -outputFormat "penn" edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz stanford-parser-full-2013-06-20/data/testsent.txt > testsent.tree

java -mx150m -cp "stanford-parser-full-2013-06-20/*:" edu.stanford.nlp.trees.EnglishGrammaticalStructure -treeFile testsent.tree -conllx

Below is the output of the custom NER, which I ran separately using the following Python code:

'java -mx2g -cp "*" edu.stanford.nlp.ie.NERClassifierCombiner '\
                '-ner.model classifiers\custom-model.ser.gz '\
                'classifiers/english.all.3class.distsim.crf.ser.gz,'\
                'classifiers/english.conll.4class.distsim.crf.ser.gz,'\
                'classifiers/english.muc.7class.distsim.crf.ser.gz ' \
                '-textFile '+ outtxt_sent +  ' -outputFormat inlineXML  > ' + outtxt + '.ner'

output:

<PERSON>Charles Sinclair</PERSON> <DESG>Chairman</DESG> <ORGANIZATION>-LRB- age 68 -RRB- Charles was appointed a</ORGANIZATION> <DESG>non-executive director</DESG> <ORGANIZATION>in</ORGANIZATION>

So the NER runs fine on its own; I also have Java code to test it.
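As a possible starting point for reusing the standalone NER output, here is a minimal Python sketch that splits one inlineXML line into labelled segments. The function name `parse_inline_xml` and the regular expression are illustrative assumptions, not part of Stanford NER. The inlineXML output carries no POS tags, so those would still have to come from a tagger or from the CoreNLP pipeline.

```python
import re

# Matches one inline-XML entity such as <PERSON>Charles Sinclair</PERSON>.
# Assumption: labels consist of upper-case letters and underscores only.
ENTITY_RE = re.compile(r"<([A-Z_]+)>(.*?)</\1>")


def parse_inline_xml(line):
    """Split an inlineXML-tagged line into (label, tokens) segments.

    Text that sits outside any tag is returned with the label "O".
    """
    segments = []
    pos = 0
    for match in ENTITY_RE.finditer(line):
        outside = line[pos:match.start()].split()
        if outside:
            segments.append(("O", outside))
        segments.append((match.group(1), match.group(2).split()))
        pos = match.end()
    tail = line[pos:].split()
    if tail:
        segments.append(("O", tail))
    return segments


# The NER output line quoted in the question.
sample = (
    "<PERSON>Charles Sinclair</PERSON> <DESG>Chairman</DESG> "
    "<ORGANIZATION>-LRB- age 68 -RRB- Charles was appointed a</ORGANIZATION> "
    "<DESG>non-executive director</DESG> <ORGANIZATION>in</ORGANIZATION>"
)

for label, tokens in parse_inline_xml(sample):
    # Multi-word entities are joined with "/" as in the conll04.corp file.
    print(label, "/".join(tokens))
```

Each segment keeps its tokens as a list, so a later step can number the segments and attach POS tags before writing rows.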

Below is the detailed code for generating the relation data:

Properties props = new Properties();
props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,entitymentions");
props.setProperty("ner.model", "classifiers/custom-model.ser.gz,classifiers/english.all.3class.distsim.crf.ser.gz,classifiers/english.conll.4class.distsim.crf.ser.gz,classifiers/english.muc.7class.distsim.crf.ser.gz");
// set up Stanford CoreNLP pipeline
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
// build annotation for a review
Annotation annotation = new Annotation("Charles Sinclair Chairman -LRB- age 68 -RRB- Charles was appointed a non-executive director");
pipeline.annotate(annotation);
int sentNum = 0;

.............. Rest of the code is same as yours

output:
0   PERSON  0   O   NNP/NNP Charles/Sinclair    O   O   O
0   PERSON  1   O   NNP Chairman    O   O   O
0   PERSON  2   O   -LRB-/NN/CD/-RRB-/NNP/VBD/VBN/DT    -LRB-/age/68/-RRB-/Charles/was/appointed/a  O   O   O
0   PERSON  3   O   JJ/NN   non-executive/director  O   O   O

O   3   member_of_board // I will modify the relation once the data is generated with proper NER
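For reference, the row layout in the output above can be reproduced with a small Python sketch. It assumes the tokens are already available as (word, POS tag, NER label) triples, for example exported from the CoreNLP pipeline. The helper names `group_entities` and `format_sentence` are hypothetical. The column order (sentence number, entity label, index, O, POS tags, words, O, O, O) follows the rows shown in the question, and the blank-line placement around the relation lines is my reading of the conll04.corp file rather than a confirmed specification.

```python
def group_entities(tokens):
    """Merge consecutive tokens that share the same non-O NER label.

    tokens: list of (word, pos_tag, ner_label) triples.
    Returns a list of (ner_label, words, pos_tags) groups.
    Simplification: two distinct adjacent entities with the same
    label are merged into one group.
    """
    groups = []
    for word, pos_tag, ner in tokens:
        if groups and ner != "O" and groups[-1][0] == ner:
            groups[-1][1].append(word)
            groups[-1][2].append(pos_tag)
        else:
            groups.append((ner, [word], [pos_tag]))
    return groups


def format_sentence(sent_num, tokens, relations=()):
    """Render one sentence as tab-separated rows followed by relation lines."""
    lines = []
    for idx, (ner, words, tags) in enumerate(group_entities(tokens)):
        lines.append("\t".join([
            str(sent_num), ner, str(idx), "O",
            "/".join(tags), "/".join(words), "O", "O", "O",
        ]))
    lines.append("")  # blank line between token rows and relation lines
    for arg1, arg2, name in relations:
        lines.append("\t".join([str(arg1), str(arg2), name]))
    lines.append("")  # blank line that ends the sentence block
    return "\n".join(lines)


# Shortened version of the sentence in the question; the POS tags are taken
# from the output above, and the NER labels are the intended ones.
tokens = [
    ("Charles", "NNP", "PERSON"), ("Sinclair", "NNP", "PERSON"),
    ("Chairman", "NNP", "DESG"),
    ("was", "VBD", "O"), ("appointed", "VBN", "O"), ("a", "DT", "O"),
    ("non-executive", "JJ", "DESG"), ("director", "NN", "DESG"),
]

# The relation name is the placeholder used in the question.
print(format_sentence(0, tokens, relations=[(0, 5, "member_of_board")]))
```

The relation arguments are the row indices of the two entities, so they have to be chosen after grouping, once the final row numbers are known.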

The NER tagging is OK now.
 props.setProperty("ner.model", "classifiers/classifiers/english.all.3class.distsim.crf.ser.gz,classifiers/english.conll.4class.distsim.crf.ser.gz,classifiers/english.muc.7class.distsim.crf.ser.gz,");

The custom NER problem has been resolved.

0 answers:

No answers