This is a follow-up to the question: How to generate custom training data for Stanford relation extraction
Thanks to StanfordNLPHelp, I am able to generate relation data using a custom NER model with RegexNER on top of it.
I had to run my custom model last, because otherwise many ORGANIZATION, PERSON, etc. entities are misclassified.
Example custom NER classes.
"DEGREE", "DESG"
Example relation training data:
0 ELECTEDBODY 0 O NNP/IN/NNP BOARD/OF/DIRECTORS O O O
0 ORGANIZATION 1 O NNP Board O O O
0 O 2 O NNS committees O O O
0 O 3 O JJ key O O O
0 ORGANIZATION 4 O NN/NN/NN/NN/NNP/NN N/Nomination/committee/A/Audit/committee O O O
0 O 5 O NN R O O O
0 MISC 6 O NN Remuneration O O O
0 O 7 O NN committee O O O
0 O 8 O NNP EFFECTIVE O O O
0 O 9 O NNP LEADERSHIP O O O
0 O 10 O CC AND O O O
0 O 11 O JJ STRONG O O O
0 O 12 O NN GOVERNANCE O O O
0 O 13 O NNP George O O O
0 O 14 O NNP Weston O O O
0 DESG 15 O NNP/NNP Chief/Executive O O O
0 O 16 O -LRB- -LRB- O O O
0 O 17 O NN age O O O
0 NUMBER 18 O CD 52 O O O
0 O 19 O -RRB- -RRB- O O O
0 PERSON 20 O NNP George O O O
0 O 21 O VBD was O O O
0 O 22 O VBN appointed O O O
0 O 23 O TO to O O O
0 O 24 O DT the O O O
0 ELECTEDBODY 25 O NN board O O O
0 DATE 26 O IN/CD in/1999 O O O
0 O 27 O CC and O O O
0 O 28 O VBD took O O O
0 O 29 O RP up O O O
0 O 30 O PRP$ his O O O
0 O 31 O JJ current O O O
0 O 32 O NN appointment O O O
0 O 33 O IN as O O O
0 DESG 34 O NNP/NNP Chief/Executive O O O
0 O 35 O IN in O O O
0 DATE 36 O NNP/CD April/2005 O O O
0 O 37 O . . O O O
20 34 cur_desg
20 36 cur_desg_from
I am trying to train a custom relation model and have added my own custom relation classes.
Example: relation class -> **cur_desg** (current designation) between entities (**PERSON, DESG**)
**Here is the relevant section of my properties file to train the relation classifier.**
datasetReaderClass = com.samrat.nlp.ie.re.CustomConllReader
entityClassifier = com.samrat.nlp.ie.re.CustomConllExtractor
relationResultsPrinters = com.samrat.nlp.ie.re.RelationResultPrinter
serializedTrainingSentencesPath = custom_relation_sentences.ser
serializedEntityExtractorPath = custom_relation_model.ser
serializedRelationExtractorPath = custom-relation-model-pipeline.ser
Relevant part of the CustomConllReader code:
private String getNormalizedNERTag(String ner) {
......
} else if(ner.equalsIgnoreCase("degree")) {
return "DEGREE";
}
else if(ner.equalsIgnoreCase("electedbody")) {
return "ELECTEDBODY";
}
...............
Problem 1 (CustomConllReader throws an exception at the following line while reading the training data):
Span span = new Span(entity1.getExtentTokenStart(), entity2.getExtentTokenEnd());
Relevant part of CustomConllReader (almost identical to RothCONLL04Reader):
case 3: // relation
System.out.println(currentLine);
String type = pieces.get(2);
List<ExtractionObject> args = new ArrayList<>();
EntityMention entity1 = indexToEntityMention.get(pieces.get(0));
EntityMention entity2 = indexToEntityMention.get(pieces.get(1));
args.add(entity1);
args.add(entity2);
Span span = new Span(entity1.getExtentTokenStart(), entity2.getExtentTokenEnd());
// identifier = "relation" + sentenceID + "-" + sentence.getAllRelations().size();
identifier = RelationMention.makeUniqueId();
RelationMention relationMention = new RelationMention(identifier,
sentence, span, type, null, args);
AnnotationUtils.addRelationMention(sentence, relationMention);
break;
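If either index on a relation line has no entry in `indexToEntityMention`, the `Span` line dereferences `null`. As a minimal sketch (not part of CoreNLP; the class name `EntityLookup` and the method `requireEntity` are hypothetical), the two lookups could go through a helper that fails with a descriptive message instead of a bare NullPointerException:

```java
import java.util.HashMap;
import java.util.Map;

class EntityLookup {

    /**
     * Looks up the entity registered for a token index and fails with a
     * descriptive message if none was read for the current sentence.
     * Hypothetical helper, written generically so it runs without CoreNLP.
     */
    static <T> T requireEntity(Map<String, T> indexToEntity, String index, String relationLine) {
        T entity = indexToEntity.get(index);
        if (entity == null) {
            throw new IllegalStateException(
                "Relation line \"" + relationLine + "\" refers to token index " + index
                + ", but no entity was read at that index in the current sentence"
                + " (known indices: " + indexToEntity.keySet() + ")");
        }
        return entity;
    }

    public static void main(String[] args) {
        // Stand-in for indexToEntityMention; the values would be EntityMention objects.
        Map<String, String> indexToEntity = new HashMap<>();
        indexToEntity.put("0", "PERSON");

        System.out.println(requireEntity(indexToEntity, "0", "0 2 cur_desg"));
        try {
            requireEntity(indexToEntity, "2", "0 2 cur_desg");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In the reader, the same check would wrap `indexToEntityMention.get(pieces.get(0))` and `indexToEntityMention.get(pieces.get(1))`, so the error names the offending relation line and the indices that were actually registered.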
Exception:
INFO: Reading file: tagged-training-relation-data-conll04.corp
20 34 cur_desg
20 36 cur_desg_from
0 2 cur_desg
Exception in thread "main" java.io.IOException
at edu.stanford.nlp.ie.machinereading.GenericDataSetReader.parse(GenericDataSetReader.java:138)
at com.wipro.nlp.ie.re.CustomConllReader.main(CustomConllReader.java:292)
Caused by: java.lang.NullPointerException
at com.wipro.nlp.ie.re.CustomConllReader.readSentence(CustomConllReader.java:144)
at com.wipro.nlp.ie.re.CustomConllReader.read(CustomConllReader.java:55)
at edu.stanford.nlp.ie.machinereading.GenericDataSetReader.parse(GenericDataSetReader.java:136)
... 1 more
The exception is thrown on sentence 3 while parsing the relation (0 2 cur_desg):
3 PERSON 0 O NNP/NNP John/Bason O O O
3 O 1 O NNP Finance O O O
3 ELECTEDBODY 2 O NNP Director O O O
3 O 3 O -LRB- -LRB- O O O
3 O 4 O NN age O O O
3 NUMBER 5 O CD 59 O O O
3 O 6 O -RRB- -RRB- O O O
3 PERSON 7 O NNP John O O O
3 O 8 O VBD was O O O
3 O 9 O VBN appointed O O O
3 O 10 O IN as O O O
3 O 11 O NNP Finance O O O
3 ELECTEDBODY 12 O NNP Director O O O
3 O 13 O IN in O O O
3 DATE 14 O NNP/CD May/1999 O O O
3 O 15 O . . O O O
0 2 cur_desg
0 14 cur_desg_from
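This problem is solved: my training data had extra line breaks in between, and after removing them I was able to build the custom relation classifier. However, when I now use that custom relation classifier, it does not recognize any custom NER tags or custom relations.
I have asked about that separately (how to make the custom relation classifier recognize custom NER tags and relations in new sentences): Custom Relation Classifier does not understand any Custom NER tags and does not find any relations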
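A stray blank line of this kind can be caught before training. The following is a rough sketch, assuming the reader behaves like RothCONLL04Reader as I understand it: token lines have 9 whitespace-separated fields, relation lines have 3, and a sentence record ends after its second blank line (tokens, blank line, relations, blank line). The class `RelationDataCheck` and the method `findOrphanRelations` are made-up names for illustration, not CoreNLP APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RelationDataCheck {

    /**
     * Returns one message per relation line whose arguments are not entity
     * tokens of the record it falls into. Assumes a record is closed by its
     * second blank line, so an extra blank line pushes the relation lines
     * into the next (still empty) record.
     */
    static List<String> findOrphanRelations(List<String> lines) {
        List<String> problems = new ArrayList<>();
        Set<String> entityIndices = new HashSet<>();
        int blankLinesSeen = 0;
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i).trim();
            if (line.isEmpty()) {
                blankLinesSeen++;
                if (blankLinesSeen == 2) { // assumed record boundary
                    entityIndices.clear();
                    blankLinesSeen = 0;
                }
                continue;
            }
            String[] fields = line.split("\\s+");
            if (fields.length == 9) {
                // Token line: column 2 is the NER tag, column 3 the token index.
                if (!fields[1].equals("O")) {
                    entityIndices.add(fields[2]);
                }
            } else if (fields.length == 3) {
                // Relation line: both arguments must be entity tokens of this record.
                if (!entityIndices.contains(fields[0]) || !entityIndices.contains(fields[1])) {
                    problems.add("line " + (i + 1) + ": relation \"" + line
                        + "\" refers to an index with no entity in the current record");
                }
            } else {
                problems.add("line " + (i + 1) + ": unexpected number of fields ("
                    + fields.length + ")");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Same record as sentence 3 (shortened), with one extra blank line before the relation.
        List<String> lines = Arrays.asList(
            "3 PERSON 0 O NNP/NNP John/Bason O O O",
            "3 O 1 O NNP Finance O O O",
            "3 ELECTEDBODY 2 O NNP Director O O O",
            "",
            "",
            "0 2 cur_desg",
            "");
        for (String problem : findOrphanRelations(lines)) {
            System.out.println(problem);
        }
    }
}
```

To check a real file, the lines could be read with `java.nio.file.Files.readAllLines` and passed to `findOrphanRelations`; an empty result means every relation line points at entity tokens of its own record under the assumptions above.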
Answer 0 (score: 0)
The exception was thrown because of an extra line break in between. The tagged training data must contain two line breaks, as shown below.
3 PERSON 0 O NNP/NNP John/Bason O O O
3 O 1 O NNP Finance O O O
3 ELECTEDBODY 2 O NNP Director O O O
3 O 3 O -LRB- -LRB- O O O
3 O 4 O NN age O O O
3 NUMBER 5 O CD 59 O O O
3 O 6 O -RRB- -RRB- O O O
3 PERSON 7 O NNP John O O O
3 O 8 O VBD was O O O
3 O 9 O VBN appointed O O O
3 O 10 O IN as O O O
3 O 11 O NNP Finance O O O
3 ELECTEDBODY 12 O NNP Director O O O
3 O 13 O IN in O O O
3 DATE 14 O NNP/CD May/1999 O O O
3 O 15 O . . O O O
0 2 cur_desg
0 14 cur_desg_from
5 O 0 O PRP He O O O
5 O 1 O VBD was O O O
5 O 2 O RB previously O O O
5 O 3 O DT the O O O
5 O 4 O NN finance O O O
5 DESG 5 O NN director O O O
5 O 6 O IN of O O O
5 ORGANIZATION 7 O NNP Bunzl O O O
5 O 8 O NN plc O O O
5 O 9 O CC and O O O
5 O 10 O VBZ is O O O