ConllReader (e.g. RothCONLL04Reader) throws an exception when reading relation training data with custom NER and custom relations

Date: 2017-05-12 08:23:47

Tags: stanford-nlp

This is a follow-up to the question: How to generate custom training data for Stanford relation extraction

Thanks to StanfordNLPHelp, I was able to generate the relation data using my custom NER model, with RegexNER on top of it.

I had to run my custom model last, because otherwise it misclassified many ORGANIZATION, PERSON, etc. entities.
Example custom NER classes:

"DEGREE", "DESG"

Example of the relation training data:

0   ELECTEDBODY 0   O   NNP/IN/NNP  BOARD/OF/DIRECTORS  O   O   O
0   ORGANIZATION    1   O   NNP Board   O   O   O
0   O   2   O   NNS committees  O   O   O
0   O   3   O   JJ  key O   O   O
0   ORGANIZATION    4   O   NN/NN/NN/NN/NNP/NN  N/Nomination/committee/A/Audit/committee    O   O   O
0   O   5   O   NN  R   O   O   O
0   MISC    6   O   NN  Remuneration    O   O   O
0   O   7   O   NN  committee   O   O   O
0   O   8   O   NNP EFFECTIVE   O   O   O
0   O   9   O   NNP LEADERSHIP  O   O   O
0   O   10  O   CC  AND O   O   O
0   O   11  O   JJ  STRONG  O   O   O
0   O   12  O   NN  GOVERNANCE  O   O   O
0   O   13  O   NNP George  O   O   O
0   O   14  O   NNP Weston  O   O   O
0   DESG    15  O   NNP/NNP Chief/Executive O   O   O
0   O   16  O   -LRB-   -LRB-   O   O   O
0   O   17  O   NN  age O   O   O
0   NUMBER  18  O   CD  52  O   O   O
0   O   19  O   -RRB-   -RRB-   O   O   O
0   PERSON  20  O   NNP George  O   O   O
0   O   21  O   VBD was O   O   O
0   O   22  O   VBN appointed   O   O   O
0   O   23  O   TO  to  O   O   O
0   O   24  O   DT  the O   O   O
0   ELECTEDBODY 25  O   NN  board   O   O   O
0   DATE    26  O   IN/CD   in/1999 O   O   O
0   O   27  O   CC  and O   O   O
0   O   28  O   VBD took    O   O   O
0   O   29  O   RP  up  O   O   O
0   O   30  O   PRP$    his O   O   O
0   O   31  O   JJ  current O   O   O
0   O   32  O   NN  appointment O   O   O
0   O   33  O   IN  as  O   O   O
0   DESG    34  O   NNP/NNP Chief/Executive O   O   O
0   O   35  O   IN  in  O   O   O
0   DATE    36  O   NNP/CD  April/2005  O   O   O
0   O   37  O   .   .   O   O   O

20  34  cur_desg 
20  36  cur_desg_from

I am trying to train a custom relation model and have added my custom relation classes.

For example, the relation class **cur_desg** (current designation) holds between the entities (**PERSON, DESG**).
**Here is the relevant section of my properties file to train the relation classifier.**

datasetReaderClass = com.samrat.nlp.ie.re.CustomConllReader
entityClassifier = com.samrat.nlp.ie.re.CustomConllExtractor
relationResultsPrinters = com.samrat.nlp.ie.re.RelationResultPrinter

serializedTrainingSentencesPath = custom_relation_sentences.ser
serializedEntityExtractorPath = custom_relation_model.ser
serializedRelationExtractorPath = custom-relation-model-pipeline.ser

Relevant part of CustomConllReader:

private String getNormalizedNERTag(String ner) {
        ......
        }  else if(ner.equalsIgnoreCase("degree")) {
            return "DEGREE";
        }
        else if(ner.equalsIgnoreCase("electedbody")) {
            return "ELECTEDBODY";
        }
...............

Problem 1 (CustomConllReader throws an exception at the following line while reading the training data)

Span span = new Span(entity1.getExtentTokenStart(), entity2.getExtentTokenEnd());

Relevant part of CustomConllReader (almost identical to RothCONLL04Reader):

case 3: // relation
    System.out.println(currentLine);
    String type = pieces.get(2);
    List<ExtractionObject> args = new ArrayList<>();
    EntityMention entity1 = indexToEntityMention.get(pieces.get(0));
    EntityMention entity2 = indexToEntityMention.get(pieces.get(1));
    args.add(entity1);
    args.add(entity2);
    Span span = new Span(entity1.getExtentTokenStart(), entity2.getExtentTokenEnd());
    // identifier = "relation" + sentenceID + "-" + sentence.getAllRelations().size();
    identifier = RelationMention.makeUniqueId();
    RelationMention relationMention = new RelationMention(identifier,
            sentence, span, type, null, args);
    AnnotationUtils.addRelationMention(sentence, relationMention);
    break;

Exception:

    INFO: Reading file: tagged-training-relation-data-conll04.corp
20  34  cur_desg 
20  36  cur_desg_from
0   2   cur_desg
Exception in thread "main" java.io.IOException
    at edu.stanford.nlp.ie.machinereading.GenericDataSetReader.parse(GenericDataSetReader.java:138)
    at com.wipro.nlp.ie.re.CustomConllReader.main(CustomConllReader.java:292)
Caused by: java.lang.NullPointerException
    at com.wipro.nlp.ie.re.CustomConllReader.readSentence(CustomConllReader.java:144)
    at com.wipro.nlp.ie.re.CustomConllReader.read(CustomConllReader.java:55)
    at edu.stanford.nlp.ie.machinereading.GenericDataSetReader.parse(GenericDataSetReader.java:136)
    ... 1 more
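
The NullPointerException on the `new Span(...)` line is consistent with `indexToEntityMention.get(...)` returning null for one of the two relation arguments, although the trace alone does not prove it. Below is a minimal sketch of a guarded lookup that fails with a descriptive message instead. `EntityLookup` and `requireEntity` are hypothetical names, and a plain `Map<String, String>` stands in for the reader's `Map<String, EntityMention>`:

```java
import java.util.HashMap;
import java.util.Map;

public class EntityLookup {

    /**
     * Returns the entity stored under tokenIndex, or throws with a message that
     * names the offending relation line when no such entity was read.
     * (Hypothetical helper, not part of Stanford CoreNLP.)
     */
    public static <T> T requireEntity(Map<String, T> indexToEntity,
                                      String tokenIndex,
                                      String relationLine) {
        T entity = indexToEntity.get(tokenIndex);
        if (entity == null) {
            throw new IllegalStateException(
                "Relation line '" + relationLine.trim() + "' refers to token index "
                + tokenIndex + ", but the current sentence only has entities at "
                + indexToEntity.keySet());
        }
        return entity;
    }

    public static void main(String[] args) {
        // Stand-in for the reader's map; the values would be EntityMention objects there.
        Map<String, String> indexToEntity = new HashMap<>();
        indexToEntity.put("0", "PERSON");
        indexToEntity.put("2", "ELECTEDBODY");

        // Index 0 is known, so the lookup succeeds.
        System.out.println(requireEntity(indexToEntity, "0", "0   2   cur_desg"));

        // Index 14 is missing from the map, so the lookup fails with a readable message.
        try {
            requireEntity(indexToEntity, "14", "0   14  cur_desg_from");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In the `case 3` branch, the two `indexToEntityMention.get(...)` calls could be wrapped the same way, so that a malformed file reports the relation line rather than a bare NullPointerException.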

The exception is thrown on sentence 3, while parsing the relation (0 2 cur_desg):
3   PERSON  0   O   NNP/NNP John/Bason  O   O   O
3   O   1   O   NNP Finance O   O   O
3   ELECTEDBODY 2   O   NNP Director    O   O   O
3   O   3   O   -LRB-   -LRB-   O   O   O
3   O   4   O   NN  age O   O   O
3   NUMBER  5   O   CD  59  O   O   O
3   O   6   O   -RRB-   -RRB-   O   O   O
3   PERSON  7   O   NNP John    O   O   O
3   O   8   O   VBD was O   O   O
3   O   9   O   VBN appointed   O   O   O
3   O   10  O   IN  as  O   O   O
3   O   11  O   NNP Finance O   O   O
3   ELECTEDBODY 12  O   NNP Director    O   O   O
3   O   13  O   IN  in  O   O   O
3   DATE    14  O   NNP/CD  May/1999    O   O   O
3   O   15  O   .   .   O   O   O

0   2   cur_desg
0   14  cur_desg_from

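This issue is resolved: there were extra newlines in my training data, and I was able to build the custom relation classifier. However, when I now use that custom relation classifier, it does not recognize any custom NER tags or custom relations.

I have posted that follow-up as a separate question (how to make the custom relation classifier recognize custom NER tags and relations in new sentences): Custom Relation Classifier does not understand any Custom NER tags and does not find any relations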
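
Because the NullPointerException came down to stray blank lines in the training file, a quick pre-check of the file can catch the problem before training. The sketch below rests on assumptions rather than on the reader's actual source: that token lines have 9 whitespace-separated columns (entity type in column 2, token index in column 3) and relation lines have 3, as in the data shown, and that the second blank line after a token block ends the sentence. `RelationDataCheck` and `findProblems` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RelationDataCheck {

    /** Returns one message per suspicious line; an empty list means nothing was flagged. */
    public static List<String> findProblems(List<String> lines) {
        List<String> problems = new ArrayList<>();
        Set<String> entityIndexes = new HashSet<>(); // token indexes with a non-O entity type
        int blankLinesSeen = 0;                      // blank lines seen in the current sentence

        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i).trim();
            int lineNo = i + 1;
            if (line.isEmpty()) {
                // Assumption: the second blank line closes the current sentence.
                if (++blankLinesSeen == 2) {
                    entityIndexes.clear();
                    blankLinesSeen = 0;
                }
                continue;
            }
            String[] pieces = line.split("\\s+");
            if (pieces.length == 9) {            // token line
                if (!"O".equals(pieces[1])) {
                    entityIndexes.add(pieces[2]);
                }
            } else if (pieces.length == 3) {     // relation line
                if (!entityIndexes.contains(pieces[0]) || !entityIndexes.contains(pieces[1])) {
                    problems.add("line " + lineNo + ": relation '" + line
                        + "' refers to an entity index not defined in the current sentence");
                }
            } else {
                problems.add("line " + lineNo + ": unexpected column count " + pieces.length);
            }
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java RelationDataCheck <training-file>");
            return;
        }
        for (String problem : findProblems(Files.readAllLines(Paths.get(args[0])))) {
            System.out.println(problem);
        }
    }
}
```

A relation line that follows two blank lines instead of one is reported, since under the assumptions above its entity indexes belong to a sentence that has already been closed. A relation that points at a token tagged `O` is reported for the same reason.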

1 Answer:

Answer 0: (score: 0)

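The exception was thrown because of extra newlines between the blocks. The tagged training data must contain exactly two newlines (that is, one blank line) between a sentence's token lines and its relation lines, and again between the relation lines and the next sentence, as shown below.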

3   PERSON  0   O   NNP/NNP John/Bason  O   O   O
3   O   1   O   NNP Finance O   O   O
3   ELECTEDBODY 2   O   NNP Director    O   O   O
3   O   3   O   -LRB-   -LRB-   O   O   O
3   O   4   O   NN  age O   O   O
3   NUMBER  5   O   CD  59  O   O   O
3   O   6   O   -RRB-   -RRB-   O   O   O
3   PERSON  7   O   NNP John    O   O   O
3   O   8   O   VBD was O   O   O
3   O   9   O   VBN appointed   O   O   O
3   O   10  O   IN  as  O   O   O
3   O   11  O   NNP Finance O   O   O
3   ELECTEDBODY 12  O   NNP Director    O   O   O
3   O   13  O   IN  in  O   O   O
3   DATE    14  O   NNP/CD  May/1999    O   O   O
3   O   15  O   .   .   O   O   O

0   2   cur_desg
0   14  cur_desg_from

5   O   0   O   PRP He  O   O   O
5   O   1   O   VBD was O   O   O
5   O   2   O   RB  previously  O   O   O
5   O   3   O   DT  the O   O   O
5   O   4   O   NN  finance O   O   O
5   DESG    5   O   NN  director    O   O   O
5   O   6   O   IN  of  O   O   O
5   ORGANIZATION    7   O   NNP Bunzl   O   O   O
5   O   8   O   NN  plc O   O   O
5   O   9   O   CC  and O   O   O
5   O   10  O   VBZ is  O   O   O