Question

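In my code, I get PERSON recognition from the first classifier. For the second classifier, which I trained myself, I added some words to be recognized or annotated as ORGANIZATION, but it does not annotate persons.

I need to get the benefit of both of them. How can I do that?

I'm using NetBeans, and this is the code: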

String serializedClassifier = "classifiers/english.all.3class.distsim.crf.ser.gz";
String serializedClassifier2 = "/Users/ha/stanford-ner-2014-10-26/classifiers/dept-model.ser.gz";

if (args.length > 0) {
  serializedClassifier = args[0];
}

AbstractSequenceClassifier<CoreLabel> classifier = CRFClassifier.getClassifier(serializedClassifier);
AbstractSequenceClassifier<CoreLabel> classifier2 = CRFClassifier.getClassifier(serializedClassifier2);

String fileContents = IOUtils.slurpFile("/Users/ha/NetBeansProjects/NERtry/src/nertry/input.txt");
List<List<CoreLabel>> out = classifier.classify(fileContents);
List<List<CoreLabel>> out2 = classifier2.classify(fileContents);

for (List<CoreLabel> sentence : out) {
  System.out.print("\nenglish.all.3class.distsim.crf.ser.gz: ");
  for (CoreLabel word : sentence) {
    System.out.print(word.word() + '/' + word.get(CoreAnnotations.AnswerAnnotation.class) + ' ');
  }

  for (List<CoreLabel> sentence2 : out2) {
    System.out.print("\ndept-model.ser.gz");
    for (CoreLabel word2 : sentence2) {
      System.out.print(word2.word() + '/' + word2.get(CoreAnnotations.AnswerAnnotation.class) + ' ');
    }
    System.out.println();
  }
}

The problem is with the result I get:

english.all.3class.distsim.crf.ser.gz: What/O date/O did/O James/PERSON started/O his/O job/O in/O Human/O and/O Finance/O ?/O
dept-model.ser.gzWhat/O date/O did/O James/ORGANIZATION started/O his/O job/O in/O Human/ORGANIZATION and/O Finance/ORGANIZATION ?/O

The second classifier recognizes the name as an ORGANIZATION, but I need it to be annotated as PERSON. Any help?

Answer 1

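The class you should use is NERClassifierCombiner. Its semantics are that it runs the classifiers in order from left to right as you specify them (you can give it any number of them in the constructor), and that later classifiers cannot annotate an entity that overlaps with an entity tagged by an earlier classifier, but are otherwise free to add annotations. Hence, earlier classifiers are preferred in a simple preference ranking. I give a complete code example below.

(If you are training all of your own classifiers, it is generally best to train all the entities together, so that they can influence each other in the categories assigned. But this simple preference ordering usually works pretty well, and we use it ourselves.)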

import edu.stanford.nlp.ie.NERClassifierCombiner;
import edu.stanford.nlp.io.IOUtils;
import edu.stanford.nlp.ling.CoreLabel;

import java.io.IOException;
import java.util.List;

public class MultipleNERs {

  public static void main(String[] args) throws IOException {
    String serializedClassifier = "classifiers/english.all.3class.distsim.crf.ser.gz";
    String serializedClassifier2 = "classifiers/english.muc.7class.distsim.crf.ser.gz";

    if (args.length > 0) {
      serializedClassifier = args[0];
    }

    NERClassifierCombiner classifier = new NERClassifierCombiner(false, false, 
            serializedClassifier, serializedClassifier2);

    String fileContents = IOUtils.slurpFile("input.txt");
    List<List<CoreLabel>> out = classifier.classify(fileContents);

    int i = 0;
    for (List<CoreLabel> lcl : out) {
      i++;
      int j = 0;
      for (CoreLabel cl : lcl) {
        j++;
        System.out.printf("%d:%d: %s%n", i, j,
                cl.toShorterString("Text", "CharacterOffsetBegin", "CharacterOffsetEnd", "NamedEntityTag"));
      }
    }
  }

}
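To make the preference ranking described in this answer concrete, here is a minimal, library-free sketch of the idea applied to the sentence from the question. It is not the implementation inside NERClassifierCombiner; the class name PreferenceMergeSketch, the merge method, and the assumption that an entity is a run of adjacent tokens sharing the same label are all hypothetical and chosen only for illustration.

```java
import java.util.Arrays;

// Hypothetical illustration only: NOT the code used by NERClassifierCombiner.
public class PreferenceMergeSketch {

  // Label used for tokens that are not part of any entity.
  static final String BACKGROUND = "O";

  /**
   * Merges two label sequences over the same tokens. Every label from
   * `primary` is kept. A run of adjacent, identically labelled tokens from
   * `secondary` is copied only if none of its tokens is labelled in `primary`.
   */
  public static String[] merge(String[] primary, String[] secondary) {
    if (primary.length != secondary.length) {
      throw new IllegalArgumentException("label sequences must have the same length");
    }
    String[] merged = Arrays.copyOf(primary, primary.length);
    int i = 0;
    while (i < secondary.length) {
      if (secondary[i].equals(BACKGROUND)) {
        i++;
        continue;
      }
      // Find the end (exclusive) of the run that shares the label at position i.
      int end = i;
      while (end < secondary.length && secondary[end].equals(secondary[i])) {
        end++;
      }
      // The run is rejected if any of its tokens is already an entity in `primary`.
      boolean overlaps = false;
      for (int k = i; k < end; k++) {
        if (!primary[k].equals(BACKGROUND)) {
          overlaps = true;
          break;
        }
      }
      if (!overlaps) {
        for (int k = i; k < end; k++) {
          merged[k] = secondary[i];
        }
      }
      i = end;
    }
    return merged;
  }

  public static void main(String[] args) {
    String[] tokens = {"What", "date", "did", "James", "started", "his",
                       "job", "in", "Human", "and", "Finance", "?"};
    // Labels as printed in the question for the two classifiers.
    String[] first = {"O", "O", "O", "PERSON", "O", "O",
                      "O", "O", "O", "O", "O", "O"};
    String[] second = {"O", "O", "O", "ORGANIZATION", "O", "O",
                       "O", "O", "ORGANIZATION", "O", "ORGANIZATION", "O"};

    String[] merged = merge(first, second);
    StringBuilder line = new StringBuilder();
    for (int i = 0; i < tokens.length; i++) {
      line.append(tokens[i]).append('/').append(merged[i]).append(' ');
    }
    System.out.println(line.toString().trim());
  }
}
```

With the labels from the question, James stays PERSON because the first classifier already tagged it, while Human and Finance pick up ORGANIZATION from the second classifier.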

Answer 2

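I'm not quite sure what the problem is here. You already have the output of both classifiers. Maybe this is more of a Java question, namely how to iterate over both sets of sentences at the same time: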

// `out` and `out2` are the two classifier outputs from the question.
Iterator<List<CoreLabel>> it1 = out.iterator();
Iterator<List<CoreLabel>> it2 = out2.iterator();
while(it1.hasNext() && it2.hasNext()) {
   List<CoreLabel> sentence1 = it1.next();
   List<CoreLabel> sentence2 = it2.next();
   Iterator<CoreLabel> sentence1It = sentence1.iterator();
   Iterator<CoreLabel> sentence2It = sentence2.iterator();
   while(sentence1It.hasNext() && sentence2It.hasNext()) {
       CoreLabel word1 = sentence1It.next();
       CoreLabel word2 = sentence2It.next();
       System.out.print("\nenglish.all.3class.distsim.crf.ser.gz: ");
       System.out.print(word1.word() + '/' +
         word1.get(CoreAnnotations.AnswerAnnotation.class) + ' ');
       System.out.print("\ndept-model.ser.gz");
       System.out.print(word2.word() + '/' + 
         word2.get(CoreAnnotations.AnswerAnnotation.class) + ' ');
   }
   System.out.println();
}

Stanford NER: Can I use two classifiers at once in my code?
