Question

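I am trying to train the Stanford NER classifier to identify specific things in a text database. I have made a new .prop file and a training file, and I do get results, but they are the same default results I would get if I ran the classifier without any training. What can I do to fix this?

Here is my code: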

import edu.stanford.nlp.io.IOUtils;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;
import edu.stanford.nlp.util.StringUtils;
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.Properties;

public class NLP_train {
   public static void main(String[] args) throws IOException {

       Properties props = StringUtils.propFileToProperties("C:/Users/Admin/Desktop/trainingfile.prop");

       StanfordCoreNLP pipeline = new StanfordCoreNLP(props);


       // read some text in the text variable
       File inputFile = new File("C:/Users/Admin/Desktop/target.txt");
       // create an empty Annotation just with the given text
       Annotation document = new Annotation(IOUtils.slurpFileNoExceptions(inputFile));

       // run all Annotators on this text
       pipeline.annotate(document);

       List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);

       for (CoreMap sentence : sentences) {
           // traversing the words in the current sentence
           // a CoreLabel is a CoreMap with additional token-specific methods
           for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
               // this is the text of the token
               String word = token.get(CoreAnnotations.TextAnnotation.class);
               // this is the POS tag of the token
               String pos = token.get(CoreAnnotations.PartOfSpeechAnnotation.class);
               // this is the NER label of the token
               String ne = token.get(CoreAnnotations.NamedEntityTagAnnotation.class);

               System.out.println(String.format("Print: word: [%s] pos: [%s] ne: [%s]", word, pos, ne));
           }
       }
   }
}

Here is my .prop file:

trainFile = C:/Users/Admin/Desktop/trainingfile.tsv
serializeTo = C:/Users/Admin/Desktop/ner-model.ser.gz
map = word=0,answer=1

useClassFeature = true
useWord = true
useNGrams = true
noMidNGrams = true
useDisjunctive = true
maxNGramLeng = 6
usePrev = true
useNext = true
useSequences = true
usePrevSequences = true
maxLeft = 1
# the next 4 properties deal with word shape features
useTypeSeqs = true
useTypeSeqs2 = true
useTypeySequences = true
wordShape = chris2useLC

An excerpt from my training file:

0
Enter RADAR
347G RADAR
`` 0
Rice 0
Bowl 0
'' 0

Answer 1

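To train a new NER model, you need to train it directly with the edu.stanford.nlp.ie.crf.CRFClassifier class. You cannot train a new model from within CoreNLP. Also, although both use properties files, the files are different: a properties file for NER gives its properties directly to the CRFClassifier class, whereas a CoreNLP properties file can give properties to all sorts of components. For that reason, CoreNLP property names are placed in their own namespaces, so the properties used by NER have names like ner.model.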
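The training step described above could be launched along these lines. This is only a sketch: it assembles the `java -cp ... edu.stanford.nlp.ie.crf.CRFClassifier -prop ...` command line described in the Stanford NER CRF FAQ. The jar name `stanford-ner.jar`, the class `TrainNerCommand`, and the helper `trainingCommand` are assumptions made for illustration and are not part of the library.

```java
import java.util.List;

class TrainNerCommand {

    // Hypothetical helper: builds the command line that trains a CRF model
    // from a training properties file (the one containing trainFile,
    // serializeTo, map, and the feature flags).
    static List<String> trainingCommand(String nerJarPath, String trainingPropFile) {
        return List.of(
                "java", "-cp", nerJarPath,
                "edu.stanford.nlp.ie.crf.CRFClassifier",
                "-prop", trainingPropFile);
    }

    public static void main(String[] args) {
        // Assumed locations; adjust to wherever the jar and .prop file live.
        List<String> cmd = trainingCommand(
                "stanford-ner.jar",
                "C:/Users/Admin/Desktop/trainingfile.prop");

        // Print the command so it can be inspected or pasted into a shell.
        System.out.println(String.join(" ", cmd));

        // To run it from Java instead, something like this should work:
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

When training finishes, the model should be written to the path given by `serializeTo` in the training properties file.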

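So what you need to do is first train a new NER model with CRFClassifier, using roughly the data and properties file you show. That will give you a serialized NER model file. The CRF FAQ has some instructions. You then need to create a properties file for CoreNLP that tells NER to run your new model. For example, if your new model is /Users/manning/ner/brands.crf.ser.gz, then you could use the property:

ner.model = edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz,/Users/manning/ner/brands.crf.ser.gz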
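As a rough illustration of the second step, the CoreNLP properties can also be built in code rather than read from a file. The sketch below uses only `java.util.Properties`. The class `CoreNlpNerProps`, the helper `build`, and the annotator list `tokenize,ssplit,pos,lemma,ner` are assumptions chosen to match the question's code, which reads POS and NER tags.

```java
import java.util.Properties;

class CoreNlpNerProps {

    // Default 3-class model path, as quoted in the answer.
    static final String DEFAULT_MODEL =
            "edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz";

    // Hypothetical helper: builds CoreNLP properties that run the default
    // model followed by a custom serialized CRF model.
    static Properties build(String customModelPath) {
        Properties props = new Properties();
        // Assumed annotator list; NER needs tokens, sentences, POS and lemmas.
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");
        // Note the "ner." namespace: this is a CoreNLP property,
        // not one of the CRFClassifier training properties.
        props.setProperty("ner.model", DEFAULT_MODEL + "," + customModelPath);
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("/Users/manning/ner/brands.crf.ser.gz");
        props.forEach((key, value) -> System.out.println(key + " = " + value));

        // With CoreNLP on the classpath, these properties would replace the
        // training .prop file in the question's code:
        // StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    }
}
```

The key point is that the pipeline receives a `ner.model` entry pointing at the trained model, rather than the training properties (`trainFile`, `serializeTo`, and so on), which the pipeline does not use to train anything.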

Stanford NER won't use my training file and uses its defaults instead