Question

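I'm new to Stanford NLP. I have been using it for the past few days and am now stuck on the last step. When I use PTBTokenizer, it splits a sentence into individual words, but what I want is for it to split the sentence into named entities or verbs, so that I can use a kind of dependency tree to reconstruct the sentence and draw inferences directly from the whole sentence, including how one entity relates to another. Is it possible to customize the tokenizer to achieve this? Any help is appreciated. Thanks in advance.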

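import java.util.List;
import java.util.Properties;
import edu.stanford.nlp.ie.machinereading.structure.MachineReadingAnnotations.RelationMentionsAnnotation;
import edu.stanford.nlp.ie.machinereading.structure.RelationMention;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.RelationExtractorAnnotator;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

try{
     // Properties props = StringUtils.argsToProperties(args);

      Properties props = new Properties();
      // pos must run before lemma, and ner and parse both need pos tags
      props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse");
      // Pass props to the pipeline, otherwise the annotator list above is ignored
      StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
      String sentence = "Kapil Puri, original promoter of Sparsh BPO, now owned by Blackstone controlled Intelenet, has decided to sell his residual stake of 12%.";

      Annotation doc = new Annotation(sentence);

      pipeline.annotate(doc);
      // Run relation extraction on top of the annotated document
      RelationExtractorAnnotator r = new RelationExtractorAnnotator(props);
      r.annotate(doc);

      for(CoreMap s: doc.get(CoreAnnotations.SentencesAnnotation.class)){
        System.out.println("For sentence :=>" + s.get(CoreAnnotations.TextAnnotation.class));
        List<RelationMention> rls  = s.get(RelationMentionsAnnotation.class);
        if(rls == null) continue;   // no relations found for this sentence
        for(RelationMention rl: rls){
          System.out.println(rl.toString());
        }
      }
    }catch(Exception e){
      e.printStackTrace();
    }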

Answer 1

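I don't think you can achieve this directly; I ran into a similar problem myself. Here is the process I used: I first applied PTBTokenizer, then ran NER and RegexNER on top of its output, and then used the results to merge tokens based on consecutive NER tags.

If two or more consecutive words have the same tag, I merge them into a single token.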
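
The merging step described above could look roughly like the sketch below. It is not the answerer's actual code: the class name NerMerger and the method merge are hypothetical, and the input is assumed to be two parallel lists (token text and NER tag) that you would fill from CoreNLP, for example from each token's CoreAnnotations.NamedEntityTagAnnotation. The sketch also assumes that tokens tagged "O" (no entity) should stay separate, which the answer does not state explicitly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Stanford CoreNLP.
class NerMerger {

    /**
     * Merges runs of adjacent tokens that share the same NER tag into one
     * space-joined token. Tokens tagged "O" are never merged (an assumption).
     */
    static List<String> merge(List<String> words, List<String> tags) {
        if (words.size() != tags.size()) {
            throw new IllegalArgumentException("words and tags must have the same length");
        }
        List<String> merged = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentTag = null;

        for (int i = 0; i < words.size(); i++) {
            String tag = tags.get(i);
            boolean continuesEntity =
                currentTag != null && !"O".equals(tag) && tag.equals(currentTag);
            if (continuesEntity) {
                // Same entity tag as the previous token: extend the current run
                current.append(' ').append(words.get(i));
            } else {
                // Tag changed (or token is "O"): flush the run and start a new one
                if (current.length() > 0) {
                    merged.add(current.toString());
                }
                current.setLength(0);
                current.append(words.get(i));
                currentTag = tag;
            }
        }
        if (current.length() > 0) {
            merged.add(current.toString());
        }
        return merged;
    }

    public static void main(String[] args) {
        // Tags are written by hand here; in practice they would come from the NER annotator
        List<String> words = Arrays.asList("Kapil", "Puri", "sold", "Sparsh", "BPO");
        List<String> tags  = Arrays.asList("PERSON", "PERSON", "O", "ORGANIZATION", "ORGANIZATION");
        System.out.println(merge(words, tags));  // prints [Kapil Puri, sold, Sparsh BPO]
    }
}
```

One limitation of merging purely on equal tags: two distinct entities of the same type that appear back to back would be joined into one token, so the result may need a manual check for such cases.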
