I am using Google NLP in my project. When I extract terms with it, the Google NLP API returns both single-word and compound terms in its response.
For example: when I send the text "A complete theoretical analysis of the hydraulics of a particular culvert installation is time-consuming and complex." to the Google NLP API, it returns terms such as "analysis", "hydraulics", and "culvert installation".
Now I am trying to do the same with the OpenNLP / CoreNLP libraries. I have tried the tokenizer classes, but they only give me single words, whereas I also need two-word (or longer) terms. In the example above, "culvert installation" is one term made of two words.
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

public class TermExtractor {
    public static void main(String[] args) {
        // creates a StanfordCoreNLP object, with POS tagging, lemmatization, NER, parsing, and coreference resolution
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // the text to analyze
        String text = "A complete theoretical analysis of the hydraulics of a particular culvert installation is time-consuming and complex.";

        // create an empty Annotation just with the given text, then run all annotators on it
        Annotation document = new Annotation(text);
        pipeline.annotate(document);

        List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
        for (CoreMap sentence : sentences) {
            // a CoreLabel is a CoreMap with additional token-specific methods
            for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
                String word = token.get(CoreAnnotations.TextAnnotation.class);          // token text
                String pos = token.get(CoreAnnotations.PartOfSpeechAnnotation.class);   // POS tag
                String ne = token.get(CoreAnnotations.NamedEntityTagAnnotation.class);  // NER label
                String lemma = token.getString(CoreAnnotations.LemmaAnnotation.class);  // lemma
                System.out.println(String.format("Print: word: [%s] pos: [%s] ne: [%s] lemma: [%s]", word, pos, ne, lemma));
            }
        }
    }
}
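A sketch of the kind of post-processing I think might work: since the loop above already gives me each token's POS tag, I could group maximal runs of consecutive NN* tokens into one compound term, which should turn "culvert" + "installation" into "culvert installation". The class name and the hard-coded tag arrays below are only illustrative (the tags are my own Penn Treebank guesses, not captured pipeline output):

```java
import java.util.ArrayList;
import java.util.List;

public class TermChunker {
    // Group maximal runs of consecutive NN*-tagged tokens into one term,
    // so "culvert"/NN followed by "installation"/NN becomes "culvert installation".
    public static List<String> chunkNouns(String[] words, String[] tags) {
        List<String> terms = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (tags[i].startsWith("NN")) {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(words[i]);
            } else if (current.length() > 0) {
                terms.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            terms.add(current.toString());
        }
        return terms;
    }

    public static void main(String[] args) {
        // Hand-written word/POS pairs for the example sentence (illustrative, not real pipeline output).
        String[] words = {"A", "complete", "theoretical", "analysis", "of", "the",
                "hydraulics", "of", "a", "particular", "culvert", "installation",
                "is", "time-consuming", "and", "complex", "."};
        String[] tags = {"DT", "JJ", "JJ", "NN", "IN", "DT",
                "NNS", "IN", "DT", "JJ", "NN", "NN",
                "VBZ", "JJ", "CC", "JJ", "."};
        System.out.println(TermChunker.chunkNouns(words, tags));
        // prints [analysis, hydraulics, culvert installation]
    }
}
```

This would only need the word and pos values already printed in the loop above; whether a full NP-subtree walk over the parse annotation would give better results is still an open question for me.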