I am using Google NLP in my project. When I extract terms with it, the Google NLP API returns both single-word and compound terms in its response.
For example: when I send the text "A complete theoretical analysis of the hydraulics of a particular culvert installation is time-consuming and complex." to the Google NLP API, it returns terms such as "analysis", "hydraulics", and "culvert installation".
Now I am trying to do the same with the OpenNLP / CoreNLP libraries. I have tried the tokenizer classes, but they only give me single words, whereas I also need two-word (or longer) terms. In the example above, "culvert installation" is one term made of two words.
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

public class TermExtractor {
    public static void main(String[] args) {
        // creates a StanfordCoreNLP object, with POS tagging, lemmatization, NER, parsing, and coreference resolution
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // the text to analyze
        String text = "A complete theoretical analysis of the hydraulics of a particular culvert installation is time-consuming and complex.";

        // create an empty Annotation just with the given text, then run all annotators on it
        Annotation document = new Annotation(text);
        pipeline.annotate(document);

        List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
        for (CoreMap sentence : sentences) {
            // a CoreLabel is a CoreMap with additional token-specific methods
            for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
                String word = token.get(CoreAnnotations.TextAnnotation.class);          // token text
                String pos = token.get(CoreAnnotations.PartOfSpeechAnnotation.class);   // POS tag
                String ne = token.get(CoreAnnotations.NamedEntityTagAnnotation.class);  // NER label
                String lemma = token.getString(CoreAnnotations.LemmaAnnotation.class);  // lemma
                System.out.println(String.format("Print: word: [%s] pos: [%s] ne: [%s] lemma: [%s]", word, pos, ne, lemma));
            }
        }
    }
}
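A sketch of the kind of post-processing I think might work: since the loop above already gives me each token's POS tag, I could group maximal runs of consecutive NN* tokens into one compound term, which should turn "culvert" + "installation" into "culvert installation". The class name and the hard-coded tag arrays below are only illustrative (the tags are my own Penn Treebank guesses, not captured pipeline output):

```java
import java.util.ArrayList;
import java.util.List;

public class TermChunker {
    // Group maximal runs of consecutive NN*-tagged tokens into one term,
    // so "culvert"/NN followed by "installation"/NN becomes "culvert installation".
    public static List<String> chunkNouns(String[] words, String[] tags) {
        List<String> terms = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (tags[i].startsWith("NN")) {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(words[i]);
            } else if (current.length() > 0) {
                terms.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            terms.add(current.toString());
        }
        return terms;
    }

    public static void main(String[] args) {
        // Hand-written word/POS pairs for the example sentence (illustrative, not real pipeline output).
        String[] words = {"A", "complete", "theoretical", "analysis", "of", "the",
                "hydraulics", "of", "a", "particular", "culvert", "installation",
                "is", "time-consuming", "and", "complex", "."};
        String[] tags = {"DT", "JJ", "JJ", "NN", "IN", "DT",
                "NNS", "IN", "DT", "JJ", "NN", "NN",
                "VBZ", "JJ", "CC", "JJ", "."};
        System.out.println(TermChunker.chunkNouns(words, tags));
        // prints [analysis, hydraulics, culvert installation]
    }
}
```

This would only need the word and pos values already printed in the loop above; whether a full NP-subtree walk over the parse annotation would give better results is still an open question for me.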