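I have been reading the API and the documentation trying to find an answer, but I am no closer to solving the problem.
I want to take a bunch of sentences and get XML output for all of them, with each token looking like this: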
<token id="1">
<word>That</word>
<lemma>that</lemma>
<CharacterOffsetBegin>0</CharacterOffsetBegin>
<CharacterOffsetEnd>4</CharacterOffsetEnd>
<POS>DT</POS>
<NER>O</NER>
</token>
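So far I have only figured out how to get the parse tree, which doesn't help with what I want to build. Anyway, here is the code I am using right now: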
Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
// read some text in the text variable
String text = "We won the game."; // Add your text here!
// create an empty Annotation just with the given text
Annotation document = new Annotation(text);
// run all Annotators on this text
pipeline.annotate(document);
// these are all the sentences in this document
// a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
List<CoreMap> sentences = document.get(SentencesAnnotation.class);
for (CoreMap sentence : sentences) {
    // this is the parse tree of the current sentence
    Tree tree = sentence.get(TreeAnnotation.class);
    // this is the Stanford dependency graph of the current sentence
    SemanticGraph dependencies = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
}
I am using the code from the documentation.
Answer 0 (score: 3)
Using the built-in xmlPrint is a bit easier:
Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
Annotation document = new Annotation("Four score and seven years ago.");
pipeline.annotate(document);
// try-with-resources closes the stream once the XML has been written
try (FileOutputStream os = new FileOutputStream(new File("./target/", "nlp.xml"))) {
    pipeline.xmlPrint(document, os);
}
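If you then need to pull individual fields back out of that XML, here is a minimal sketch using only the JDK's DOM parser. It assumes the tokens look like the `<token>` element shown in the question; the class name `TokenXmlReader`, the method `wordsWithPos`, and the wrapping `<tokens>` element in the sample are made up for illustration and are not part of CoreNLP.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of CoreNLP.
public class TokenXmlReader {

    /** Returns "word/POS" for every <token> element found in the XML text. */
    public static List<String> wordsWithPos(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            // Finds <token> elements at any depth below the root.
            NodeList tokens = doc.getElementsByTagName("token");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < tokens.getLength(); i++) {
                Element token = (Element) tokens.item(i);
                // Assumes every token has exactly one <word> and one <POS> child.
                String word = token.getElementsByTagName("word").item(0).getTextContent();
                String pos = token.getElementsByTagName("POS").item(0).getTextContent();
                result.add(word + "/" + pos);
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse token XML", e);
        }
    }

    public static void main(String[] args) {
        // Sample input in the token format from the question; the <tokens> wrapper is assumed.
        String xml = "<tokens><token id=\"1\"><word>That</word><lemma>that</lemma>"
                + "<CharacterOffsetBegin>0</CharacterOffsetBegin>"
                + "<CharacterOffsetEnd>4</CharacterOffsetEnd>"
                + "<POS>DT</POS><NER>O</NER></token></tokens>";
        System.out.println(wordsWithPos(xml)); // prints [That/DT]
    }
}
```

For a file such as nlp.xml you would read its contents into a string first, or adapt the method to take an InputStream.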
Answer 1 (score: 1)
It took me about four hours, but I finally found some useful source code. Here is the updated code:
Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
// read some text in the text variable
String text = "We won the game."; // Add your text here!
// create an empty Annotation just with the given text
Annotation document = new Annotation(text);
// run all Annotators on this text
pipeline.annotate(document);
Document xmldoc = XMLOutputter.annotationToDoc(document, pipeline);
// below is a tweaked version of XMLOutputter.writeXml()
ByteArrayOutputStream sw = new ByteArrayOutputStream();
Serializer ser = new Serializer(sw);
ser.setIndent(0);
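ser.setLineSeparator("\n"); // use a known separator so it can be stripped below
ser.write(xmldoc);
ser.flush();
// Serializer writes UTF-8 by default, so decode with the same charset
String xmlstr = sw.toString("UTF-8");
// strip the line separators to get the XML on a single line
xmlstr = xmlstr.replace("\n", "");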
System.out.println(xmlstr);
Hope this helps someone in the future.
Answer 2 (score: 1)
Thanks, Tiffan Meloney.
That was very helpful. Based on your example, I also found another way:
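One caveat with stripping "\n" after the fact: it would also remove newlines that are part of the text itself. As a rough alternative, here is a sketch that writes one token on a single line using the JDK's javax.xml.transform classes instead of the XOM Serializer used above. `CompactTokenXml` and `tokenXml` are hypothetical names; the values would come from your own annotation results.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, not part of CoreNLP or XOM.
public class CompactTokenXml {

    /** Builds one <token> element and serializes it without indentation. */
    public static String tokenXml(int id, String word, String lemma,
                                  int begin, int end, String pos, String ner) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element token = doc.createElement("token");
            token.setAttribute("id", Integer.toString(id));
            doc.appendChild(token);

            // Element names follow the token format shown in the question.
            addChild(doc, token, "word", word);
            addChild(doc, token, "lemma", lemma);
            addChild(doc, token, "CharacterOffsetBegin", Integer.toString(begin));
            addChild(doc, token, "CharacterOffsetEnd", Integer.toString(end));
            addChild(doc, token, "POS", pos);
            addChild(doc, token, "NER", ner);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // No indentation, so the serializer adds no line breaks of its own.
            transformer.setOutputProperty(OutputKeys.INDENT, "no");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not build token XML", e);
        }
    }

    private static void addChild(Document doc, Element parent, String name, String text) {
        Element child = doc.createElement(name);
        child.setTextContent(text);
        parent.appendChild(child);
    }

    public static void main(String[] args) {
        System.out.println(tokenXml(1, "That", "that", 0, 4, "DT", "O"));
    }
}
```

Because nothing is removed afterwards, any newline inside a word or lemma value is kept as-is.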
Document doc = XMLOutputter.annotationToDoc(annotation, pipeline);
System.out.println( doc.toXML() );
I hope this helps others too.