Question

使用Stanford CoreNLP时，我在xml输出文件中得到结果。在其中我找到一个名为扬声器的列，例如：

        <word>Mike</word>
        <lemma>Mike</lemma>
        <CharacterOffsetBegin>0</CharacterOffsetBegin>
        <CharacterOffsetEnd>4</CharacterOffsetEnd>
        <POS>NNP</POS>
        <NER>PERSON</NER>

        *<Speaker>PER0</Speaker>*

        <TrueCase>INIT_UPPER</TrueCase>
        <TrueCaseText>Mike</TrueCaseText>
        <sentiment>Neutral</sentiment>

那么我如何在java代码中操纵Speaker结果呢？我怎样才能改善它的结果呢？例如，在谈话中我想要迈克而不是PER0

谢谢。

Answer 1

使用DOM XML解析器：

How to read an XML File with the Java DOM Parser

Answer 2

首先，谢谢@Thomas Bigger的答案我会尽力更清楚，
在此代码中，

    PrintWriter xmlOut = new PrintWriter("xmlOutput.xml");   
    Properties props = new Properties(); 
    props.setProperty("annotators","tokenize, ssplit, pos, lemma, truecase, ner, parse,quote, mention, dcoref, sentiment"); 
    props.put("truecase.overwriteText", "true");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);       
    Annotation annotation = new Annotation("Mike said : \"I vote for Hillary.\"\n 
                                           peter said : \"I vote for Donald.\"");
    pipeline.annotate(annotation);   
    pipeline.xmlPrint(annotation, xmlOut);

xmlOut.xml提供了两个句子的分析：

First Sentence

＆LT;迈克说＆gt; ，＆lt; ：＆gt; ，＆lt; ＆＃34; ＆GT;和＆lt; ＆＃34; ＆GT;被认为是narator的演讲（PER0）。
＆LT;我投票给希拉里＆gt;被视为人的言论1.

Second Sentence

＆LT;彼得说＆gt; ，＆lt; ：＆gt; ，＆lt; ＆＃34; ＆GT;和＆lt; ＆＃34; ＆GT;被认为是narator的演讲（PER0）。
＆LT;我投票支持唐纳德。＆GT;被认为是彼得的演讲。 =＆GT;这里唯一的区别是我用小写写了peter，当我用大写写的时候，扬声器的结果变成了4。

在斯坦福CoreNLP上搜索JavaDoc时，我发现有关扬声器的课程如下：
- CoreAnnotations.SpeakerAnnotation
- CoreNLPProtos.SpeakerInfo
- CoreNLPProtos.SpeakerInfo.Builder
- CoreNLPProtos.SpeakerInfoOrBuilder
- SpeakerInfo
- SpeakerInfo
- SpeakerMatch

所以我首先想要在我的xmlOut中获得更有效的结果，其次，要知道如何在不使用DOM XML的情况下使用这些类来提取发言者及其语音。

如何从对话中提取Speaker注释？

2 个答案: