Stanford Parser: collapsed dependencies not returned

Date: 2015-02-04 06:15:24

Tags: nlp stanford-nlp

I am facing this particular problem:

I expect "who" to be replaced by "golfer" in the collapsed dependencies for this sentence:

The golfer who scored a 61 won the tournament.

Typed collapsed dependencies returned by the online Stanford Parser:

det(golfer-2, The-1)
nsubj(scored-4, golfer-2)
nsubj(won-7, golfer-2)
rcmod(golfer-2, scored-4)
det(61-6, a-5)
dobj(scored-4, 61-6)
root(ROOT-0, won-7)
det(tournament-9, the-8)
dobj(won-7, tournament-9)

Dependencies returned by the downloaded software (CoreNLP):

root(ROOT-0, won-7)
det(golfer-2, The-1)
nsubj(won-7, golfer-2)
nsubj(scored-4, who-3)
rcmod(golfer-2, scored-4)
det(61-6, a-5)
dobj(scored-4, 61-6)
det(tournament-9, the-8)
dobj(won-7, tournament-9)
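Because the two listings are printed in different orders, it is easy to miss that they differ in exactly one relation. A small stdlib-only Java sketch (the class `DependencyDiff` and its methods are hypothetical helpers, not part of CoreNLP) that treats each printed dependency as a string and diffs the two sets:

```java
import java.util.SortedSet;
import java.util.TreeSet;

public class DependencyDiff {
    // Turns a block of printed lines such as "nsubj(scored-4, golfer-2)" into a sorted set,
    // so the order in which the parser printed the relations no longer matters.
    static SortedSet<String> parse(String block) {
        SortedSet<String> deps = new TreeSet<>();
        for (String line : block.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                deps.add(trimmed);
            }
        }
        return deps;
    }

    // Relations present in `a` but missing from `b`.
    static SortedSet<String> minus(SortedSet<String> a, SortedSet<String> b) {
        SortedSet<String> result = new TreeSet<>(a);
        result.removeAll(b);
        return result;
    }

    public static void main(String[] args) {
        // The two outputs copied from the question.
        String online = String.join("\n",
            "det(golfer-2, The-1)", "nsubj(scored-4, golfer-2)", "nsubj(won-7, golfer-2)",
            "rcmod(golfer-2, scored-4)", "det(61-6, a-5)", "dobj(scored-4, 61-6)",
            "root(ROOT-0, won-7)", "det(tournament-9, the-8)", "dobj(won-7, tournament-9)");
        String local = String.join("\n",
            "root(ROOT-0, won-7)", "det(golfer-2, The-1)", "nsubj(won-7, golfer-2)",
            "nsubj(scored-4, who-3)", "rcmod(golfer-2, scored-4)", "det(61-6, a-5)",
            "dobj(scored-4, 61-6)", "det(tournament-9, the-8)", "dobj(won-7, tournament-9)");

        SortedSet<String> onlineDeps = parse(online);
        SortedSet<String> localDeps = parse(local);
        System.out.println("only online: " + minus(onlineDeps, localDeps)); // [nsubj(scored-4, golfer-2)]
        System.out.println("only local:  " + minus(localDeps, onlineDeps)); // [nsubj(scored-4, who-3)]
    }
}
```

It only compares the printed text, so it shows where the outputs differ (the subject of scored-4), not why.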

Configuration used:

Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner,  parse");
......
SemanticGraph dependencies = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
System.out.println(dependencies.toList());

Thanks in advance.

EDIT

Fixed it by creating the SemanticGraph from the GrammaticalStructure:


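    // gsf is not defined in the original post; this is one way to obtain a factory for English (assumption)
    GrammaticalStructureFactory gsf = new PennTreebankLanguagePack().grammaticalStructureFactory();
    Tree tree = sentence.get(TreeAnnotation.class);
    GrammaticalStructure gs = gsf.newGrammaticalStructure(tree);
    Collection<TypedDependency> tdl = gs.typedDependenciesCCprocessed();
    SemanticGraph dependencies = new SemanticGraph(tdl);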

2 Answers:

Answer 0 (score: 1)

CoreNLP first runs the "pos" annotator to assign part-of-speech tags to the sentence. The parser then uses these tags as priors during parsing.

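This often explains differences between the online parser demo and running CoreNLP locally. Could you try disabling the POS tagger annotator and see whether the resulting parse changes?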
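In the question's configuration, "pos" probably cannot simply be deleted from the annotators list, because "lemma" and "ner" also consume part-of-speech tags. A hedged, stdlib-only Java sketch that removes an annotator together with everything that depends on it; `DropPosAnnotator`, `without` and the `REQUIRES` table are hypothetical, and the table is an assumption based on CoreNLP's documented annotator requirements, so check it against the version you actually run:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DropPosAnnotator {
    // Hypothetical requirement table covering only the annotators in the question.
    // Assumption: lemma and ner need POS tags, while parse only needs tokenize and ssplit.
    static final Map<String, List<String>> REQUIRES = new HashMap<>();
    static {
        REQUIRES.put("ssplit", Arrays.asList("tokenize"));
        REQUIRES.put("pos", Arrays.asList("tokenize", "ssplit"));
        REQUIRES.put("lemma", Arrays.asList("pos"));
        REQUIRES.put("ner", Arrays.asList("pos"));
        REQUIRES.put("parse", Arrays.asList("tokenize", "ssplit"));
    }

    // Removes `target` plus every annotator that directly or transitively requires it,
    // keeping the remaining annotators in their original order.
    static String without(String annotators, String target) {
        List<String> names = new ArrayList<>();
        for (String name : annotators.split(",")) {
            if (!name.trim().isEmpty()) {
                names.add(name.trim());
            }
        }
        Set<String> removed = new HashSet<>(Collections.singleton(target));
        boolean changed = true;
        while (changed) { // repeat until no further dependent annotator is found
            changed = false;
            for (String name : names) {
                if (removed.contains(name)) {
                    continue;
                }
                for (String req : REQUIRES.getOrDefault(name, Collections.emptyList())) {
                    if (removed.contains(req)) {
                        removed.add(name);
                        changed = true;
                        break;
                    }
                }
            }
        }
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            if (!removed.contains(name)) {
                kept.add(name);
            }
        }
        return String.join(", ", kept);
    }

    public static void main(String[] args) {
        // Annotator list copied from the question's configuration.
        System.out.println(without("tokenize, ssplit, pos, lemma, ner,  parse", "pos"));
        // prints: tokenize, ssplit, parse
    }
}
```

The resulting string would then go into `props.setProperty("annotators", ...)`. With no separate "pos" step, the parser has to choose the tags itself, which is also what the standalone LexicalizedParser does when run from the command line.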

Answer 1 (score: 1)

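This is a legitimate regression from past behavior. I don't think there was any reason to take it out; it just somehow got broken and nobody noticed. It looks like this happened quite a while ago: version 3.2 seems to be the last release that correctly produced nsubj(scored-4, golfer-2). Feel free to open an issue on GitHub...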

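Somehow this only happens in CoreNLP, not when calling the parser directly, so there must be some difference in the code path. If you run this command, you get what you want: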

stanford-corenlp-full-2015-01-30 manning$ echo "The golfer who scored a 61 won the tournament." | java -cp "*" edu.stanford.nlp.parser.lexparser.LexicalizedParser -outputFormat penn,typedDependencies edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz -
Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [0.5 sec].
Parsing file: -
Parsing [sent. 1 len. 10]: The golfer who scored a 61 won the tournament .
(ROOT
  (S
    (NP
      (NP (DT The) (NN golfer))
      (SBAR
        (WHNP (WP who))
        (S
          (VP (VBD scored)
            (NP (DT a) (CD 61))))))
    (VP (VBD won)
      (NP (DT the) (NN tournament)))
    (. .)))

det(golfer-2, The-1)
nsubj(scored-4, golfer-2)
nsubj(won-7, golfer-2)
rcmod(golfer-2, scored-4)
det(61-6, a-5)
dobj(scored-4, 61-6)
root(ROOT-0, won-7)
det(tournament-9, the-8)
dobj(won-7, tournament-9)

Parsed file: - [1 sentences].
Parsed 10 words in 1 sentences (30.86 wds/sec; 3.09 sents/sec).