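According to https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker, Apache OpenNLP's format for POS tags and chunks differs from Stanford NLP's.
Using the sentence "The quick brown fox jumped over the lazy dog." as an example: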
Apache OpenNLP POS: The_DT quick_JJ brown_JJ fox_NN jumped_VBD over_IN the_DT lazy_JJ dog_NN ._.
Apache OpenNLP Chunk: [NP The_DT quick_JJ brown_JJ fox_NN ] [VP jumped_VBD ] [PP over_IN ] [NP the_DT lazy_JJ dog_NN ] ._.
I noticed that the Stanford Parser does provide a full parse that contains both the chunks and the POS tags, and it looks like this:
(ROOT
  (S
    (NP (DT The) (JJ quick) (JJ brown) (NN fox))
    (VP (VBD jumped)
      (PP (IN over)
        (NP (DT the) (JJ lazy) (NN dog.))))))
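How can we modify this output so that it is compatible with the Apache OpenNLP format?
I noticed that Stanford NLP uses nested phrases; for example, the verb phrase contains the prepositional phrase as well as the noun phrase...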
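
To make the question concrete, here is a minimal sketch of the kind of conversion I have in mind. It is plain Python and uses neither library: it reads the bracketed tree as text and flattens the nesting, so that the tagged words sitting directly under a phrase form one chunk and any nested phrase starts a new chunk. The function names (`parse_tree`, `to_chunks`, `format_chunks`) and the `CHUNK_LABELS` set are my own assumptions, not part of Stanford NLP or OpenNLP, and the flattening rule is a heuristic that happens to match the example above; it is not claimed to be how OpenNLP derives its chunks. It also does no re-tokenizing, so the `dog.` token from the parse stays as it is.

```python
import re

# Assumption: phrase labels that should become flat chunks.
# Anything else (ROOT, S, ...) is treated as transparent.
CHUNK_LABELS = {"NP", "VP", "PP", "ADJP", "ADVP", "SBAR", "PRT", "CONJP", "INTJ"}


def parse_tree(text):
    """Parse a bracketed tree string into nested (label, children) tuples.

    A tagged word such as (DT The) becomes ("DT", ["The"]).
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return read()


def is_tagged_word(node):
    """True for a preterminal node such as ("DT", ["The"])."""
    _, children = node
    return len(children) == 1 and isinstance(children[0], str)


def to_chunks(node, out=None):
    """Flatten a tree into a list of (chunk_label_or_None, [(word, tag), ...])."""
    if out is None:
        out = []
    label, children = node
    current = None  # chunk currently being filled for this phrase
    for child in children:
        if is_tagged_word(child):
            tag, (word,) = child
            if label in CHUNK_LABELS:
                if current is None:
                    current = (label, [])
                    out.append(current)
                current[1].append((word, tag))
            else:
                # Words directly under ROOT/S etc. stay outside any chunk.
                out.append((None, [(word, tag)]))
        else:
            current = None  # a nested phrase closes the open chunk
            to_chunks(child, out)
    return out


def format_chunks(chunks):
    """Render the chunks in the bracketed word_TAG style shown above."""
    parts = []
    for label, items in chunks:
        body = " ".join(f"{word}_{tag}" for word, tag in items)
        parts.append(f"[{label} {body} ]" if label else body)
    return " ".join(parts)


stanford_output = """
(ROOT
  (S
    (NP (DT The) (JJ quick) (JJ brown) (NN fox))
    (VP (VBD jumped)
      (PP (IN over)
        (NP (DT the) (JJ lazy) (NN dog.))))))
"""

print(format_chunks(to_chunks(parse_tree(stanford_output))))
```

For the parse above this prints `[NP The_DT quick_JJ brown_JJ fox_NN ] [VP jumped_VBD ] [PP over_IN ] [NP the_DT lazy_JJ dog._NN ]`. Whether this simple flattening stays close enough to the OpenNLP chunks on more complex sentences is exactly what I am unsure about.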