How to use Tregex with StanfordCoreNLP Server for non-English text?

Time: 2018-01-17 11:02:08

Tags: parsing nlp

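I am trying to use Tregex with the StanfordCoreNLP Server to process French text. The server is configured with French properties, but the /tregex endpoint seems to parse the text with the English models regardless.

However, when I parse the same text with the regular endpoint, everything works and the French parser is applied correctly.

Here is an example:

Input sentence: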

  

Pierre et Jean sont dans la cuisine

Regular parsing:

http://localhost:8082/?properties=outputFormat&properties=depparse.model+%3D+edu%2Fstanford%2Fnlp%2Fmodels%2Fparser%2Fnndep%2FUD_French.gzpos.model&properties=parse.model&properties=tokenize.language&properties=depparse.language&properties=annotators

  

(ROOT
  (SENT
    (NP (NPP Pierre)
      (COORD (CC et)
        (NP (NPP Jean))))
    (VN (V sont))
    (PP (P dans)
      (NP (DET la) (NC cuisine)))))

Tregex Parsing

http://localhost:8082/tregex?properties=annotators&properties=outputFormat&properties=depparse.model+%3D+edu%2Fstanford%2Fnlp%2Fmodels%2Fparser%2Fnndep%2FUD_French.gzpos.model&properties=parse.model&properties=tokenize.language&properties=depparse.language&pattern=NP%3Dnoun1+%24+NP%3Dnoun2

  

{
  "sentences": [
    {
      "0": {
        "match": "(NP (NNP Pierre)\n(CC et)\n(NNP Jean))\n",
        "namedNodes": [
          {
            "noun1": "(NP (NNP Pierre)\n(CC et)\n(NNP Jean))\n"
          },
          {
            "noun2": "(NP\n(NP (JJ sont) (NNS dans))\n(PP (FW la)\n(NP (NN cuisine))))\n"
          }
        ]
      },
      "1": {
        "match": "(NP\n(NP (JJ sont) (NNS dans))\n(PP (FW la)\n(NP (NN cuisine))))\n",
        "namedNodes": [
          {
            "noun1": "(NP\n(NP (JJ sont) (NNS dans))\n(PP (FW la)\n(NP (NN cuisine))))\n"
          },
          {
            "noun2": "(NP (NNP Pierre)\n(CC et)\n(NNP Jean))\n"
          }
        ]
      }
    }
  ]
}

As you can see, the two parses are not the same.
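One quick way to confirm which model produced a tree is to compare node labels. The regular parse above uses French Treebank-style labels (NPP, NC, VN), while the /tregex output uses Penn Treebank tags (NNP, JJ, FW). The following is a minimal sketch of that check; the helper names and the two label sets are illustrative, taken only from this example, and are not exhaustive tagsets.

```python
import re

# Labels seen only in the French parse above vs. only in the /tregex output.
# These sets come from this one example and are not complete tagsets.
FRENCH_ONLY = {"NPP", "NC", "VN", "DET", "COORD", "SENT"}
PENN_ONLY = {"NNP", "NNS", "JJ", "FW", "NN"}


def node_labels(tree):
    """Return the set of node labels found in a bracketed parse string."""
    return set(re.findall(r"\(([^\s()]+)", tree))


def guess_tagset(tree):
    """Guess 'french', 'penn', or 'unknown' from the labels in the tree."""
    labels = node_labels(tree)
    french_hits = len(labels & FRENCH_ONLY)
    penn_hits = len(labels & PENN_ONLY)
    if french_hits > penn_hits:
        return "french"
    if penn_hits > french_hits:
        return "penn"
    return "unknown"


regular_parse = (
    "(ROOT (SENT (NP (NPP Pierre) (COORD (CC et) (NP (NPP Jean)))) "
    "(VN (V sont)) (PP (P dans) (NP (DET la) (NC cuisine)))))"
)
tregex_match = "(NP (NNP Pierre)\n(CC et)\n(NNP Jean))\n"

print(guess_tagset(regular_parse))
print(guess_tagset(tregex_match))
```

Shared labels such as NP, PP, and CC are ignored on purpose, since both tagsets use them.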

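After a quick look at the server logs, I noticed that the annotators are automatically discarded and replaced with the default English models: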

[pool-2-thread-3] INFO edu.stanford.nlp.pipeline.AnnotatorPool - Replacing old annotator "coref" with signature [coref.language:fr;coref.mode:statistical;] with new annotator with signature []
[pool-2-thread-3] INFO edu.stanford.nlp.pipeline.AnnotatorPool - Replacing old annotator "ssplit" with signature [tokenize.language:fr;] with new annotator with signature []
[pool-2-thread-3] INFO edu.stanford.nlp.pipeline.AnnotatorPool - Replacing old annotator "depparse" with signature [depparse.language:french;depparse.model:edu/stanford/nlp/models/parser/nndep/UD_French.gz;] with new annotator with signature []
[pool-2-thread-3] INFO edu.stanford.nlp.pipeline.AnnotatorPool - Replacing old annotator "tokenize" with signature [tokenize.language:fr;] with new annotator with signature []
[pool-2-thread-3] INFO edu.stanford.nlp.pipeline.AnnotatorPool - Replacing old annotator "mention" with signature [mention.type:dep;coref.language:fr;coref.mode:statistical;] with new annotator with signature []
[pool-2-thread-3] INFO edu.stanford.nlp.pipeline.AnnotatorPool - Replacing old annotator "pos" with signature [pos.model:edu/stanford/nlp/models/pos-tagger/french/french.tagger;] with new annotator with signature []
[pool-2-thread-3] INFO edu.stanford.nlp.pipeline.AnnotatorPool - Replacing old annotator "openie" with signature [openie.strip_entailments:true;] with new annotator with signature []
[pool-2-thread-3] INFO edu.stanford.nlp.pipeline.AnnotatorPool - Replacing old annotator "parse" with signature [parse.model:edu/stanford/nlp/models/lexparser/frenchFactored.ser.gz;parse.binaryTrees:true;] with new annotator with signature []
[pool-2-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[pool-2-thread-3] INFO edu.stanford.nlp.pipeline.TokenizerAnnotator - No tokenizer type provided. Defaulting to PTBTokenizer.
[pool-2-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[pool-2-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[pool-2-thread-3] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [0.2 sec].
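The pattern in these log lines (an annotator with a French signature replaced by one with an empty signature) can be picked out mechanically. The following is a rough sketch that scans log text in the format shown above; the function name and regular expression are illustrative assumptions, not part of CoreNLP.

```python
import re

# Matches the AnnotatorPool "Replacing old annotator" lines shown in the log above.
REPLACED = re.compile(
    r'Replacing old annotator "(?P<name>[^"]+)" with signature \[(?P<old>[^\]]*)\] '
    r'with new annotator with signature \[(?P<new>[^\]]*)\]'
)


def reset_annotators(log_text):
    """Return {annotator: old_signature} for annotators whose configured
    signature was replaced by an empty one."""
    found = {}
    for match in REPLACED.finditer(log_text):
        if match.group("old") and not match.group("new"):
            found[match.group("name")] = match.group("old")
    return found


# Two lines copied from the log in the question.
sample_log = (
    '[pool-2-thread-3] INFO edu.stanford.nlp.pipeline.AnnotatorPool - Replacing old annotator '
    '"tokenize" with signature [tokenize.language:fr;] with new annotator with signature []\n'
    '[pool-2-thread-3] INFO edu.stanford.nlp.pipeline.AnnotatorPool - Replacing old annotator '
    '"parse" with signature [parse.model:edu/stanford/nlp/models/lexparser/frenchFactored.ser.gz;'
    'parse.binaryTrees:true;] with new annotator with signature []\n'
)

for name, old_signature in reset_annotators(sample_log).items():
    print(f"{name}: lost settings {old_signature}")
```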

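I tried the same example on the CoreNLP test server (http://corenlp.run/), and it works fine there.

I suppose something is wrong with my server configuration, but I don't know what :)

Thank you very much for your help!

Louis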

1 answer:

Answer 0: (score: 0)

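I ran into the same problem, and here is what I found: in the code used by Stanford CoreNLP, we can see that it does not take the parameters into account and uses the defaults instead (line 1090). They have already fixed it on GitHub, which is why the CoreNLP server no longer has this problem, and version 3.9.0 should be fine.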
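To check whether a given server build honors per-request settings, one option is to pass the French configuration as a JSON `properties` parameter alongside `pattern`. The sketch below is only an illustration under assumptions: the port is the one from the question, the model paths are the ones printed in the question's server log, and the annotator list and the helper names `build_tregex_url` and `run_tregex` are hypothetical. Whether the server actually applies these properties depends on its version, as described above.

```python
import json
import urllib.parse
import urllib.request

# Assumed values: port from the question, model paths from the question's log.
SERVER = "http://localhost:8082"
FRENCH_PROPS = {
    "annotators": "tokenize,ssplit,pos,parse",  # assumed minimal set for Tregex
    "tokenize.language": "fr",
    "pos.model": "edu/stanford/nlp/models/pos-tagger/french/french.tagger",
    "parse.model": "edu/stanford/nlp/models/lexparser/frenchFactored.ser.gz",
    "outputFormat": "json",
}


def build_tregex_url(pattern, props=FRENCH_PROPS, server=SERVER):
    """Build a /tregex URL carrying the pattern and the properties as JSON."""
    query = urllib.parse.urlencode(
        {"pattern": pattern, "properties": json.dumps(props)}
    )
    return f"{server}/tregex?{query}"


def run_tregex(text, pattern):
    """POST the text to the server and return the decoded JSON response.
    Requires a running server; not called here."""
    request = urllib.request.Request(
        build_tregex_url(pattern), data=text.encode("utf-8")
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


url = build_tregex_url("NP=noun1 $ NP=noun2")
print(url)
# With a server running:
# run_tregex("Pierre et Jean sont dans la cuisine", "NP=noun1 $ NP=noun2")
```

If the returned trees still carry Penn Treebank tags such as NNP or FW, the properties were most likely ignored by that server version.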

Hope it helps.