Question

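I am trying to use Tregex with the StanfordCoreNLP Server to process French text. The server is configured with French properties, but the /tregex endpoint seems to parse the text with the English models automatically.

However, when I parse the text with the regular parser, everything works fine and the French parser is applied correctly.

Here is an example:

Input sentence: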

Pierre et Jean sont dans la cuisine

Regular parsing:

http://localhost:8082/?properties=outputFormat&properties=depparse.model+%3D+edu%2Fstanford%2Fnlp%2Fmodels%2Fparser%2Fnndep%2FUD_French.gzpos.model&properties=parse.model&properties=tokenize.language&properties=depparse.language&properties=annotators

(ROOT (SENT (NP (NPP Pierre) (COORD (CC et) (NP (NPP Jean)))) (VN (V sont)) (PP (P dans) (NP (DET la) (NC cuisine)))))

Tregex parsing:

http://localhost:8082/tregex?properties=annotators&properties=outputFormat&properties=depparse.model+%3D+edu%2Fstanford%2Fnlp%2Fmodels%2Fparser%2Fnndep%2FUD_French.gzpos.model&properties=parse.model&properties=tokenize.language&properties=depparse.language&pattern=NP%3Dnoun1+%24+NP%3Dnoun2

{
  "sentences": [
    {
      "0": {
        "match": "(NP (NNP Pierre)\n (CC et)\n (NNP Jean))\n",
        "namedNodes": [
          { "noun1": "(NP (NNP Pierre)\n (CC et)\n (NNP Jean))\n" },
          { "noun2": "(NP\n (NP (JJ sont) (NNS dans))\n (PP (FW la)\n (NP (NN cuisine))))\n" }
        ]
      },
      "1": {
        "match": "(NP\n (NP (JJ sont) (NNS dans))\n (PP (FW la)\n (NP (NN cuisine))))\n",
        "namedNodes": [
          { "noun1": "(NP\n (NP (JJ sont) (NNS dans))\n (PP (FW la)\n (NP (NN cuisine))))\n" },
          { "noun2": "(NP (NNP Pierre)\n (CC et)\n (NNP Jean))\n" }
        ]
      }
    }
  ]
}

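As you can see, the two parses differ: the regular request uses the French tag set (NPP, VN, NC), while the /tregex request returns English Penn Treebank tags (NNP, JJ, NNS).

After a quick look at the server logs, I noticed that the French annotators are automatically discarded and replaced by the default English models: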

[pool-2-thread-3] INFO edu.stanford.nlp.pipeline.AnnotatorPool - Replacing old annotator "coref" with signature [coref.language:fr;coref.mode:statistical;] with new annotator with signature []
[pool-2-thread-3] INFO edu.stanford.nlp.pipeline.AnnotatorPool - Replacing old annotator "ssplit" with signature [tokenize.language:fr;] with new annotator with signature []
[pool-2-thread-3] INFO edu.stanford.nlp.pipeline.AnnotatorPool - Replacing old annotator "depparse" with signature [depparse.language:french;depparse.model:edu/stanford/nlp/models/parser/nndep/UD_French.gz;] with new annotator with signature []
[pool-2-thread-3] INFO edu.stanford.nlp.pipeline.AnnotatorPool - Replacing old annotator "tokenize" with signature [tokenize.language:fr;] with new annotator with signature []
[pool-2-thread-3] INFO edu.stanford.nlp.pipeline.AnnotatorPool - Replacing old annotator "mention" with signature [mention.type:dep;coref.language:fr;coref.mode:statistical;] with new annotator with signature []
[pool-2-thread-3] INFO edu.stanford.nlp.pipeline.AnnotatorPool - Replacing old annotator "pos" with signature [pos.model:edu/stanford/nlp/models/pos-tagger/french/french.tagger;] with new annotator with signature []
[pool-2-thread-3] INFO edu.stanford.nlp.pipeline.AnnotatorPool - Replacing old annotator "openie" with signature [openie.strip_entailments:true;] with new annotator with signature []
[pool-2-thread-3] INFO edu.stanford.nlp.pipeline.AnnotatorPool - Replacing old annotator "parse" with signature [parse.model:edu/stanford/nlp/models/lexparser/frenchFactored.ser.gz;parse.binaryTrees:true;] with new annotator with signature []
[pool-2-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[pool-2-thread-3] INFO edu.stanford.nlp.pipeline.TokenizerAnnotator - No tokenizer type provided. Defaulting to PTBTokenizer.
[pool-2-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[pool-2-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[pool-2-thread-3] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [0.2 sec].

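I tried the same example on the CoreNLP demo server (http://corenlp.run/) and it works fine.

I guess something is wrong with my server configuration, but I can't figure out what :)

Thank you very much for your help!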

Louis

Answer 1

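I ran into the same problem, and this is what I found: in the code used by Stanford CoreNLP, we can see that it does not take the request parameters into account and uses the default values instead (line 1090). This has already been fixed on GitHub, which is why the CoreNLP server no longer has this problem, and version 3.9.0 should be fine.

Hope it helps.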
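
For anyone who wants to check whether their own server honors the French settings on /tregex, here is a minimal Python sketch that builds the request. The property values are copied from the server log in the question; the annotators list, the helper names build_tregex_url and post_tregex, and passing the properties as one JSON-encoded "properties" parameter are my assumptions about how the request is put together, so adjust them to your setup.

```python
import json
import urllib.parse
import urllib.request

# Values taken from the server log in the question; the annotators list is an assumption.
FRENCH_PROPS = {
    "annotators": "tokenize,ssplit,pos,parse",
    "tokenize.language": "fr",
    "pos.model": "edu/stanford/nlp/models/pos-tagger/french/french.tagger",
    "parse.model": "edu/stanford/nlp/models/lexparser/frenchFactored.ser.gz",
    "outputFormat": "json",
}


def build_tregex_url(base_url, pattern, props):
    """Return a /tregex URL carrying the pattern and the JSON-encoded properties."""
    query = urllib.parse.urlencode(
        {"pattern": pattern, "properties": json.dumps(props)}
    )
    return f"{base_url.rstrip('/')}/tregex?{query}"


def post_tregex(url, text):
    """POST the raw text to the server and decode the JSON reply.

    Hypothetical helper: it needs a running CoreNLP server at the given URL.
    """
    with urllib.request.urlopen(url, data=text.encode("utf-8")) as resp:
        return json.loads(resp.read().decode("utf-8"))


url = build_tregex_url("http://localhost:8082", "NP=noun1 $ NP=noun2", FRENCH_PROPS)
print(url)
```

Calling post_tregex(url, "Pierre et Jean sont dans la cuisine") should then return the matches. If the trees still carry English tags such as NNP or JJ, the server is probably a version older than the fix mentioned above and is ignoring the properties.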
