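I'm trying to use Tregex with the Stanford CoreNLP Server to process French text. The server is configured with French properties, but the /tregex endpoint seems to process the text with the English parser automatically.
However, when I parse text with the regular endpoint, everything works fine: the French parser is applied correctly.
Here is an example:
Input sentence: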
Pierre et Jean sont dans la cuisine
Regular parse:
http://localhost:8082/?properties=outputFormat&properties=depparse.model+%3D+edu%2Fstanford%2Fnlp%2Fmodels%2Fparser%2Fnndep%2FUD_French.gzpos.model&properties=parse.model&properties=tokenize.language&properties=depparse.language&properties=annotators
(ROOT (SENT (NP (NPP Pierre) (COORD (CC et) (NP (NPP Jean)))) (VN (V sont)) (PP (P dans) (NP (DET la) (NC cuisine)))))
Tregex parse:
http://localhost:8082/tregex?properties=annotators&properties=outputFormat&properties=depparse.model+%3D+edu%2Fstanford%2Fnlp%2Fmodels%2Fparser%2Fnndep%2FUD_French.gzpos.model&properties=parse.model&properties=tokenize.language&properties=depparse.language&pattern=NP%3Dnoun1+%24+NP%3Dnoun2
{
  "sentences": [
    {
      "0": {
        "match": "(NP (NNP Pierre)\n(CC et)\n(NNP Jean))\n",
        "namedNodes": [
          {"noun1": "(NP (NNP Pierre)\n(CC et)\n(NNP Jean))\n"},
          {"noun2": "(NP\n(NP (JJ sont) (NNS dans))\n(PP (FW la)\n(NP (NN cuisine))))\n"}
        ]
      },
      "1": {
        "match": "(NP\n(NP (JJ sont) (NNS dans))\n(PP (FW la)\n(NP (NN cuisine))))\n",
        "namedNodes": [
          {"noun1": "(NP\n(NP (JJ sont) (NNS dans))\n(PP (FW la)\n(NP (NN cuisine))))\n"},
          {"noun2": "(NP (NNP Pierre)\n(CC et)\n(NNP Jean))\n"}
        ]
      }
    }
  ]
}
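As you can see, the two parses are different: the /tregex output uses English tags (NNP, JJ, NNS, FW) instead of the French ones (NPP, V, P, NC).
A quick look at the server log shows that the annotators are discarded automatically and replaced with the default English models: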
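For reference, the CoreNLP server documentation describes passing `properties` as a single JSON object, rather than as repeated `properties=` parameters as in the URLs above. The sketch below (Python, standard library only) builds the same Tregex request that way and flattens a response shaped like the one above. The property set is an assumption assembled from the French signatures in the server log below, and `tregex_url`, `run_tregex` and `named_nodes` are hypothetical helper names. Per the answer below, versions before 3.9.0 may ignore request properties on /tregex regardless.

```python
import json
import urllib.parse
import urllib.request

# Server address taken from the URLs above.
SERVER = "http://localhost:8082"

# Assumed French settings, copied from the annotator signatures in the log;
# not a complete or official French configuration.
FRENCH_PROPS = {
    "annotators": "tokenize,ssplit,pos,parse",
    "tokenize.language": "fr",
    "pos.model": "edu/stanford/nlp/models/pos-tagger/french/french.tagger",
    "parse.model": "edu/stanford/nlp/models/lexparser/frenchFactored.ser.gz",
    "outputFormat": "json",
}


def tregex_url(pattern: str, props: dict) -> str:
    """Build a /tregex URL that sends every property inside one JSON object."""
    query = urllib.parse.urlencode(
        {"pattern": pattern, "properties": json.dumps(props)}
    )
    return f"{SERVER}/tregex?{query}"


def run_tregex(text: str, pattern: str, props: dict) -> dict:
    """POST the raw text to a running CoreNLP server and decode the JSON reply."""
    req = urllib.request.Request(
        tregex_url(pattern, props), data=text.encode("utf-8")
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def named_nodes(response: dict) -> list:
    """Flatten a /tregex response into (match index, node name, subtree) tuples."""
    out = []
    for sentence in response["sentences"]:
        for idx, match in sentence.items():
            for node in match["namedNodes"]:
                for name, subtree in node.items():
                    out.append((idx, name, subtree))
    return out
```

With a server running, `named_nodes(run_tregex("Pierre et Jean sont dans la cuisine", "NP=noun1 $ NP=noun2", FRENCH_PROPS))` would list each `noun1`/`noun2` subtree; if the tags are still NNP/JJ/NNS, the endpoint is ignoring the properties.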
[pool-2-thread-3] INFO edu.stanford.nlp.pipeline.AnnotatorPool - Replacing old annotator "coref" with signature [coref.language:fr;coref.mode:statistical;] with new annotator with signature []
[pool-2-thread-3] INFO edu.stanford.nlp.pipeline.AnnotatorPool - Replacing old annotator "ssplit" with signature [tokenize.language:fr;] with new annotator with signature []
[pool-2-thread-3] INFO edu.stanford.nlp.pipeline.AnnotatorPool - Replacing old annotator "depparse" with signature [depparse.language:french;depparse.model:edu/stanford/nlp/models/parser/nndep/UD_French.gz;] with new annotator with signature []
[pool-2-thread-3] INFO edu.stanford.nlp.pipeline.AnnotatorPool - Replacing old annotator "tokenize" with signature [tokenize.language:fr;] with new annotator with signature []
[pool-2-thread-3] INFO edu.stanford.nlp.pipeline.AnnotatorPool - Replacing old annotator "mention" with signature [mention.type:dep;coref.language:fr;coref.mode:statistical;] with new annotator with signature []
[pool-2-thread-3] INFO edu.stanford.nlp.pipeline.AnnotatorPool - Replacing old annotator "pos" with signature [pos.model:edu/stanford/nlp/models/pos-tagger/french/french.tagger;] with new annotator with signature []
[pool-2-thread-3] INFO edu.stanford.nlp.pipeline.AnnotatorPool - Replacing old annotator "openie" with signature [openie.strip_entailments:true;] with new annotator with signature []
[pool-2-thread-3] INFO edu.stanford.nlp.pipeline.AnnotatorPool - Replacing old annotator "parse" with signature [parse.model:edu/stanford/nlp/models/lexparser/frenchFactored.ser.gz;parse.binaryTrees:true;] with new annotator with signature []
[pool-2-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[pool-2-thread-3] INFO edu.stanford.nlp.pipeline.TokenizerAnnotator - No tokenizer type provided. Defaulting to PTBTokenizer.
[pool-2-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[pool-2-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[pool-2-thread-3] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [0.2 sec].
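A small diagnostic sketch (Python, standard library) that scans such a log and lists, for each replaced annotator, the settings its replacement no longer has. The regular expression is an assumption fitted to the `AnnotatorPool` lines above, not a documented log format, and `dropped_settings` is a hypothetical name. On the excerpt above it would report every French setting as dropped, since each new signature is empty.

```python
import re

# Fitted to the AnnotatorPool lines in this log; other CoreNLP versions
# may word these messages differently.
REPLACED = re.compile(
    r'Replacing old annotator "(?P<name>[^"]+)" with signature '
    r"\[(?P<old>[^\]]*)\] with new annotator with signature \[(?P<new>[^\]]*)\]"
)


def parse_signature(sig: str) -> dict:
    """Turn 'key:value;key:value;' into a dict (values may contain '/')."""
    pairs = (item.split(":", 1) for item in sig.split(";") if item)
    return {key: value for key, value in pairs}


def dropped_settings(log_text: str) -> dict:
    """Map each replaced annotator to the old settings missing from the new one."""
    result = {}
    for m in REPLACED.finditer(log_text):
        old = parse_signature(m.group("old"))
        new = parse_signature(m.group("new"))
        lost = {k: v for k, v in old.items() if k not in new}
        if lost:
            result[m.group("name")] = lost
    return result
```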
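I tried the same example on the CoreNLP test server (http://corenlp.run/) and it works fine.
I suppose something is wrong with my server configuration, but I can't figure out what :)
Thanks a lot for your help!
Louis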
Answer (score: 0)
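I ran into the same problem, and here is what I found: the code used by Stanford CoreNLP does not take the parameters into account and uses the default values instead (line 1090). This has already been fixed on GitHub, which is why the corenlp server (corenlp.run) no longer has this problem, and version 3.9.0 should be fine.
Hope it helps.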
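The difference described above can be pictured with a toy model. Everything here is hypothetical (the names, and the assumed precedence of defaults, then server configuration, then per-request properties); it is not CoreNLP's actual code, only an illustration of why a handler that builds its pipeline from defaults ends up loading the English PCFG seen in the log.

```python
# Hypothetical toy model; not taken from the CoreNLP source.
ENGLISH_PCFG = "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz"  # default seen in the log

DEFAULTS = {"annotators": "tokenize,ssplit,parse"}  # no language-specific settings


def pipeline_props_buggy(server_props: dict, request_props: dict) -> dict:
    """Behaviour the answer describes: configured and request settings are ignored."""
    return dict(DEFAULTS)


def pipeline_props_fixed(server_props: dict, request_props: dict) -> dict:
    """Assumed precedence: defaults < server configuration < request properties."""
    return {**DEFAULTS, **server_props, **request_props}


def parser_model(props: dict) -> str:
    """Parser model the pipeline would load; falls back to the English default."""
    return props.get("parse.model", ENGLISH_PCFG)
```

With `server = {"parse.model": "edu/stanford/nlp/models/lexparser/frenchFactored.ser.gz"}`, `parser_model(pipeline_props_buggy(server, {}))` gives the English PCFG path, while the merged version keeps the French model.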