Question

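I'm running a dedicated CoreNLP server on AWS and trying to send requests to it from Ruby. The server seems to receive the requests correctly, but it appears to ignore the annotators list I pass in and always defaults to all annotators. My Ruby code for posting the request is as follows: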

uri = URI.parse(URI.encode('http://ec2-************.compute.amazonaws.com//?properties={"tokenize.whitespace": "true", "annotators": "tokenize,ssplit,pos", "outputFormat": "json"}'))

http = Net::HTTP.new(uri.host, uri.port)
request = Net::HTTP::Post.new("/v1.1/auth")
request.add_field('Content-Type', 'application/json')
request.body = text
response = http.request(request)
json = JSON.parse(response.body)
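One way to see what this code actually sends, without contacting the server, is to compare the path stored on the request object with the path in the parsed URI. A minimal sketch, using example.com as a placeholder for the elided EC2 hostname and an abbreviated properties string:

```ruby
require 'net/http'
require 'uri'

# Placeholder host; the real EC2 hostname is elided in the question.
uri = URI.parse('http://example.com/?properties=%7B%22annotators%22%3A%22tokenize%2Cssplit%2Cpos%22%7D')

# Net::HTTP.new only takes the host and port from the URI...
http = Net::HTTP.new(uri.host, uri.port)

# ...while the request object decides which path (and query string) is sent.
request = Net::HTTP::Post.new('/v1.1/auth')

puts "connects to:   #{http.address}:#{http.port}"
puts "path sent:     #{request.path}"      # no ?properties=... here
puts "path intended: #{uri.request_uri}"   # carries the properties query
```

If the path that is sent does not contain ?properties=, the server never sees the annotators list.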

In the nohup.out log on the server, I see the following:

[/38.122.182.107:53507] API call w/annotators tokenize,ssplit,pos,depparse,lemma,ner,mention,coref,natlog,openie

.... input text block here ....

[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.TokenizerAnnotator - TokenizerAnnotator: No tokenizer type provided. Defaulting to PTBTokenizer.
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [2.0 sec].
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse
Loading depparse model file: edu/stanford/nlp/models/parser/nndep/english_UD.gz ...
PreComputed 100000, Elapsed Time: 2.259 (s)
Initializing dependency parser done [5.1 sec].
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [2.6 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [1.2 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [7.2 sec].
[pool-1-thread-1] INFO edu.stanford.nlp.time.JollyDayHolidays - Initializing JollyDayHoliday for SUTime from classpath edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/defs.sutime.txt
Feb 22, 2016 11:37:20 PM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules
INFO: Read 83 rules
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.sutime.txt
Feb 22, 2016 11:37:20 PM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules
INFO: Read 267 rules
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.holidays.sutime.txt
Feb 22, 2016 11:37:20 PM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules
INFO: Read 25 rules
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator mention
Using mention detector type: dependency
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator coref

and so on.

When I run a test query from the command line with wget, it seems to work fine:

wget --post-data 'the quick brown fox jumped over the lazy dog' 'ec2-*******.compute.amazonaws.com/?properties={"tokenize.whitespace": "true", "annotators": "tokenize,ssplit,pos", "outputFormat": "json"}' -O -
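The wget test above can be mirrored in Ruby by building the same POST request, with the properties JSON in the query string and the raw text as the body. A minimal sketch that only builds the request and does not send it; build_corenlp_request is a hypothetical helper name, and URI.encode_www_form is used because URI.encode no longer exists in Ruby 3.0+:

```ruby
require 'net/http'
require 'uri'
require 'json'

# Hypothetical helper: builds a POST equivalent to the wget command above.
def build_corenlp_request(text, annotators)
  properties = {
    'tokenize.whitespace' => 'true',
    'annotators' => annotators,
    'outputFormat' => 'json'
  }
  # The properties travel in the query string of the path given to Post.new,
  # just as they do in the wget URL.
  path = '/?' + URI.encode_www_form('properties' => properties.to_json)
  request = Net::HTTP::Post.new(path)
  request.body = text # raw text to annotate, like wget --post-data
  request
end

request = build_corenlp_request('the quick brown fox jumped over the lazy dog',
                                'tokenize,ssplit,pos')
puts request.path
```

To send it, pass the request to Net::HTTP.new(host, port).request(request) and parse response.body with JSON.parse.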

Any help on why this is happening would be appreciated!

Answer 1

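It turns out the request was being constructed incorrectly. Only the host and port were taken from the parsed URI; the request itself was posted to "/v1.1/auth", so the properties query string never reached the server, which then fell back to its default annotators. The path, including the query string, has to be the argument to Post.new. Corrected code below, in case it helps anyone: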

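require 'net/http'
require 'uri'
require 'json'

host = "http://ec2-***********.us-west-2.compute.amazonaws.com"
text = 'the quick brown fox jumped over the lazy dog'

# The server reads its pipeline settings from the "properties" query parameter.
properties = { "tokenize.whitespace" => "true", "annotators" => "tokenize,ssplit,pos", "outputFormat" => "json" }
# URI.encode was removed in Ruby 3.0; encode_www_form escapes the JSON safely.
path = "/?" + URI.encode_www_form("properties" => properties.to_json)

uri = URI.parse(host)
http = Net::HTTP.new(uri.host, uri.port)
http.set_debug_output($stdout)       # print the raw HTTP exchange
request = Net::HTTP::Post.new(path)  # the path with the query goes here, not "/v1.1/auth"
request.add_field('Content-Type', 'application/json')
request.body = text
response = http.request(request)
json = JSON.parse(response.body)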
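Once the annotators are honored, the parsed response can be walked to pull out each token with its part-of-speech tag. A minimal sketch, assuming the "sentences" / "tokens" / "word" / "pos" layout that outputFormat json produces with the pos annotator enabled; word_pos_pairs is a hypothetical helper name, and the sample below is hand-written in that shape for illustration, not real server output:

```ruby
require 'json'

# Hypothetical helper: collects [word, POS tag] pairs from CoreNLP's JSON output.
# Assumes the sentences -> tokens -> word/pos layout described above.
def word_pos_pairs(json)
  json.fetch('sentences', []).flat_map do |sentence|
    sentence.fetch('tokens', []).map { |token| [token['word'], token['pos']] }
  end
end

# Hand-written sample in the same shape (illustrative only).
sample = JSON.parse('{"sentences":[{"index":0,"tokens":[' \
                    '{"index":1,"word":"the","pos":"DT"},' \
                    '{"index":2,"word":"fox","pos":"NN"}]}]}')
p word_pos_pairs(sample) # => [["the", "DT"], ["fox", "NN"]]
```

With the corrected code, word_pos_pairs(json) would be called on the json variable it produces.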

Stanford CoreNLP dedicated server ignores annotators input
