Question

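I am trying to run pycorenlp on a long text and get the error message CoreNLP request timed out. Your document may be too long. How can I fix this? Is there a way to increase the Stanford CoreNLP timeout?

I don't want to split the text into smaller pieces.

Here is the code I use: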

'''
From https://github.com/smilli/py-corenlp/blob/master/example.py
'''
from pycorenlp import StanfordCoreNLP
import pprint

if __name__ == '__main__':
    nlp = StanfordCoreNLP('http://localhost:9000')
    # Read the whole document; the context manager closes the file afterwards.
    with open("long_text.txt") as fp:
        text = fp.read()
    output = nlp.annotate(text, properties={
        'annotators': 'tokenize,ssplit,pos,depparse,parse',
        'outputFormat': 'json'
    })
    pp = pprint.PrettyPrinter(indent=4)
    pp.pprint(output)

The Stanford CoreNLP server is started with:

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer 9000

Answer 1

You can add 'timeout': '50000' to the properties dictionary (the unit is milliseconds):

output = nlp.annotate(text, properties={
    'timeout': '50000',
    'annotators': 'tokenize,ssplit,pos,depparse,parse',
    'outputFormat': 'json'
})

Alternatively, you can start the Stanford CoreNLP server with a timeout specified:

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 50000

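(The documentation does not mention the timeout parameter; perhaps they forgot to add it. It is present at least in stanford-corenlp-full-2015-12-09, a.k.a. 3.6.0, which is the most recent public release.)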
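
If you set the timeout from Python, a small wrapper can make a remaining timeout easier to spot. The following is only a sketch: with_timeout and annotate_or_raise are hypothetical helper names, not part of pycorenlp, and the check assumes that the client hands back the raw error message as a string (rather than a parsed dict) when the server does not return JSON.

```python
def with_timeout(properties, timeout_ms):
    """Return a copy of `properties` with a 'timeout' entry in milliseconds."""
    if timeout_ms <= 0:
        raise ValueError('timeout_ms must be positive')
    merged = dict(properties)            # leave the caller's dict untouched
    merged['timeout'] = str(timeout_ms)  # string value, as in the snippet above
    return merged


def annotate_or_raise(annotate, text, properties, timeout_ms=50000):
    """Call `annotate` with a timeout and raise if no JSON document comes back.

    `annotate` is any callable with the shape annotate(text, properties=...),
    for example the bound method nlp.annotate.
    Assumption: on a server-side error such as a timeout, the callable returns
    the error message as a plain string instead of a dict.
    """
    output = annotate(text, properties=with_timeout(properties, timeout_ms))
    if not isinstance(output, dict):
        raise RuntimeError('CoreNLP did not return JSON: %r' % (output,))
    return output
```

With the nlp object from the question, this would be called as annotate_or_raise(nlp.annotate, text, {'annotators': 'tokenize,ssplit,pos,depparse,parse', 'outputFormat': 'json'}, timeout_ms=50000). If the error still appears, raise timeout_ms or start the server with a larger -timeout.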
