Question

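I am really struggling to parse an XML file using Python and Stanford CoreNLP. What I want to do is analyze nlp.txt with Stanford CoreNLP and write the output to an XML file. Here is my code: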

import os
import subprocess
import xml.etree.ElementTree as ET

fname = 'nlp.txt'
fname_parsed = 'nlp.txt.xml'


def parse_nlp():
    '''Analyze nlp.txt with Stanford CoreNLP and write the result to an XML file.
    Do not execute if the result file already exists.
    '''
    if not os.path.exists(fname_parsed):

        # Execute StanfordCoreNLP, output standard error to parse.out
        subprocess.run(
            'java -cp "/usr/local/lib/stanford-corenlp-full-2017-06-09/*"'
            ' -Xmx2g'
            ' edu.stanford.nlp.pipeline.StanfordCoreNLP'
            ' -annotators tokenize,ssplit,pos,lemma,ner,parse,dcoref'
            ' -file ' + fname + ' 2>parse.out',
            shell=True,     # execute with shell
            check=True      # error check
        )


# analyze nlp.txt
parse_nlp()

# parse xml of result
root = ET.parse(fname_parsed)

# take only word
for word in root.iter('word'):
    print(word.text)
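
For reference, the root.iter('word') loop above visits every <word> element anywhere in the tree. A minimal sketch of the same idea on an inline sample, assuming CoreNLP's usual XML layout (root/document/sentences/sentence/tokens/token); the SAMPLE_XML string and the extract_words helper are hypothetical and only illustrate the shape of the output:

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed sample shaped like CoreNLP XML output (not real output).
SAMPLE_XML = """<root>
  <document>
    <sentences>
      <sentence id="1">
        <tokens>
          <token id="1"><word>Natural</word><lemma>natural</lemma></token>
          <token id="2"><word>language</word><lemma>language</lemma></token>
          <token id="3"><word>.</word><lemma>.</lemma></token>
        </tokens>
      </sentence>
    </sentences>
  </document>
</root>
"""


def extract_words(xml_text):
    """Return the text of every <token>/<word> element, in document order."""
    root = ET.fromstring(xml_text)
    # ".//token/word" searches the whole subtree but only matches words inside tokens.
    return [word.text for word in root.iterfind(".//token/word")]


print(extract_words(SAMPLE_XML))  # ['Natural', 'language', '.']
```

On the real file, ET.parse('nlp.txt.xml').iterfind('.//token/word') would walk the same elements.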

Then the following traceback was printed to standard error:

Traceback (most recent call last):
  File "stanford.py", line 30, in <module>
    parse_nlp()
  File "stanford.py", line 25, in parse_nlp
    check=True      # error check
  File "/anaconda/lib/python3.6/subprocess.py", line 418, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command 'java -cp "/usr/local/lib/stanford-corenlp-full-2017-06-09/*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,parse,dcoref -file nlp.txt 2>parse.out' returned non-zero exit status 1.

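I think I'm stuck on parsing the file, but I can't understand what is happening or how to fix it. I'm a relative beginner at coding and have only just started with NLP analysis, so a detailed explanation would be greatly appreciated.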
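
CalledProcessError only reports the exit status. Because the command ends with 2>parse.out, Java's actual error message is written to parse.out, so that file is the first place to look. A minimal sketch of surfacing it automatically, assuming the command is passed as an argument list (no shell); the run_logged name is hypothetical:

```python
import subprocess


def run_logged(cmd, log_path):
    """Run cmd (a list of arguments), sending its stderr to log_path.

    This mirrors the original "2>parse.out" redirection without a shell.
    On a non-zero exit status, raise RuntimeError that includes the logged
    stderr text, so the real cause (e.g. a Java exception) is visible.
    """
    with open(log_path, "w") as log:
        result = subprocess.run(cmd, stderr=log)
    if result.returncode != 0:
        with open(log_path) as log:
            details = log.read()
        raise RuntimeError(
            f"{cmd[0]} exited with status {result.returncode}; stderr:\n{details}"
        )
```

For the question's command, cmd would be the list ['java', '-cp', '/usr/local/lib/stanford-corenlp-full-2017-06-09/*', '-Xmx2g', 'edu.stanford.nlp.pipeline.StanfordCoreNLP', '-annotators', 'tokenize,ssplit,pos,lemma,ner,parse,dcoref', '-file', 'nlp.txt'] and log_path would be 'parse.out'.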

Answer 1

Hope it's not too late :p

I wrote almost the same code as yours and ran into the same problem.

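After reading the page "Using Stanford CoreNLP from the command line", I found that -Xmx2g gives the Java subprocess 2 GB of memory (strictly, it sets the JVM's maximum heap size). Even though my MBP has only 8 GB of memory, I changed -Xmx2g to -Xmx3g, and it worked!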
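
-Xmx is the standard JVM option for the maximum heap size, so -Xmx3g lets the JVM use up to 3 GB. A minimal sketch that keeps the heap size as a parameter, so it can be raised without editing the command string; the build_corenlp_cmd helper and its defaults are hypothetical, while the flags themselves are copied from the question:

```python
CORENLP_DIR = "/usr/local/lib/stanford-corenlp-full-2017-06-09"  # path from the question


def build_corenlp_cmd(input_file, heap="3g", corenlp_dir=CORENLP_DIR):
    """Return the CoreNLP command as an argument list with a configurable heap size."""
    return [
        "java",
        "-cp", corenlp_dir + "/*",  # the Java launcher expands this classpath wildcard itself
        f"-Xmx{heap}",             # maximum JVM heap size, e.g. "2g" or "3g"
        "edu.stanford.nlp.pipeline.StanfordCoreNLP",
        "-annotators", "tokenize,ssplit,pos,lemma,ner,parse,dcoref",
        "-file", input_file,
    ]


print(build_corenlp_cmd("nlp.txt"))
```

Because it is already a list, the result can be passed to subprocess.run(cmd, check=True) without shell=True; stderr would then be redirected with the stderr= parameter rather than 2>parse.out.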

Answer 2

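Your java command is not being passed to subprocess.run() correctly. If you build a single string containing the complete java command and use it as the first argument of subprocess.run(), it should work fine.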
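
With shell=True, subprocess.run() expects the whole command as one string, which the shell then splits into arguments. A minimal sketch, assuming the paths from the question, of assembling that string in one place and checking how it will be split, using the standard-library shlex.split (POSIX shell quoting rules); the CMD name is hypothetical:

```python
import shlex

# Full command as one string, as the shell would receive it (paths from the question).
CMD = (
    'java -cp "/usr/local/lib/stanford-corenlp-full-2017-06-09/*"'
    " -Xmx2g"
    " edu.stanford.nlp.pipeline.StanfordCoreNLP"
    " -annotators tokenize,ssplit,pos,lemma,ner,parse,dcoref"
    " -file nlp.txt"
)

# In a real shell the double quotes stop "*" from being glob-expanded;
# shlex.split removes the quotes just as the shell does.
args = shlex.split(CMD)
for i, arg in enumerate(args):
    print(i, arg)
```

Either subprocess.run(CMD + ' 2>parse.out', shell=True, check=True) or subprocess.run(args, check=True) without a shell would then start the same program.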

Command-line documentation:

https://stanfordnlp.github.io/CoreNLP/cmdline.html
