Question

I interact with a server used to tag sentences. This server is launched locally on port 2020.

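For example, if I send "Je mange des pâtes ." on port 2020 through the client used below, the server answers "Je_CL mange_V des_P pâtes_N ._.". The result is always exactly one line, provided my input is not empty.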

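I currently have to tag 9 568 files through this server. The first 9 483 files are tagged as expected. After that, the input stream seems to be closed / full / something else, because I get an IOError, specifically a Broken Pipe error, when I try to write to stdin.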

When I skip the first 9 483 files, the remaining ones are tagged without any issue, including the one that caused the first error.

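My server doesn't produce any error log indicating that something fishy happened... Am I handling something incorrectly? Is it normal for a pipe to fail after some time?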

# Python 2
import codecs
from subprocess import Popen, PIPE

# JAR and SUMMARY are defined elsewhere in the script (not shown)
log = codecs.open('stanford-tagger.log', 'w', 'utf-8')
p1 = Popen(["java",
            "-cp", JAR,
            "edu.stanford.nlp.tagger.maxent.MaxentTaggerServer",
            "-client",
            "-port", "2020"],
           stdin=PIPE,
           stdout=PIPE,
           stderr=log)

fhi = codecs.open(SUMMARY, 'r', 'utf-8') # a descriptor of the files to tag

for i, line in enumerate(fhi, 1):
    if i % 500 == 0:  # report progress every 500 documents
        print "Tagged " + str(i) + " documents..."
    tokens = ... # a list of words, can be quite long
    try:
        p1.stdin.write(' '.join(tokens).encode('utf-8') + '\n')
    except IOError:
        print 'bouh, I failed ;(('
    result = p1.stdout.readline()
    # Here I do something with result...
fhi.close()

Answer 1

In addition to my comment, I might suggest a few other changes...

# p1 and fhi are set up as in the question
for i, line in enumerate(fhi, 1):
    if i % 500 == 0:  # report progress every 500 documents
        print "Tagged " + str(i) + " documents..."
    tokens = ... # a list of words, can be quite long
    try:
        s = ' '.join(tokens).encode('utf-8') + '\n'
        assert s.find('\n') == len(s) - 1       # Make sure the only newline in s is the final one
        p1.stdin.write(s)
        p1.stdin.flush()                        # Push the buffered data into the pipe now
    except IOError:
        print 'bouh, I failed ;(('
    result = p1.stdout.readline()
    assert result                               # Make sure we got something back
    assert result.find('\n') == len(result) - 1 # Make sure the only newline in result is the final one
    # Here I do something with result...
fhi.close()

...but given that there's also a client/server we know nothing about, there are a lot of places where it could be going wrong.
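One way to narrow it down might be to check, at the moment the write fails, whether the child process is still running at all. A broken pipe generally means nobody is reading the other end any more. The following is only a rough sketch, written for Python 3 rather than the Python 2 of the snippets above; send_line is a hypothetical helper name, and the child it spawns is a stand-in that exits immediately, not the Stanford tagger.

```python
import subprocess
import sys


def send_line(proc, text):
    """Send one line to proc's stdin and return the one-line reply.

    Raises RuntimeError carrying the child's exit code (None if it is
    still running) when the write fails.
    """
    data = (text + "\n").encode("utf-8")
    try:
        proc.stdin.write(data)
        proc.stdin.flush()
    except OSError as exc:  # BrokenPipeError is a subclass of OSError
        # The reading end is gone: ask whether the child has exited.
        code = proc.poll()
        raise RuntimeError(
            "child stopped reading (exit code: %r)" % (code,)) from exc
    return proc.stdout.readline().decode("utf-8")


if __name__ == "__main__":
    # Stand-in child that exits at once, so the write hits a broken pipe.
    child = subprocess.Popen([sys.executable, "-c", "pass"],
                             stdin=subprocess.PIPE,
                             stdout=subprocess.PIPE,
                             bufsize=0)
    child.wait()
    try:
        send_line(child, "Je mange des pâtes .")
    except RuntimeError as err:
        print(err)
```

If the exit code is not None at that point, the process on the other end of the pipe has died, and its stderr log is the next place to look.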

Does it work if you dump all the queries into a single file, and then run it from the command line with something like this?

java .... < input > output
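To build that input file, something along these lines might do. It is a Python 3 sketch under the assumption that each query is a list of tokens, as in the loop above; dump_queries is a hypothetical helper name, and the file name in the usage part is just the placeholder from the command line.

```python
def dump_queries(token_lists, path):
    """Write one space-joined query per line to path (UTF-8).

    Returns the number of queries written. Refuses any query that
    contains a line break, since the server answers line by line.
    """
    count = 0
    with open(path, "w", encoding="utf-8", newline="\n") as out:
        for tokens in token_lists:
            line = " ".join(tokens)
            if "\n" in line or "\r" in line:
                raise ValueError(
                    "query %d contains a line break" % (count + 1))
            out.write(line + "\n")
            count += 1
    return count


if __name__ == "__main__":
    queries = [["Je", "mange", "des", "pâtes", "."]]
    print(dump_queries(queries, "input"))
```

If the command-line run fails at the same query, the Python side is probably not to blame. If it goes through fine, the way the pipe is driven deserves a closer look.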
