Question

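I have a large file that needs to be processed before it is fed to another command. I could save the processed data to a temporary file, but I would like to avoid that. I wrote a generator that processes one line at a time, and the script below feeds its output to the external command as input. However, I get an "I/O operation on closed file" exception on the second iteration of the loop: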

cmd = ['intersectBed', '-a', 'stdin', '-b', bedfile]
p = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
for entry in my_entry_generator: # <- this is my generator
    output = p.communicate(input='\t'.join(entry) + '\n')[0]
    print output

I read another similar question that uses p.stdin.write, but I still have the same problem.

What am I doing wrong?

[EDIT] I replaced the last two statements with the following (thanks, SpliFF):

    output = p.communicate(input='\t'.join(entry) + '\n')
    if output[1]: print "error:", output[1]
    else: print output[0]

to see whether the external program reports any errors. It does not. I still get the same exception at the p.communicate line.

Answer 1

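The communicate method of subprocess.Popen objects can only be called once. It sends the input you give it to the process while reading all of the stdout and stderr output. By "all" I mean that it waits for the process to exit, so that it knows it has every byte of output. Once communicate returns, the process no longer exists.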
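The behaviour described above can be reproduced with a small sketch. It is written for Python 3 (the question uses Python 2), and because intersectBed may not be installed, it assumes a stand-in child process that simply echoes stdin back to stdout. The sample entries are made up for illustration.

```python
import subprocess
import sys

# Assumption: a stand-in child that echoes stdin to stdout,
# used here in place of intersectBed.
cmd = [sys.executable, "-c",
       "import sys; sys.stdout.write(sys.stdin.read())"]

p = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE, text=True)

# First call: sends the input, closes stdin, reads everything, waits for exit.
first_out, first_err = p.communicate(input="chr1\t10\t20\n")

# The process has already exited, so its return code is available.
exited = p.returncode is not None

# Second call with more input: the pipe is gone, so this raises ValueError.
try:
    p.communicate(input="chr1\t30\t40\n")
    second_call_failed = False
except ValueError:
    second_call_failed = True
```

The second call fails for the same underlying reason as the loop in the question: the first communicate already closed stdin and waited for the child to exit.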

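If you want to use communicate, you must either restart the process in the loop or give it a single string containing all of the input from the generator. If you want streaming communication, sending the data bit by bit, then you must not use communicate. Instead, you need to write to p.stdin while reading from p.stdout and p.stderr. Doing this is tricky, because you cannot tell which output was caused by which input, and because you can easily run into deadlocks. There are third-party libraries that can help with this, such as Twisted.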
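The two alternatives above could be sketched roughly as follows, in Python 3. Everything named here is an assumption for illustration: entries stands in for my_entry_generator, and CMD is a stand-in child that upper-cases each input line rather than the real intersectBed. The streaming variant uses a writer thread, which is only one way to avoid the deadlock, and it leaves stderr unpiped so that an unread stderr pipe cannot fill up and block the child.

```python
import subprocess
import sys
import threading

# Assumption: stand-in child that upper-cases each line (replaces intersectBed).
CMD = [sys.executable, "-c",
       "import sys\nfor line in sys.stdin:\n    sys.stdout.write(line.upper())"]


def entries():
    # Hypothetical stand-in for my_entry_generator.
    yield ("chr1", "10", "20")
    yield ("chr2", "30", "40")


def run_with_single_string(cmd, gen):
    # Option 1: join all generator output and call communicate exactly once.
    # Note that this holds the whole input in memory.
    data = "".join("\t".join(entry) + "\n" for entry in gen)
    p = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE, text=True)
    return p.communicate(input=data)


def run_streaming(cmd, gen):
    # Option 2: no communicate; feed stdin from a thread while reading stdout.
    p = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                         text=True)

    def feed():
        try:
            for entry in gen:
                p.stdin.write("\t".join(entry) + "\n")
        finally:
            p.stdin.close()  # signals end of input to the child

    writer = threading.Thread(target=feed)
    writer.start()
    lines = [line for line in p.stdout]  # read until the child closes stdout
    writer.join()
    p.wait()
    return lines
```

Calling run_with_single_string(CMD, entries()) returns an (stdout, stderr) pair, while run_streaming(CMD, entries()) returns the output lines as they were read. Neither tells you which output line belongs to which input entry; that limitation is the one noted above.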

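If you want to do this interactively, sending some data and then waiting for and processing the result before sending more, things get even harder. You should probably use a third-party library such as pexpect.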

Of course, if you can get away with starting the process inside the loop, that would be much easier:

cmd = ['intersectBed', '-a', 'stdin', '-b', bedfile]
for entry in my_entry_generator:
    p = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    output = p.communicate(input='\t'.join(entry) + '\n')[0]
    print output

Answer 2

Possibly your intersectBed application is exiting with an error, but since you are not printing any stderr data, you cannot see it. Try:

result = p.communicate(input='\t'.join(entry) + '\n')
if result[1]:
  print "error:", result[1]
else:
  print result[0]

Use a generator as subprocess input; got "I/O operation on closed file" exception
