Question

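I am using the subprocess module to call a program from the shell; the program writes a binary file to STDOUT.

I use Popen() to call the program, and then I want to pass the stream to a function in a Python package (called "pysam") that unfortunately cannot work with Python file objects but can read from STDIN. So what I'd like to do is have the output of the shell command go from STDOUT into STDIN.

How can this be done with Popen / the subprocess module? This is how I'm calling the shell program: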

p = subprocess.Popen(my_cmd, stdout=subprocess.PIPE, shell=True).stdout

This reads the STDOUT output of "my_cmd" and gives me a stream in p. Since my Python module cannot read directly from "p", I tried to redirect the STDOUT of "my_cmd" back into STDIN using:

p = subprocess.Popen(my_cmd, stdout=subprocess.PIPE, stdin=subprocess.PIPE, shell=True).stdout

Then I call my module, which uses "-" as a placeholder for STDIN:

s = pysam.Samfile("-", "rb")

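The above call just means: read from STDIN (denoted "-") and read it as a binary file ("rb").

When I try this, I just get binary output sent to the screen, and it doesn't look like the Samfile() function can read it. This happens even if I remove the call to Samfile, so I think the problem is my call to Popen and not the downstream steps.

EDIT: In response to the answers, I tried: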

sys.stdin = subprocess.Popen(tagBam_cmd, stdout=subprocess.PIPE, shell=True).stdout
print("Opening SAM..")
s = pysam.Samfile("-","rb")
print("Done?")
sys.stdin = sys.__stdin__

This seems to hang. I get the output:

Opening SAM..

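But it never gets past the Samfile("-", "rb") line. Any idea why?

Any idea how to fix this?

EDIT 2: I'm adding a link to the Pysam documentation in case it helps; I really can't figure this out. The documentation page is: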

http://wwwfgu.anat.ox.ac.uk/~andreas/documentation/samtools/usage.html

The specific note about streams is here:

http://wwwfgu.anat.ox.ac.uk/~andreas/documentation/samtools/usage.html#using-streams

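In particular:

""" Pysam does not support reading and writing from true python file objects, but it does support reading and writing from stdin and stdout. The following example reads from stdin and writes to stdout: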

infile = pysam.Samfile( "-", "r" )
outfile = pysam.Samfile( "-", "w", template = infile )
for s in infile: outfile.write(s)

It will also work with BAM files. The following script converts a BAM formatted file on stdin to a SAM formatted file on stdout:

infile = pysam.Samfile( "-", "rb" )
outfile = pysam.Samfile( "-", "w", template = infile )
for s in infile: outfile.write(s)

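Note, only the file open mode needs to change from r to rb. """

So I simply want to take the stream coming from Popen, which reads stdout, and redirect it into stdin, so that I can use Samfile("-", "rb"), as the section above states is possible.

Thanks.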

Answer 1

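I'm a little confused that you see binary on stdout if you are using stdout=subprocess.PIPE. However, the overall problem is that you need to work with sys.stdin if you want to trick pysam into using it.

For instance: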

sys.stdin = subprocess.Popen(my_cmd, stdout=subprocess.PIPE, shell=True).stdout
s = pysam.Samfile("-", "rb")
sys.stdin = sys.__stdin__ # restore original stdin

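UPDATE: This assumed that pysam runs in the context of the Python interpreter and therefore means the Python interpreter's stdin when "-" is specified. Unfortunately, it doesn't; when "-" is specified, it reads directly from file descriptor 0.

In other words, it is not using Python's concept of stdin (sys.stdin), so replacing it has no effect on pysam.Samfile(). It also is not possible to take the output from the Popen call and somehow "push" it onto file descriptor 0; it is read-only and the other end is connected to your terminal.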
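The difference between sys.stdin and file descriptor 0 can be seen without pysam. The following is a minimal sketch, not pysam's actual code: the CHILD program and the helper name read_fd0_after_swap are made up for illustration, and os.read(0, ...) merely stands in for a C library that reads descriptor 0 directly.

```python
import subprocess
import sys

# Hypothetical child program: it replaces sys.stdin with an in-memory stream,
# then reads file descriptor 0 directly, much as a C extension would.
CHILD = r"""
import io, os, sys
sys.stdin = io.StringIO("from the sys.stdin replacement")
sys.stdout.write(os.read(0, 1024).decode())
"""


def read_fd0_after_swap(payload):
    """Run CHILD with `payload` on its real stdin and return what it prints."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=payload,            # delivered on the child's file descriptor 0
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # The child echoes the bytes from descriptor 0, not the StringIO contents.
    print(read_fd0_after_swap(b"from file descriptor 0"))
```

The child prints the bytes that arrived on descriptor 0 and ignores the replacement object, which matches the hang described in the question: the library keeps waiting on the real stdin.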

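The only real way to get output onto file descriptor 0 is to move the reading code into another script and connect that script to the first one. This ensures that the output of the Popen in the first script ends up on file descriptor 0 of the second script.

So, in this case, your best option is to split this into two scripts. The first one invokes my_cmd, takes its output, and uses it as the input to a second Popen of another Python script that calls pysam.Samfile("-", "rb").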
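The two-script arrangement can be sketched as follows. This is only an illustration under assumptions: PRODUCER stands in for my_cmd and CONSUMER stands in for the second script that would call pysam.Samfile("-", "rb"); both are hypothetical one-liners so the example runs without pysam, and run_pipeline is a made-up helper name.

```python
import subprocess
import sys

# Stand-in for my_cmd (assumption): writes a few binary bytes to stdout.
PRODUCER = [sys.executable, "-c",
            "import sys; sys.stdout.buffer.write(b'\\x1f\\x8b binary payload')"]

# Stand-in for the second script (assumption): reads its own stdin, which is
# where pysam.Samfile("-", "rb") would read, and echoes the bytes back.
CONSUMER = [sys.executable, "-c",
            "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]


def run_pipeline(producer, consumer):
    """Connect producer stdout to consumer stdin and return consumer output."""
    first = subprocess.Popen(producer, stdout=subprocess.PIPE)
    second = subprocess.Popen(consumer, stdin=first.stdout,
                              stdout=subprocess.PIPE)
    # Close our copy so the producer sees a broken pipe if the consumer exits.
    first.stdout.close()
    output, _ = second.communicate()
    first.wait()
    return output


if __name__ == "__main__":
    print(run_pipeline(PRODUCER, CONSUMER))
```

Because the pipe is attached when the second process is created, the data arrives on that process's real file descriptor 0, so nothing inside it needs to touch sys.stdin.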

Answer 2

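In the specific case of dealing with pysam, I was able to work around the issue using a named pipe (http://docs.python.org/library/os.html#os.mkfifo), which is a pipe that can be accessed like a regular file. In general, you want the consumer (reader) of the pipe to be listening before you start writing to the pipe, to ensure you don't miss anything. However, pysam.Samfile("-", "rb") will hang, as you noted above, if nothing is already registered on stdin.

Assuming you are dealing with a prior computation that takes a decent amount of time (e.g. sorting the bam before passing it into pysam), you can start that prior computation and then listen on the stream before anything gets output: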

import os
import tempfile
import subprocess
import shutil
import pysam

# Create a named pipe
tmpdir = tempfile.mkdtemp()
samtools_prefix = os.path.join(tmpdir, "namedpipe")
fifo = samtools_prefix + ".bam"
os.mkfifo(fifo)

# The example below sorts the file 'input.bam',
# creates a pysam.Samfile object of the sorted data,
# and prints out the name of each record in sorted order

# Your prior process that spits out data to stdout/a file
# We pass samtools_prefix as the output prefix, knowing that its
# ending file will be named what we called the named pipe
proc = subprocess.Popen(["samtools", "sort", "input.bam", samtools_prefix])

# Read from the named pipe
samfile = pysam.Samfile(fifo, "rb")

# Print out the names of each record
for read in samfile:
    print(read.qname)

# Close the reader and wait for samtools to finish
samfile.close()
proc.wait()

# Clean up the named pipe and associated temp directory
shutil.rmtree(tmpdir)
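For readers without samtools or pysam installed, the same named-pipe pattern can be tried with the standard library alone. This is a sketch under assumptions: the writer is a hypothetical child Python process standing in for the long-running producer, read_through_fifo is a made-up helper name, and os.mkfifo is only available on POSIX systems.

```python
import os
import shutil
import subprocess
import sys
import tempfile

# Hypothetical writer standing in for the producer: it opens the FIFO path
# given as its first argument and writes its second argument into it.
WRITER_CODE = (
    "import sys\n"
    "with open(sys.argv[1], 'wb') as out:\n"
    "    out.write(sys.argv[2].encode())\n"
)


def read_through_fifo(payload):
    """Send `payload` (a str) through a named pipe and return the bytes read."""
    tmpdir = tempfile.mkdtemp()
    fifo = os.path.join(tmpdir, "namedpipe")
    try:
        os.mkfifo(fifo)  # POSIX only
        writer = subprocess.Popen(
            [sys.executable, "-c", WRITER_CODE, fifo, payload])
        # Opening for reading blocks until the writer opens its end.
        with open(fifo, "rb") as reader:
            data = reader.read()
        writer.wait()
        return data
    finally:
        # Remove the named pipe and its temporary directory.
        shutil.rmtree(tmpdir)


if __name__ == "__main__":
    print(read_through_fifo("sorted records"))
```

The reader sees end-of-file once the writer closes its end of the pipe, so the loop over records in the pysam version ends the same way when the producer finishes.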

Answer 3

If your system supports it, you can use /dev/fd/# filenames:

process = subprocess.Popen(args, stdout=subprocess.PIPE)
samfile = pysam.Samfile("/dev/fd/%d" % process.stdout.fileno(), "rb")
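The /dev/fd idea can be checked without pysam by opening the path with the built-in open(). This is a minimal sketch assuming a system that exposes /dev/fd (Linux and macOS typically do); read_via_dev_fd is a hypothetical helper name, and the child command is a stand-in for the real producer.

```python
import subprocess
import sys


def read_via_dev_fd(args):
    """Run `args` and read its stdout through a /dev/fd/N path."""
    process = subprocess.Popen(args, stdout=subprocess.PIPE)
    # Path that refers to the read end of the pipe in this process.
    path = "/dev/fd/%d" % process.stdout.fileno()
    with open(path, "rb") as stream:
        data = stream.read()  # returns at EOF, once the child closes stdout
    process.stdout.close()
    process.wait()
    return data


if __name__ == "__main__":
    child = [sys.executable, "-c", "import sys; sys.stdout.write('hello')"]
    print(read_via_dev_fd(child))
```

Any library that accepts a file name, as pysam.Samfile does, can be handed such a path in place of "-", provided the descriptor stays open in the calling process while the library reads.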

Redirecting stdout to stdin with Python subprocess?
