Python subprocess call to unix tee truncates stdin when writing to stdout and a logfile

Time: 2015-03-31 14:47:58

Tags: python subprocess tee

I am trying to run a series of existing scripts from Python using subprocess. When I use this code, the chain works as expected:

p1 = subprocess.Popen(samtoolsSortArguments, stdout=subprocess.PIPE)
p2 = subprocess.Popen(samtoolsViewArguments, stdin=p1.stdout, stdout=subprocess.PIPE)
p1.stdout.close()
p3 = subprocess.Popen(htseqCountArguments, stdin=p2.stdout, stdout=file_out)
p2.stdout.close()
p3.communicate()
file_out.close()

The output looks like this:

100000 GFF lines processed.
[bam_sort_core] merging from 2 files...
200000 GFF lines processed.
300000 GFF lines processed.
400000 GFF lines processed.
500000 GFF lines processed.
600000 GFF lines processed.
700000 GFF lines processed.
800000 GFF lines processed.
900000 GFF lines processed.
1000000 GFF lines processed.
1100000 GFF lines processed.
1200000 GFF lines processed.
1300000 GFF lines processed.
1400000 GFF lines processed.
1500000 GFF lines processed.
1600000 GFF lines processed.
1700000 GFF lines processed.
1800000 GFF lines processed.
1900000 GFF lines processed.
2000000 GFF lines processed.
2100000 GFF lines processed.
2200000 GFF lines processed.
2300000 GFF lines processed.
2400000 GFF lines processed.
2500000 GFF lines processed.
2600000 GFF lines processed.
2700000 GFF lines processed.
2764635 GFF lines processed.
100000 SAM alignment records processed.
200000 SAM alignment records processed.
300000 SAM alignment records processed.
400000 SAM alignment records processed.
500000 SAM alignment records processed.
600000 SAM alignment records processed.
700000 SAM alignment records processed.
800000 SAM alignment records processed.
900000 SAM alignment records processed.
1000000 SAM alignment records processed.
1100000 SAM alignment records processed.
1200000 SAM alignment records processed.
1300000 SAM alignment records processed.
1400000 SAM alignment records processed.
1500000 SAM alignment records processed.
1600000 SAM alignment records processed.
1700000 SAM alignment records processed.
1800000 SAM alignment records processed.
1900000 SAM alignment records processed.
2000000 SAM alignment records processed.
2100000 SAM alignment records processed.
2200000 SAM alignment records processed.
2300000 SAM alignment records processed.
2400000 SAM alignment records processed.
2500000 SAM alignment records processed.
2600000 SAM alignment records processed.
2700000 SAM alignment records processed.
2800000 SAM alignment records processed.
2900000 SAM alignment records processed.

All of this output comes from stderr, and I want to write it to both the terminal and a log file. To achieve this, I run the unix tee command as a subprocess in Python and hand it the stderr of the previous subprocess command. The code looks like this:

p1 = subprocess.Popen(samtoolsSortArguments, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
tee = subprocess.Popen(['tee', logfile], stdin=p1.stderr)
p1.stderr.close()

p2 = subprocess.Popen(samtoolsViewArguments, stdin=p1.stdout, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
p1.stdout.close()
tee = subprocess.Popen(['tee', logfile], stdin=p2.stderr)
p2.stderr.close()

p3 = subprocess.Popen(htseqCountArguments, stdin=p2.stdout, stdout=file_out, stderr=subprocess.PIPE)
p2.stdout.close()
tee = subprocess.Popen(['tee', logfile], stdin=p3.stderr)

p3.communicate()
p3.stderr.close()
tee.communicate()
file_out.close()

The stdout from this code, written to my file_out handle, is correct. Even the stderr printed to the screen and to the log file appears to be the right information. However, the stderr output is truncated on some lines, and I can't figure out why. Here is what my log file and terminal look like (they match):

 GFF lines processed.
[bam_sort_core] merging from 2 files...
 GFF lines processed.
300000 GFF lines processed.
400000 GFF lines processed.
500000 GFF lines processed.
600000 GFF lines processed.
700000 GFF lines processed.
800000 GFF lines processed.
900000 GFF lines processed.
1000000 GFF lines processed.
1100000 GFF lines processed.
1200000 GFF lines processed.
1300000 GFF lines processed.
1400000 GFF lines processed.
1500000 GFF lines processed.
1600000 GFF lines processed.
1700000 GFF lines processed.
1800000 GFF lines processed.
1900000 GFF lines processed.
 GFF lines processed.
GFF lines processed.
FF lines processed.
F lines processed.
 lines processed.
ines processed.
700000 GFF lines processed.
2764635 GFF lines processed.
nt records processed.
 records processed.
300000 SAM alignment records processed.
cords processed.
ds processed.
processed.
essed.
d.
000000 SAM alignment records processed.
00 SAM alignment records processed.
 alignment records processed.
1500000 SAM alignment records processed.
1600000 SAM alignment records processed.
1800000 SAM alignment records processed.
1900000 SAM alignment records processed.
2000000 SAM alignment records processed.
2100000 SAM alignment records processed.
2200000 SAM alignment records processed.
2500000 SAM alignment records processed.
2600000 SAM alignment records processed.
2700000 SAM alignment records processed.
2900000 SAM alignment records processed.

Why does the output get truncated when it is passed to tee? Is it just a column shift? Is there a way to fix this, or am I simply trying to do too much with subprocess?

EDIT: Here is an SSCCE of @tdelaney's code. It reproduces the same error I get when I use it in my broader context. This example should be run from a folder containing a file named test.txt. test.txt should look like this (or anything similar, as long as some of the lines are "test"):

test
not
test

Here is the toy code (make sure to change the shebang to point at your python):

#!/usr/local/bin/python2

import sys
import subprocess
import threading

logfile = "./testlog.txt"

arg1 = ["ls", "-l"]
arg2 = ["find", "-name", "test.txt"]
arg3 = ["xargs", "grep", "-i", "-n", "test"]

def log_writer(pipe, log_fp, lock):
    for line in pipe:
        with lock:
            log_fp.write(line)
            sys.stdout.write(line)

with open(logfile, 'w') as log_fp:
    lock = threading.Lock()
    threads = []
    p1 = subprocess.Popen(arg1, stdout=subprocess.PIPE)
    threads.append(threading.Thread(target=log_writer, args=(p1.stdout, log_fp, lock)))

    p2 = subprocess.Popen(arg2, stdin=p1.stdout, stdout=subprocess.PIPE)
    p1.stdout.close()
    threads.append(threading.Thread(target=log_writer, args=(p2.stdout, log_fp, lock)))

    p3 = subprocess.Popen(arg3, stdin=p2.stdout, stdout=subprocess.PIPE)
    p2.stdout.close()
    threads.append(threading.Thread(target=log_writer, args=(p3.stdout, log_fp, lock)))

    for t in threads:
        t.start()

    p3.communicate()

    for t in threads:
        t.join()

NOTE: The code runs if I comment out the close() and communicate() lines. I'm a bit wary of doing that, though, and once I do, I run into all sorts of other problems in my broader context.

1 Answer:

Answer 0 (score: 1)

The problem is that you have multiple tee processes writing to a single file. Each one has its own file pointer and its own current offset into the file, so they overwrite bits of each other's output. One solution is to implement the log file writing in Python, using threads and a mutex.
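To make the failure mode concrete, here is a minimal sketch (not part of the original answer) of two independent handles on the same file clobbering each other, which is what the separate tee processes do. The file name demo.txt is just a hypothetical placeholder:

a = open("demo.txt", "w")
b = open("demo.txt", "w")              # second handle: its own offset, also starting at 0

a.write("AAAAAAAAAA\n")                # 11 bytes written through handle a
a.flush()
b.write("BBB\n")                       # handle b is still at offset 0, so this
b.flush()                              # overwrites the first 4 bytes a just wrote

a.close()
b.close()
print(repr(open("demo.txt").read()))   # -> 'BBB\nAAAAAA\n': a's line is chopped up

Each tee holds a handle like this, so whichever process writes later stomps on part of what an earlier one wrote at the same offset. The code below applies the threaded approach instead: one log_writer thread per stderr pipe, all writing through a single file object guarded by a lock.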

#!/usr/bin/env python

import sys
import subprocess
import threading

logfile = "./testlog.txt"
file_out = open("./test.output.txt", "w")

arg1 = ["ls", "-l"]
arg2 = ["find", "-name", "test.txt"]
arg3 = ["xargs", "grep", "-i", "-n", "test"]

def log_writer(pipe, log_fp, lock):
    for line in pipe:
        with lock:
            log_fp.write(line)
            sys.stdout.write(line)

with open(logfile, 'w') as log_fp:
    lock = threading.Lock()
    threads = []
    processes = []
    p1 = subprocess.Popen(arg1, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    threads.append(threading.Thread(target=log_writer, args=(p1.stderr, log_fp, lock)))
    processes.append(p1)

    # Chain the pipeline through stdout; each stderr is read by its own log_writer thread.
    p2 = subprocess.Popen(arg2, stdin=p1.stdout, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    p1.stdout.close()
    threads.append(threading.Thread(target=log_writer, args=(p2.stderr, log_fp, lock)))
    processes.append(p2)

    p3 = subprocess.Popen(arg3, stdin=p2.stdout, stdout=file_out, stderr=subprocess.PIPE)
    p2.stdout.close()
    threads.append(threading.Thread(target=log_writer, args=(p3.stderr, log_fp, lock)))
    processes.append(p3)

    file_out.close()

    for t in threads:
        t.start()

    for p in processes:
        p.wait()

    for t in threads:
        t.join()
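For comparison, here is a sketch (not part of the answer) of an alternative that keeps tee but avoids the competing file offsets: start a single tee and hand its stdin to every child as stderr. With one tee there is only one writer into the log, so nothing gets overwritten; lines from different children can still interleave with each other, but pipe writes up to PIPE_BUF bytes are atomic, so individual short lines arrive intact. The argument lists are the same toy commands as above:

import subprocess

logfile = "./testlog.txt"
file_out = open("./test.output.txt", "w")

arg1 = ["ls", "-l"]
arg2 = ["find", "-name", "test.txt"]
arg3 = ["xargs", "grep", "-i", "-n", "test"]

# One tee process, hence one file pointer into the log.
tee = subprocess.Popen(["tee", logfile], stdin=subprocess.PIPE)

p1 = subprocess.Popen(arg1, stdout=subprocess.PIPE, stderr=tee.stdin)
p2 = subprocess.Popen(arg2, stdin=p1.stdout, stdout=subprocess.PIPE, stderr=tee.stdin)
p1.stdout.close()
p3 = subprocess.Popen(arg3, stdin=p2.stdout, stdout=file_out, stderr=tee.stdin)
p2.stdout.close()

p3.communicate()
p1.wait()
p2.wait()
file_out.close()

tee.stdin.close()   # last write end in the parent is closed; tee exits once the pipe drains
tee.wait()

The threaded log_writer in the answer remains the more flexible option, since the lines pass through Python where they can be prefixed, filtered, or timestamped; the single-tee version just shows the smallest change that removes the multiple-writer problem.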