subprocess open file和pipe awk命令

时间:2017-05-16 15:16:02

标签: python

这是我的输入文件格式:

@SRR2056440.1 1 length=100
TGTAGGTCTGAGCAGCTTGTCCTGGCTGTGTCCATGTCAGAGCAACGGCCCAAGTCTGGGTCTGGGGGGGAAGGTGTCATGGAGCCCCCTACGATTCCCA
+SRR2056440.1 1 length=100
BCBFFFEFHHHHHJJJJJJIJJJJJJJJIJHHIJJIIJJJJJIJJIJJJJJJJJFHIJJJHHHHHHFDDDBDDD>>ACDEDDDDDDDDDDDDDDDDDEDD
@SRR2056440.2 2 length=100
CTGCCGCCACCGCAGCAGCCACAGGCAGAGGAGGACGAGGACGACTGGGAATCGTAGGGGGCTCCATGACACCTTCCCCCCCAGACCCAGACTTGGGCCA
+SRR2056440.2 2 length=100
CCCFFFFFHHHHHJJJJJJJJJJJIJIJIGJGGIGGJIJJEHFEDDDDDDDDDDABDDDDDDDDDDDDDDADDDDDDDDDDDCDDDDDDBBDDCDDBDD@
@SRR2056440.3 3 length=100
TCTGCCGCCACCGCAGCAGCCACAGGCAGAGGAGGACGAGGACGACTGGGAATCGTAGGGGGCTCCATGACACCTTCCCCCCCAGACCCAGACTTGGGCC
+SRR2056440.3 3 length=100
CCCFFFFFHGHHHJJJJJIJJJJJJIJJIJJJIJJIIIGIJ<CDBCDDDDDDDDDDDDDDDDDDDDDDDDDDDDDCDDDDDDDDDDDDDDDDDDCDCBDD

这是我想要执行的命令:

cat input.fq | awk 'NR%4==2{sum+=length($0);nr++;sumsq+=length($0)*length($0)}END{printf"%.1f\t%.1f\n",sum/nr,sqrt(sumsq/nr-(sum/nr)**2)}'

命令的输出:

  

100.0 0.0

我想使用subprocess在python脚本中执行该命令。我做过几次尝试,但我无法弄清楚,这是我的最后一次尝试:

awk_comm = r"""'NR%4==2{sum+=length($0);nr++;sumsq+=length($0)*length($0)}END{printf"%.1f\t%.1f\n",sum/nr,sqrt(sumsq/nr-(sum/nr)**2)}'"""
cmd = ['cat', 'input.fq', '|', 'awk', awk_comm]
p2 = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
out1, err = p2.communicate()

修改

我无法在输出中看到任何错误。它会卡住,永远运行。

4 个答案:

答案 0 :(得分:2)

以下适用于我。

>>> awk_comm = r"""cat input.fq | awk 'NR%4==2{sum+=length($0);nr++;sumsq+=length($0)*length($0)}END{printf"%.1f\t%.1f\n",sum/nr,sqrt(sumsq/nr-(sum/nr)**2)}'"""
>>> p2 = subprocess.Popen(awk_comm, stdout=subprocess.PIPE,shell=True)
>>> res = p2.communicate()
>>> res
('100.0\t0.0\n', None)

答案 1 :(得分:1)

这里shell=True没有意义。只需将subprocess.Popen对象设置为执行以下操作即可使用shell的所有内容:

# the original awk code, with whitespace added for readability
awk_command = r"""
NR%4==2 {
  sum+=length($0);
  nr++;
  sumsq+=length($0)*length($0)
}
END {
  printf "%.1f\t%.1f\n", sum/nr, sqrt(sumsq/nr-(sum/nr)**2)
}
"""

p2 = subprocess.Popen(
  ['awk', awk_command],
  stdin=open('input.fq', 'r'),  # pass a file handle to input.fq directly on awk's stdin
  stdout=subprocess.PIPE,
  stderr=subprocess.PIPE)
out1, err = p2.communicate()

答案 2 :(得分:0)

默认情况下,Python不使用shell来运行命令......但是管道由shell评估!!您需要通过shell=True

p2 = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)

答案 3 :(得分:0)

您可以使用命令模块来实现此目的:

import commands
awk_comm = r"""'NR%4==2{sum+=length($0);nr++;sumsq+=length($0)*length($0)}END{printf"%.1f\t%.1f\n",sum/nr,sqrt(sumsq/nr-(sum/nr)**2)}'"""
p1 = commands.getoutput('cat input.fq | awk ' + awk_comm)
print p1

希望这有帮助