This is the format of my input file:
@SRR2056440.1 1 length=100
TGTAGGTCTGAGCAGCTTGTCCTGGCTGTGTCCATGTCAGAGCAACGGCCCAAGTCTGGGTCTGGGGGGGAAGGTGTCATGGAGCCCCCTACGATTCCCA
+SRR2056440.1 1 length=100
BCBFFFEFHHHHHJJJJJJIJJJJJJJJIJHHIJJIIJJJJJIJJIJJJJJJJJFHIJJJHHHHHHFDDDBDDD>>ACDEDDDDDDDDDDDDDDDDDEDD
@SRR2056440.2 2 length=100
CTGCCGCCACCGCAGCAGCCACAGGCAGAGGAGGACGAGGACGACTGGGAATCGTAGGGGGCTCCATGACACCTTCCCCCCCAGACCCAGACTTGGGCCA
+SRR2056440.2 2 length=100
CCCFFFFFHHHHHJJJJJJJJJJJIJIJIGJGGIGGJIJJEHFEDDDDDDDDDDABDDDDDDDDDDDDDDADDDDDDDDDDDCDDDDDDBBDDCDDBDD@
@SRR2056440.3 3 length=100
TCTGCCGCCACCGCAGCAGCCACAGGCAGAGGAGGACGAGGACGACTGGGAATCGTAGGGGGCTCCATGACACCTTCCCCCCCAGACCCAGACTTGGGCC
+SRR2056440.3 3 length=100
CCCFFFFFHGHHHJJJJJIJJJJJJIJJIJJJIJJIIIGIJ<CDBCDDDDDDDDDDDDDDDDDDDDDDDDDDDDDCDDDDDDDDDDDDDDDDDDCDCBDD
This is the command I want to run:
cat input.fq | awk 'NR%4==2{sum+=length($0);nr++;sumsq+=length($0)*length($0)}END{printf"%.1f\t%.1f\n",sum/nr,sqrt(sumsq/nr-(sum/nr)**2)}'
Output of the command:
100.0 0.0
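For reference, the awk program looks only at every fourth line starting from line 2 (the sequence lines) and prints the mean read length followed by the population standard deviation, computed as sqrt(E[x^2] - E[x]^2). A rough pure-Python sketch of the same calculation is below. The function name `read_length_stats` is made up for illustration, and the sketch assumes a well-formed, non-empty FASTQ file with four lines per record.

```python
import math

def read_length_stats(lines):
    """Return (mean, population std dev) of the sequence lengths in FASTQ lines."""
    n = total = total_sq = 0
    for i, line in enumerate(lines, start=1):
        if i % 4 == 2:  # same test as awk's NR%4==2: the sequence line of each record
            length = len(line.rstrip("\n"))
            n += 1
            total += length
            total_sq += length * length
    mean = total / n  # assumes at least one record
    # max() guards against a tiny negative value caused by floating-point rounding
    return mean, math.sqrt(max(total_sq / n - mean ** 2, 0.0))
```

It could be called as `with open('input.fq') as fh: mean, sd = read_length_stats(fh)`.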
I want to run this command from a Python script using subprocess. I have made several attempts but can't figure it out. This is my latest attempt:
awk_comm = r"""'NR%4==2{sum+=length($0);nr++;sumsq+=length($0)*length($0)}END{printf"%.1f\t%.1f\n",sum/nr,sqrt(sumsq/nr-(sum/nr)**2)}'"""
cmd = ['cat', 'input.fq', '|', 'awk', awk_comm]
p2 = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
out1, err = p2.communicate()
Edit
I don't see any error in the output. It just hangs and runs forever.
Answer 0 (score: 2)
The following works for me.
>>> awk_comm = r"""cat input.fq | awk 'NR%4==2{sum+=length($0);nr++;sumsq+=length($0)*length($0)}END{printf"%.1f\t%.1f\n",sum/nr,sqrt(sumsq/nr-(sum/nr)**2)}'"""
>>> p2 = subprocess.Popen(awk_comm, stdout=subprocess.PIPE,shell=True)
>>> res = p2.communicate()
>>> res
('100.0\t0.0\n', None)
Answer 1 (score: 1)
There is no point in using shell=True here. Just set up your subprocess.Popen object to do everything you were using the shell for:
# the original awk code, with whitespace added for readability
awk_command = r"""
NR%4==2 {
    sum+=length($0);
    nr++;
    sumsq+=length($0)*length($0)
}
END {
    printf "%.1f\t%.1f\n", sum/nr, sqrt(sumsq/nr-(sum/nr)**2)
}
"""
p2 = subprocess.Popen(
    ['awk', awk_command],
    stdin=open('input.fq', 'r'),  # pass a file handle to input.fq directly on awk's stdin
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE)
out1, err = p2.communicate()
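A slightly more defensive variant of the same idea, as a sketch rather than a definitive version: the `with` block closes the input file once the child process has finished, `text=True` (Python 3.7+) returns `str` instead of bytes, and `check=True` raises if the command fails. The helper name `run_with_file_stdin` is hypothetical.

```python
import subprocess

def run_with_file_stdin(cmd, path):
    """Run cmd (a list, no shell) with the file at path connected to its stdin."""
    with open(path, "rb") as fh:
        result = subprocess.run(
            cmd,
            stdin=fh,                # the child reads the file directly: no cat, no pipe
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,               # decode captured output as str
            check=True,              # raise CalledProcessError on a non-zero exit
        )
    return result.stdout
```

With the awk program above, this would be used as `out1 = run_with_file_stdin(['awk', awk_command], 'input.fq')`.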
Answer 2 (score: 0)
By default, Python does not use a shell to run commands... but pipes are evaluated by the shell! You need to pass shell=True:
p2 = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
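One caveat: the question's code already passes shell=True and still hangs. On POSIX, when the command is a list and shell=True is set, only the first item is treated as the command string, and the remaining items become arguments to the shell itself. So the shell ends up running a bare `cat`, which waits on stdin forever. The sketch below shows this behaviour with harmless commands, assuming a POSIX `/bin/sh`; the function names are made up for illustration.

```python
import subprocess

def shell_list_demo():
    # Equivalent to: /bin/sh -c 'echo "$0"' first second
    # Only 'echo "$0"' is the command; "first" merely becomes the shell's $0.
    result = subprocess.run(
        ['echo "$0"', "first", "second"],
        shell=True, stdout=subprocess.PIPE, text=True,
    )
    return result.stdout

def shell_string_demo():
    # A single string is handed to the shell whole, so the pipe is honoured.
    result = subprocess.run(
        "printf 'a\\nb\\n' | tr a-z A-Z",
        shell=True, stdout=subprocess.PIPE, text=True,
    )
    return result.stdout
```

For the question's code, that means building one string, for example `cmd = 'cat input.fq | awk ' + awk_comm`, and passing that to Popen together with shell=True.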
Answer 3 (score: 0)
You can use the commands module (Python 2 only) for this:
import commands
awk_comm = r"""'NR%4==2{sum+=length($0);nr++;sumsq+=length($0)*length($0)}END{printf"%.1f\t%.1f\n",sum/nr,sqrt(sumsq/nr-(sum/nr)**2)}'"""
p1 = commands.getoutput('cat input.fq | awk ' + awk_comm)
print p1
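On Python 3, where the commands module no longer exists, subprocess.getoutput is the closest equivalent. A minimal sketch with a stand-in pipeline in place of the awk command (a POSIX shell is assumed):

```python
import subprocess

# getoutput runs the string through the shell and returns stdout and stderr
# together as a str, with the trailing newline stripped.
p1 = subprocess.getoutput("echo hello | tr a-z A-Z")
print(p1)
```

The same call should work with the string `'cat input.fq | awk ' + awk_comm` from the snippet above.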
Hope this helps.