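I have been trying to run a piped command through the subprocess module, but I have run into some problems.
I have seen the solutions proposed below, but none of them solved my problem: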
- sending a sequence (list) of arguments
- several Popen commands using subprocess.PIPE
- sending a string with shell=True
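I would like to avoid the third option, using shell=True, even though it does produce the expected result on my test system.
Here is the command as run in the terminal, which I want to replicate: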
tr -c "[:alpha:]" " " < some\ file\ name_raw.txt | sed -E "s/ +/ /g" | tr "[:upper:]" "[:lower:]" > clean_in_one_command.txt
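This command cleans the file as required. It first runs tr on the input file, whose name contains spaces. The output is passed to sed, which collapses runs of spaces into a single space, and the result is then passed to tr again to make everything lowercase.
After several iterations I ended up breaking it all down into the simplest form, implementing the second method above: several instances of Popen, passing information with subprocess.PIPE. It is long-winded, but hopefully makes debugging easier: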
from subprocess import run, Popen, PIPE
cmd1_func = ['tr']
cmd1_flags = ['-c']
cmd1_arg1 = [r'"[:alpha:]\"']
cmd1_arg2 = [r'" "']
cmd1_pass_input = ['<']
cmd1_infile = ['some file name_raw.txt']
cmd1 = cmd1_func + cmd1_flags + cmd1_arg1 + cmd1_arg2 + cmd1_pass_input + cmd1_infile
print("Command 1:", cmd1) # just to see if things look fine
cmd2_func = ['sed']
cmd2_flags = ['-E']
cmd2_arg = [r'"s/ +/ /g\"']
cmd2 = cmd2_func + cmd2_flags + cmd2_arg
print("command 2:", cmd2)
cmd3_func = ['tr']
cmd3_arg1 = ["\"[:upper:]\""]
cmd3_arg2 = ["\"[:lower:]\""]
cmd3_pass_output = ['>']
cmd3_outfile = [output_file_abs]
cmd3 = cmd3_func + cmd3_arg1 + cmd3_arg2 + cmd3_pass_output + cmd3_outfile
print("command 3:", cmd3)
# run first command into first process
proc1, _ = Popen(cmd1, stdout=PIPE)
# pass its output as input to second process
proc2, _ = Popen(cmd2, stdin=proc1.stdout, stdout=PIPE)
# close first process
proc1.stdout.close()
# output of second process into third process
proc3, _ = Popen(cmd3, stdin=proc2.stdout, stdout=PIPE)
# close second process output
proc2.stdout.close()
# save any output from final process to a logger
output = proc3.communicate()[0]
I would then simply write the output to a text file, but the program never gets that far, because I receive the following error:
usage: tr [-Ccsu] string1 string2
tr [-Ccu] -d string1
tr [-Ccu] -s string1
tr [-Ccu] -ds string1 string2
sed: 1: ""s/ +/ /g\"": invalid command code "
usage: tr [-Ccsu] string1 string2
tr [-Ccu] -d string1
tr [-Ccu] -s string1
tr [-Ccu] -ds string1 string2
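This suggests that my arguments are not being passed correctly. It seems that both the ' and the " quotes are being passed through to sed as ". I do need one of them explicitly. If I put only one set in my list, they are stripped out completely, which also breaks the command.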
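For reference, when a command is given as a list, each element reaches the program verbatim: there is no shell to strip quotes or to interpret < and >. The sketch below is only an illustration, assuming POSIX-style quoting; it uses shlex.split to show the argument lists a shell would build from the three stages of the terminal command above. The names stages and argv_lists are hypothetical.

```python
import shlex

# Each stage of the terminal pipeline, written exactly as typed in the shell
# (redirections left out, since they are not arguments to the programs).
stages = [
    'tr -c "[:alpha:]" " "',
    'sed -E "s/ +/ /g"',
    'tr "[:upper:]" "[:lower:]"',
]

# shlex.split applies POSIX-shell-style quoting rules, so each result is the
# argument list the program would receive: the quote characters are gone.
argv_lists = [shlex.split(stage) for stage in stages]

for argv in argv_lists:
    print(argv)
```

The printed lists contain no quote characters, which is why adding literal quotes inside the list elements makes tr and sed reject their arguments.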
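Things I have tried:
- both the subprocess.Popen and subprocess.run functions
- the shlex package to handle the quoting
- removing cmd3_pass_output = ['>'] and cmd3_outfile = [output_file_abs], so that only the raw (piped) output is handled
What am I missing, or will I be forced to use shell=True?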
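Answer 0 (score: 3)
This program appears to do what you need. Each process must be run separately. As they are constructed, the output of one is piped into the input of the next. The files are handled independently and are used at the beginning and end of the pipeline.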
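#! /usr/bin/env python3
import subprocess


def main():
    # The files are opened in Python, replacing the shell's < and > redirections.
    with open('raw.txt', 'r') as stdin, open('clean.txt', 'w') as stdout:
        step_1 = subprocess.Popen(
            ('tr', '-c', '[:alpha:]', ' '),
            stdin=stdin,
            stdout=subprocess.PIPE
        )
        step_2 = subprocess.Popen(
            ('sed', '-E', 's/ +/ /g'),
            stdin=step_1.stdout,
            stdout=subprocess.PIPE
        )
        # Close the parent's copy so step_1 receives SIGPIPE if step_2 exits early.
        step_1.stdout.close()
        step_3 = subprocess.Popen(
            ('tr', '[:upper:]', '[:lower:]'),
            stdin=step_2.stdout,
            stdout=stdout
        )
        step_2.stdout.close()
        # Wait for the last stage, which finishes once the whole pipeline drains.
        step_3.wait()


if __name__ == '__main__':
    main()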
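The same pattern can be generalized to any number of stages. The following is a minimal sketch rather than a definitive implementation: run_pipeline and its parameters (commands, src_path, dst_path) are hypothetical names introduced here, and it assumes every command reads stdin and writes stdout.

```python
import subprocess


def run_pipeline(commands, src_path, dst_path):
    """Chain commands so each stdout feeds the next stdin; return exit codes."""
    procs = []
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        upstream = src
        for index, argv in enumerate(commands):
            is_last = index == len(commands) - 1
            proc = subprocess.Popen(
                argv,
                stdin=upstream,
                # Only the final stage writes to the output file.
                stdout=dst if is_last else subprocess.PIPE,
            )
            # The child holds its own copy of the pipe; closing ours lets an
            # upstream process see SIGPIPE if a downstream one exits early.
            if upstream is not src:
                upstream.close()
            upstream = proc.stdout
            procs.append(proc)
        # All stages run concurrently, so waiting in order cannot deadlock.
        return [proc.wait() for proc in procs]
```

With this helper, the pipeline from the question could be written as run_pipeline([('tr', '-c', '[:alpha:]', ' '), ('sed', '-E', 's/ +/ /g'), ('tr', '[:upper:]', '[:lower:]')], 'some file name_raw.txt', 'clean.txt'). The file name containing spaces needs no escaping because it never passes through a shell.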