Question

I have a large file that I want to split with awk, naming each output file after the value in the first column.

On the terminal, the awk command looks like this:

cat phased.MySpF1.vcf | awk '!/^#/{print>$1}'

This creates separate files named 1, 2, 3 and so on, depending on the value in the first column.

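I want to put this command in a Python 2 script so that I can store the split files in a separate subdirectory, which makes each chunk easy to access in later parts of the script.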

# create a directory to store the split files:
if os.path.exists('SplitVCF'):
    shutil.rmtree('SplitVCF', ignore_errors=False, onerror=None)
os.makedirs('SplitVCF')

# now split the vcf file
split_cmd = ['cat', vcf_path, '|', 'awk', '!/^#/{print>$1}']
subprocess.Popen(split_cmd, stdout='SplitVCF/')

#or,
subprocess.call(split_cmd, stdout='SplitVCF/')

However, I get this error:

Traceback (most recent call last):
  File "phaser.py", line 2167, in <module>
    main();
  File "phaser.py", line 227, in main
    subprocess.Popen(split_cmd, stdout='SplitVCF/')
  File "/usr/lib/python2.7/subprocess.py", line 386, in __init__
    errread, errwrite), to_close = self._get_handles(stdin, stdout, stderr)
  File "/usr/lib/python2.7/subprocess.py", line 823, in _get_handles
    c2pwrite = stdout.fileno()
AttributeError: 'str' object has no attribute 'fileno'

Answer 1

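From the docs:

stdin, stdout and stderr specify the executed program's standard input, standard output and standard error file handles, respectively. Valid values are PIPE, an existing file descriptor (a positive integer), an existing file object, and None.

So a string is not a valid value. Pass an open file object instead: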

with open('a/file/path', 'w') as out:
    subprocess.Popen(split_cmd, stdout=out)
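
Passing a file object removes the AttributeError, but two other details of the question's command are worth noting. A '|' element in an argument list is not treated as a pipe unless a shell runs the command, and awk's `print > $1` writes to files that awk opens itself in its working directory, not to stdout. The following is a minimal sketch of one way to get the per-column files into a subdirectory, assuming awk is on the PATH. The function name split_vcf_by_first_column is hypothetical and not part of the original script.

```python
import os
import subprocess


def split_vcf_by_first_column(vcf_path, out_dir):
    """Run awk inside out_dir so its per-value files are created there."""
    # Resolve the input path first, because the child runs in a different cwd.
    vcf_abs = os.path.abspath(vcf_path)
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    # awk reads the file itself, so neither `cat` nor a shell pipe is needed.
    # `print > $1` opens one file per distinct first-column value, relative
    # to awk's working directory, which is set here through cwd=.
    return subprocess.call(['awk', '!/^#/{print>$1}', vcf_abs], cwd=out_dir)
```

Called as split_vcf_by_first_column(vcf_path, 'SplitVCF'), this should leave files such as SplitVCF/1 and SplitVCF/2 and return awk's exit status. It works the same way under Python 2.7 and Python 3.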
