I am trying to concatenate several files and add a header.
import subprocess
outpath = "output.tab"
with open( outpath, "w" ) as outf :
    "write a header"
    if header is True:
        p1 = subprocess.Popen(["head", "-n1", files[-1] ], stdout= outf, )
    if type(header) is str:
        p1 = subprocess.Popen(["head", "-n1", header ], stdout= outf,)
    for fl in files:
        print( fl )
        p1 = subprocess.Popen(["tail", "-n+2", fl], stdout= outf, )
For some reason, some of the files (fl) are only partially written, and the next file starts in the middle of a line from the previous one:
awk '{print NF}' output.tab | uniq -c
108 11
1 14
69 11
1 10
35 11
1 16
250 11
1 16
Is there a way to fix this in Python?
An example of a garbled line:
$tail -n+108 output.tab | head -n1
CENPA chr2 27008881.0 2701ABCD3 chr1 94883932.0 94944260.0 0.0316227766017 0.260698861451 0.277741584016 0.302602378581 0.4352790705329718 56 16
$grep -n -A1 'CENPA' file1.tab
109:CENPA chr2 27008881.0 27017455.0 1.0 0.417081004817 0.0829327365256 0.545205239241 0.7196619496326693 95 3
110-CENPO chr2 25016174.0 25045245.0 1000.0 0.151090930896 -0.0083671250883 0.50882773122 0.0876177652747541 82 0
$grep -n 'ABCD3' file2.tab
2:ABCD3 chr1 94883932.0 94944260.0 0.0316227766017 0.260698861451 0.277741584016 0.302602378581 0.4352790705329718 56 16
Answer 0 (score: 1)
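I think the problem here is that subprocess.Popen() runs asynchronously by default, whereas you seem to want it to run synchronously. As written, all of the head and tail commands run at the same time, and all of them write to the same output file, so their output gets interleaved.
To fix this, you probably just want to add .wait() after each call: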
import subprocess
outpath = "output.tab"
with open(outpath, "w") as outf:
    # Write the header. `header` and `files` are defined elsewhere, as in the question.
    if header is True:
        p1 = subprocess.Popen(["head", "-n1", files[-1]], stdout=outf)
        p1.wait()  # Pauses the script until the command finishes
    elif isinstance(header, str):
        p1 = subprocess.Popen(["head", "-n1", header], stdout=outf)
        p1.wait()
    for fl in files:
        print(fl)
        # Append everything except the file's own header line
        p1 = subprocess.Popen(["tail", "-n+2", fl], stdout=outf)
        p1.wait()
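
As a variation on the same idea, subprocess.run() (Python 3.5+) starts the command and waits for it in a single call, so there is no separate .wait() to forget. The sketch below is only one possible way to write it: the function name concat_with_header and its parameters are made up for illustration, and it assumes head and tail are available on the PATH.

```python
import subprocess


def concat_with_header(files, outpath, header=True):
    """Concatenate `files` into `outpath`, keeping a single header line.

    Hypothetical helper, not from the original post.
    header=True   -> copy the first line of the last file in `files`
    header=<str>  -> copy the first line of the file at that path
    anything else -> write no header
    """
    with open(outpath, "w") as outf:
        # subprocess.run() blocks until the command exits, so each command
        # finishes writing before the next one starts.
        # check=True raises CalledProcessError if the command fails.
        if header is True:
            subprocess.run(["head", "-n1", files[-1]], stdout=outf, check=True)
        elif isinstance(header, str):
            subprocess.run(["head", "-n1", header], stdout=outf, check=True)
        for fl in files:
            # Skip each file's own header line and append the rest.
            subprocess.run(["tail", "-n+2", fl], stdout=outf, check=True)
```

With the file names from the question, this could be called as concat_with_header(["file1.tab", "file2.tab"], "output.tab"). Running the awk check again afterwards should then show whether every line has the same number of fields.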