I have a tab-delimited file like this, in which some fields are empty:
Acc Pop snp1 snp2 snp3 snp4 snp5
a1 pop1 0 1 0 1 0
a2 pop1 0 1 0
a3 pop1 0 1 0 0 0
a4 pop1 0 1 0 1 0
a5 pop1 0 1 0 0
a6 pop1 1 0 0 0
a7 pop1 0 1 0 0 0
a8 pop1 0 1 0 0 0
a9 pop1 0 1 0
a10 pop1 0 1 0 0 1
I need to replace all missing data with '-9', so the output should look like this:
Acc Pop snp1 snp2 snp3 snp4 snp5
a1 pop1 0 1 0 1 0
a2 pop1 0 1 -9 -9 0
a3 pop1 0 1 0 0 0
a4 pop1 0 1 0 1 0
a5 pop1 0 1 0 -9 0
a6 pop1 -9 1 0 0 0
a7 pop1 0 1 0 0 0
a8 pop1 0 1 0 0 0
a9 pop1 0 1 0 -9 -9
a10 pop1 0 1 0 0 1
Here is my attempt:
import re

infilename = 'file2.txt'
outfilename = 'file.txt'
regex = re.compile(r"\s+")

with open(infilename, 'r') as infile, open(outfilename, 'w') as outfile:
    for line in infile:
        line = line.rstrip('\n').split('\t')
        outfile.write(regex.sub('-9', line))
Answer 0 (score: 3)
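You almost had it.
When you split the line you get a list of fields, and re.sub expects a string, so you cannot run the regex on it.
Instead, iterate over the list and replace each empty value with '-9'.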
infilename = 'file2.txt'
outfilename = 'file.txt'

with open(infilename, 'r') as infile, open(outfilename, 'w') as outfile:
    for line in infile:
        # Split on tabs; two adjacent tabs yield an empty string
        line = line.rstrip('\n').split('\t')
        # Keep non-empty values, replace empty ones with '-9'
        line = [val if val else '-9' for val in line]
        outfile.write('\t'.join(line) + '\n')
Keep in mind that this replaces every 'blank' field in the table, including any blank field in the header.
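If the header should be left alone, one possible variation is sketched below. It is not part of the original answer: the helper name `fill_missing` and its `placeholder` parameter are hypothetical, and it assumes the first line is the header and that its column count is the intended width of every row. Under that assumption it also pads rows whose trailing tabs were lost, which the list comprehension above would not catch.

```python
def fill_missing(lines, placeholder='-9'):
    """Yield tab-separated lines with empty data fields replaced.

    Hypothetical helper: assumes the first line is the header and
    that its column count is the intended width of every row.
    """
    lines = iter(lines)
    header = next(lines, None)
    if header is None:
        return  # empty input, nothing to yield
    header = header.rstrip('\n')
    yield header  # header is passed through unchanged
    width = len(header.split('\t'))
    for line in lines:
        fields = line.rstrip('\n').split('\t')
        # Pad rows that are shorter than the header
        fields += [''] * (width - len(fields))
        # Treat empty or whitespace-only fields as missing
        yield '\t'.join(f if f.strip() else placeholder for f in fields)


# Small in-memory demonstration
sample = [
    "Acc\tPop\tsnp1\tsnp2\tsnp3\n",
    "a2\tpop1\t0\t\t1\n",
    "a9\tpop1\t0\n",
]
for row in fill_missing(sample):
    print(row)
```

To apply it to the files from the question, open them as in the answer and write `row + '\n'` for each `row` in `fill_missing(infile)`.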