Replace missing data with a number in Python

Time: 2015-03-31 09:01:01

Tags: python

I have a tab-delimited file like this:

Acc Pop snp1 snp2 snp3 snp4 snp5
a1  pop1    0   1   0   1   0
a2  pop1    0   1           0
a3  pop1    0   1   0   0   0
a4  pop1    0   1   0   1   0
a5  pop1    0   1   0       0
a6  pop1        1   0   0   0
a7  pop1    0   1   0   0   0
a8  pop1    0   1   0   0   0
a9  pop1    0   1   0       
a10 pop1    0   1   0   0   1

I need to replace all missing data with '-9', so the output looks like this:

Acc Pop snp1 snp2 snp3 snp4 snp5
a1  pop1    0   1   0   1   0
a2  pop1    0   1   -9  -9  0
a3  pop1    0   1   0   0   0
a4  pop1    0   1   0   1   0
a5  pop1    0   1   0   -9  0
a6  pop1    -9  1   0   0   0
a7  pop1    0   1   0   0   0
a8  pop1    0   1   0   0   0
a9  pop1    0   1   0   -9  -9
a10 pop1    0   1   0   0   1

Here is my attempt so far:

import re
infilename = 'file2.txt'
outfilename = 'file.txt'
regex = re.compile(r"\s+")    

with open(infilename, 'r') as infile, open(outfilename, 'w') as outfile:
    for line in infile:
        line = line.rstrip('\n').split('\t')
        outfile.write(regex.sub('-9', line))

1 Answer:

Answer 0: (score: 3)

You almost had it.

When you split the line, you get a list of fields rather than a string, so you can't pass it to the regex's sub().
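A small demonstration of that failure, as a sketch only: the sample row is made up to resemble line a6 of the file, and the `error` variable exists purely to capture the exception for illustration.

```python
import re

regex = re.compile(r"\s+")

# A made-up row resembling line a6: the third field is empty.
fields = "a6\tpop1\t\t1\t0\t0\t0".split("\t")  # split() returns a list of strings

try:
    regex.sub("-9", fields)  # sub() needs a str (or bytes), not a list
    error = None
except TypeError as exc:
    error = exc  # this branch is taken
```

Note that applying the same pattern to the unsplit line would not help either, because `\s+` also matches the tabs that separate the fields.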

Instead, iterate over the list and replace each value with -9 if it is empty.

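infilename = 'file2.txt'
outfilename = 'file.txt'

with open(infilename, 'r') as infile, open(outfilename, 'w') as outfile:
    for line in infile:
        # Split the row into its tab-separated fields
        line = line.rstrip('\n').split('\t')
        # An empty string marks a missing value; replace it with '-9'
        line = [val if val else '-9' for val in line]
        # Rejoin the fields with tabs and restore the newline
        outfile.write('\t'.join(line) + '\n')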

Keep in mind that this replaces every blank field in the table, including any blank field in the header row.
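If the header should be left alone, one possible variation is sketched below. The function name `fill_missing` and its `placeholder` and `skip_header` parameters are hypothetical, not part of the original answer. It also assumes the header is the first line, and it treats whitespace-only fields as missing, which goes slightly beyond the original check for empty strings.

```python
def fill_missing(lines, placeholder="-9", skip_header=True):
    """Yield tab-delimited lines with blank fields replaced by `placeholder`."""
    for index, line in enumerate(lines):
        fields = line.rstrip("\n").split("\t")
        # Assumption: the header is the first line; leave it unchanged if asked to.
        if not (skip_header and index == 0):
            # Assumption: a field holding only whitespace also counts as missing.
            fields = [val if val.strip() else placeholder for val in fields]
        yield "\t".join(fields) + "\n"


# Made-up sample rows; the header deliberately has a blank third column.
sample = ["Acc\tPop\t\tsnp2\n", "a6\tpop1\t\t1\n", "a9\tpop1\t0\t\n"]
result = list(fill_missing(sample))
```

Because the function accepts any iterable of lines, it could be used with the files from the answer as `outfile.writelines(fill_missing(infile))`.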