Python NA find and replace for delimited text files

Time: 2014-10-30 00:41:11

Tags: python regex csv na delimited-text

I am new to Python. Say I have data in a (very large) delimited text file like this:

a|b|c|d|e

1|.|.|-|.

1.2|2.6|||1.7

Since the text file is very large, I want to read and write it line by line. I want to replace . and - with NA, or have those fields emptied. This is my attempt:

import csv

f = open('sample1_fixed.txt','wb')
targets1, new1 = ['|.|','|-|','||','| |'], '|NA|'
for line in open('sample1.txt', 'rb'):
    for target in targets1:
        if target in line:
            line = line.replace(target,new1)
    for target in targets1:
        if target in line:
            line = line.replace(target,new1)
    f.write(line + "\n")
f.close()

But I think there must be a better way that uses the delimiter. This solution also misses instances at the beginning and at the end of a line. Any ideas from better programmers?

Expected output:

a|b|c|d|e
1|NA|NA|NA|NA
1.2|2.6|NA|NA|1.7

I have also tried using the csv module and regular expressions:

import csv
import re

f=open('sample1_fixed.txt','wb')

with open('sample1.txt','rb') as inputfile:
    read=csv.reader(inputfile, delimiter='|')
    for row in read:
        text = row[1]
        text = re.sub(r'^\.$','NA',text)
        text = re.sub(r'^-$','NA',text)
        f.write(text + '\n')
f.close()

But this only lets me write one column at a time, and I am not sure how to output them all...

1 answer:

Answer 0 (score: 2)

Use csv.reader with the custom delimiter '|', and a helper function, replace_NAs, that maps each field:

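import csv

delim = '|'

def replace_NAs(x, NA_values=("", ".", "-")):
    # Map a single field to "NA" if it is one of the missing-value placeholders.
    if x in NA_values:
        return "NA"
    else:
        return x

with open('infile') as csvfile:
    reader = csv.reader(csvfile, delimiter=delim)
    for row in reader:
        # Apply the helper to every field, then rejoin with the same delimiter.
        transformed_row = [replace_NAs(x) for x in row]
        print(delim.join(transformed_row))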

a|b|c|d|e
1|NA|NA|NA|NA
1.2|2.6|NA|NA|1.7
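
The snippet above prints to standard output, while the question asks to read and write line by line. Here is a minimal sketch of one way to do that by pairing csv.reader with csv.writer, so only one row is held in memory at a time. It assumes Python 3. The names NA_VALUES, replace_na and convert are illustrative and not part of the original answer, and treating whitespace-only fields as missing is an assumption based on the '| |' target in the question.

```python
import csv
import io

# Hypothetical set of placeholders treated as missing. Whitespace-only
# fields are covered by the strip() call in replace_na (an assumption).
NA_VALUES = frozenset({"", ".", "-"})


def replace_na(field, na_values=NA_VALUES):
    """Return "NA" if the field is a missing-value placeholder, else the field."""
    return "NA" if field.strip() in na_values else field


def convert(infile, outfile, delimiter="|"):
    """Stream rows from infile to outfile, one row at a time."""
    reader = csv.reader(infile, delimiter=delimiter)
    writer = csv.writer(outfile, delimiter=delimiter, lineterminator="\n")
    for row in reader:
        writer.writerow([replace_na(field) for field in row])


# In-memory demonstration using the sample data from the question.
sample = io.StringIO("a|b|c|d|e\n1|.|.|-|.\n1.2|2.6|||1.7\n")
result = io.StringIO()
convert(sample, result)
print(result.getvalue(), end="")
```

For real files, the same function could be called with two handles opened with newline='', as the csv module documentation recommends, for example open('sample1.txt', newline='') for reading and open('sample1_fixed.txt', 'w', newline='') for writing. Because each field is tested after splitting on the delimiter, placeholders at the start or end of a line are handled the same way as those in the middle.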