I'm pretty new to Python. Suppose I have data in a (very large) delimited text file, like this:
a|b|c|d|e
1|.|.|-|.
1.2|2.6|||1.7
Since the text file is very large, I want to read and write it line by line. I want to replace a period (.), a hyphen (-), or an empty field with the string NA. This is my attempt:

import csv
f = open('sample1_fixed.txt','wb')
targets1, new1 = ['|.|','|-|','||','| |'], '|NA|'
for line in open('sample1.txt', 'rb'):
    for target in targets1:
        if target in line:
            line = line.replace(target,new1)
    for target in targets1:
        if target in line:
            line = line.replace(target,new1)
    f.write(line + "\n")
f.close()

But I think there must be a better way, one that uses the delimiter? This solution also does not pick up instances at the end or at the start of a line. Any ideas from better programmers?

Expected output:
a|b|c|d|e
1|NA|NA|NA|NA
1.2|2.6|NA|NA|1.7

I also tried using the csv module and regular expressions:

import csv
import re
f=open('sample1_fixed.txt','wb')
with open('sample1.txt','rb') as inputfile:
    read=csv.reader(inputfile, delimiter='|')
    for row in read:
        text = row[1]
        text = re.sub(r'^\.$','NA',text)
        text = re.sub(r'^-$','NA',text)
        f.write(text + '\n')
f.close()

But this only lets me write one column at a time, and I'm not sure how to output all of them...
Answer 0 (score: 2)
Use csv.reader with a custom delimiter='|', together with a helper function, replace_NAs:
import csv

delim = '|'

def replace_NAs(x, NA_values=["", ".", "-"]):
    # Map any missing-value marker to "NA"; leave other fields unchanged.
    if x in NA_values:
        return "NA"
    else:
        return x

with open('infile') as csvfile:
    reader = csv.reader(csvfile, delimiter=delim)
    for row in reader:
        # Apply the helper to every field in the row, not just one column.
        transformed_row = [replace_NAs(x) for x in row]
        print(delim.join(transformed_row))

Output:
a|b|c|d|e
1|NA|NA|NA|NA
1.2|2.6|NA|NA|1.7
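
The snippet above prints the rows, while the question asks for them to be written to a new file line by line. One possible way to do that is sketched below for Python 3, pairing csv.reader with csv.writer so only one row is held in memory at a time. The names clean_file, replace_na and NA_VALUES are hypothetical, and stripping whitespace before the comparison is an assumption made to cover the '| |' target from the question.

```python
import csv

# Assumed set of markers that should become "NA" (taken from the question).
NA_VALUES = {"", ".", "-"}

def replace_na(value, na_values=NA_VALUES):
    # Strip whitespace first so a field containing only a space also counts
    # as missing (an assumption based on the '| |' target in the question).
    return "NA" if value.strip() in na_values else value

def clean_file(src_path, dst_path, delimiter="|"):
    # newline="" lets the csv module handle line endings itself.
    with open(src_path, newline="") as src, \
         open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter=delimiter)
        writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
        # Rows are read and written one at a time, so a very large file
        # is never loaded into memory as a whole.
        for row in reader:
            writer.writerow([replace_na(field) for field in row])

if __name__ == "__main__":
    clean_file("sample1.txt", "sample1_fixed.txt")
```

Because each row is split on the delimiter before any comparison, fields at the start and end of a line are handled the same way as fields in the middle, which is the case the string-replacement attempt misses.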