I have a file that looks like this:
1,var1
2,var2
3,var3
4,var1_val1
5,var2_val2
6,var1_val2
7,var3_val1
8,var2_val1
9,var3_val2
The output file should look like this:
var1 1 4 6
var2 2 8 5
var3 3 7 9
My code is very convoluted. It works, but it is very inefficient. Can this be done more efficiently?
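    def findv(var):
        with open(inputfile) as f:
            # Pass 1: find the line whose label is exactly `var`.
            for line in f:
                elems = line.strip().split(',')
                name = elems[1]
                if var != name:
                    continue
                field = elems[0]
            # Pass 2: rewind and look for var_val1.
            f.seek(0)
            for line in f:
                elems2 = line.strip().split(',')
                if elems2[1].endswith(var + '_val1'):
                    first = elems2[0]
            # Pass 3: rewind again and look for var_val2.
            f.seek(0)
            for line in f:
                elems3 = line.strip().split(',')
                if elems3[1].endswith(var + '_val2'):
                    second = elems3[0]
        return var, field, first, second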
The main part of the code:
    with open(inputfile) as f:
        with open(outputfile, 'w') as fout:
            for line in f:
                tmp = line.strip().split(',')
                # Skip the var_val1 / var_val2 lines; findv() picks those up.
                if tmp[1].endswith('val1') or tmp[1].endswith('val2'):
                    continue
                v = tmp[1]
                result = findv(v)
                fout.write(' '.join(result) + '\n')
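Every time a line in the input file carries a bare varx label, my function findv(var) is called, and it then scans the whole file several more times to find the fields that correspond to varx_val1 and varx_val2.
Edit: I need to preserve the order of the input file, so var1 must come first in the output file, then var2, then var3, and so on.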
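Answer 0: (score: 4)
Use a dictionary whose keys are the labels and whose values are lists that store the numbers. That way you only need to loop over the file once.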
    from collections import defaultdict

    results = defaultdict(list)
    with open('somefile.txt') as f:
        for line in f:
            line = line.strip()  # drop the trailing newline so keys compare equal
            if line:
                value, key = line.split(',')
                if '_' in key:
                    key = key.split('_')[0]  # returns var1 from var1_val1
                results[key].append(value)

    for k, v in results.items():  # use iteritems() on Python 2
        print('{} {}'.format(k, ' '.join(v)))
Here is a version that incorporates the comments below:
    from collections import OrderedDict

    results = OrderedDict()  # remembers the order in which keys are first seen
    with open('somefile.txt') as f:
        for line in f:
            line = line.strip()
            if line:
                value, key = line.split(',')
                key = key.split('_')[0]  # returns var1 from var1_val1
                results.setdefault(key, []).append(value)

    for k, v in results.items():  # use iteritems() on Python 2
        print('{} {}'.format(k, ' '.join(v)))
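As a rough illustration of the single-pass grouping idea, here is a minimal sketch run against the sample data from the question. The function name `group_by_label` is made up for this example, and it assumes Python 3.7 or later, where a plain `dict` keeps insertion order.

```python
import io


def group_by_label(lines):
    """Group the number on each 'number,label' line under the label's base name."""
    results = {}  # assumption: Python 3.7+, so a plain dict keeps insertion order
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        value, key = line.split(',')
        base = key.split('_')[0]  # 'var1_val1' -> 'var1'
        results.setdefault(base, []).append(value)
    return results


# Sample data copied from the question, fed in through an in-memory file.
sample = io.StringIO(
    "1,var1\n2,var2\n3,var3\n4,var1_val1\n5,var2_val2\n"
    "6,var1_val2\n7,var3_val1\n8,var2_val1\n9,var3_val2\n"
)

grouped = group_by_label(sample)
for label, values in grouped.items():
    print(label, ' '.join(values))
```

The labels come out in input order (var1, var2, var3). The values, however, are appended in the order they occur in the file, so var2 prints as `var2 2 5 8` rather than the `var2 2 8 5` shown in the question.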
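Answer 1: (score: 0)
I wrote a Python program that iterates over the file only once, reads all the relevant data into a dict, and then writes the dict to the output file.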
    #!/usr/bin/env python3
    import collections

    # inputfile and outputfile are the paths used in the question.
    output = collections.OrderedDict()
    with open(inputfile, 'r') as infile:
        for line in infile:
            dat, tmp = line.strip().split(',')
            if '_val' in tmp:
                # 'var2_val1' -> key 'var2', slot 1
                key, idxstr = tmp.split('_val')
                idx = int(idxstr)
            else:
                # a bare label goes into slot 0
                key = tmp
                idx = 0
            output.setdefault(key, ["", "", ""])[idx] = dat

    with open(outputfile, 'w') as outfile:
        for var in output:
            v = output[var]
            outfile.write('{} {}\n'.format(var, ' '.join(v)))
Updated based on the comments.
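To check the slot-by-index idea against the sample data in the question, here is a small sketch that works on a list of lines instead of files. The names `collect_by_index` and `slots` are hypothetical. It assumes Python 3.7+ (plain `dict` keeps insertion order) and that suffixes never go beyond `_val2`.

```python
def collect_by_index(lines, slots=3):
    """Place each number in the slot named by its label suffix (bare label -> slot 0)."""
    output = {}  # assumption: Python 3.7+, so insertion order is preserved
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        dat, tmp = line.split(',')
        if '_val' in tmp:
            key, idxstr = tmp.split('_val')  # 'var2_val1' -> ('var2', '1')
            idx = int(idxstr)
        else:
            key, idx = tmp, 0
        output.setdefault(key, [""] * slots)[idx] = dat
    return output


# Sample data copied from the question.
sample_lines = [
    "1,var1", "2,var2", "3,var3",
    "4,var1_val1", "5,var2_val2", "6,var1_val2",
    "7,var3_val1", "8,var2_val1", "9,var3_val2",
]

rows = ['{} {}'.format(k, ' '.join(v))
        for k, v in collect_by_index(sample_lines).items()]
print('\n'.join(rows))
```

Because each number lands in the slot given by its suffix rather than in file order, var2 comes out as `var2 2 8 5`, which matches the output requested in the question.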