Iterating over a file multiple times (Python)

Time: 2015-05-07 05:37:53

Tags: python

I have a file that looks like this:

1,var1
2,var2
3,var3
4,var1_val1
5,var2_val2
6,var1_val2
7,var3_val1
8,var2_val1
9,var3_val2

The output file should look like this:

var1 1 4 6
var2 2 8 5
var3 3 7 9

My code is very convoluted. It works, but it is very inefficient. Can this be done more efficiently?

def findv(var):
    # Scans the whole file three times for every variable name.
    with open(inputfile) as f:
        for line in f:
            elems = line.strip().split(',')
            name = elems[1]  # the label is the second field
            if var != name:
                continue
            field = elems[0]
        f.seek(0)
        for line in f:
            elems2 = line.strip().split(',')
            if elems2[1].endswith(var + '_val1'):
                first = elems2[0]
        f.seek(0)
        for line in f:
            elems3 = line.strip().split(',')
            if elems3[1].endswith(var + '_val2'):
                second = elems3[0]
    return var, field, first, second

The main part of the code:

with open(inputfile) as f:
    with open(outputfile, 'w') as fout:
        for line in f:
            tmp = line.strip().split(',')
            # Skip the varx_val1 / varx_val2 lines; findv looks them up itself.
            if tmp[1].endswith('val1') or tmp[1].endswith('val2'):
                continue
            v = tmp[1]
            result = findv(v)
            fout.write(' '.join(result) + '\n')
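Whenever a line in the input file holds a plain varx label (no _val suffix), my function findv(var) is called. It then scans the file several more times until it has found the fields that correspond to varx_val1 and varx_val2.

Edit: I need to preserve the order of the input file, so var1 must appear first in the output file, then var2, then var3, and so on.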

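2 Answers:

Answer 0: (score: 4)

Use a dictionary whose keys are the labels and whose values are lists that store the numbers. That way you only need to loop over the file once.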

from collections import defaultdict

results = defaultdict(list)

with open('somefile.txt') as f:
    for line in f:
        line = line.strip()  # drop the newline so 'var1\n' and 'var1' share one key
        if line:
            value, key = line.split(',')
            if '_' in key:
                key = key.split('_')[0]  # returns var1 from var1_val1
            results[key].append(value)

for k, v in results.items():
    print('{} {}'.format(k, ' '.join(v)))
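
As a minimal illustration of why defaultdict(list) suits this grouping: a missing key is created with an empty list on first access, so no existence check is needed before appending. The keys and values below are copied from the sample input; nothing else is assumed.

```python
from collections import defaultdict

results = defaultdict(list)

# Accessing a missing key creates an empty list, so append works immediately.
results['var1'].append('1')   # from the line "1,var1"
results['var1'].append('4')   # from the line "4,var1_val1"
results['var2'].append('2')   # from the line "2,var2"

for key, values in results.items():
    print('{} {}'.format(key, ' '.join(values)))
```

Note that before Python 3.7 the iteration order of a plain dict or defaultdict is not guaranteed, so the labels may not come out in input order.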

Here is a version that incorporates the comments below (it keeps the labels in input order):

from collections import OrderedDict

results = OrderedDict()  # the parentheses matter: this must be an instance, not the class

with open('somefile.txt') as f:
    for line in f:
        line = line.strip()
        if line:
            value, key = line.split(',')
            key = key.split('_')[0]  # returns var1 from var1_val1
            results.setdefault(key, []).append(value)

for k, v in results.items():
    print('{} {}'.format(k, ' '.join(v)))
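
One thing to watch: this approach stores the numbers in the order they appear in the file, so for the sample input var2 comes out as 2 5 8 rather than the 2 8 5 shown in the question. A possible sketch that orders each label's numbers by suffix instead is shown below. The function name group_sorted and the in-memory SAMPLE string are hypothetical, introduced only for illustration, and the suffixes are compared as plain strings, which is an assumption that holds only while they stay below val10.

```python
import io
from collections import OrderedDict

# Sample input from the question, held in memory for the demonstration.
SAMPLE = """1,var1
2,var2
3,var3
4,var1_val1
5,var2_val2
6,var1_val2
7,var3_val1
8,var2_val1
9,var3_val2
"""

def group_sorted(lines):
    """Group numbers by label in one pass, then order each group by suffix."""
    pairs = OrderedDict()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        value, label = line.split(',')
        # suffix is '' for a bare label such as 'var1'
        key, _, suffix = label.partition('_')
        pairs.setdefault(key, []).append((suffix, value))
    grouped = OrderedDict()
    for key, items in pairs.items():
        # '' sorts before 'val1', which sorts before 'val2' (string comparison)
        grouped[key] = [value for _, value in sorted(items, key=lambda p: p[0])]
    return grouped

for key, values in group_sorted(io.StringIO(SAMPLE)).items():
    print('{} {}'.format(key, ' '.join(values)))
```

The file is still read only once; the sort runs over each label's short list afterwards.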

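Answer 1: (score: 0)

I wrote a Python program that iterates over the file only once, reads all the relevant data into a dict, and then writes the dict to the output file.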

#!/usr/bin/env python3
import collections

output = collections.OrderedDict()

with open(inputfile, 'r') as infile:
    for line in infile:
        dat, tmp = line.strip().split(',')
        if '_val' in tmp:
            key, idxstr = tmp.split('_val')
            idx = int(idxstr)
        else:
            key = tmp
            idx = 0
        output.setdefault(key, ["", "", ""])[idx] = dat

with open(outputfile, 'w') as outfile:
    for var in output:
        v = output[var]
        outfile.write('{} {}\n'.format(var, ' '.join(v)))

Update: modified according to the comments.
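
The three-slot list ["", "", ""] above assumes every label has exactly a val1 and a val2; a label such as var1_val3 would raise an IndexError. A possible variation, sketched below, grows the slot list on demand. The function name regroup and the in-memory SAMPLE string are hypothetical and not part of the original answer.

```python
import io
from collections import OrderedDict

# Sample input from the question, held in memory for the demonstration.
SAMPLE = """1,var1
2,var2
3,var3
4,var1_val1
5,var2_val2
6,var1_val2
7,var3_val1
8,var2_val1
9,var3_val2
"""

def regroup(lines):
    """Single pass: place each number in the slot given by its _valN suffix."""
    output = OrderedDict()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        dat, tmp = line.split(',')
        if '_val' in tmp:
            key, idxstr = tmp.split('_val')
            idx = int(idxstr)
        else:
            key, idx = tmp, 0  # the bare label goes in slot 0
        slots = output.setdefault(key, [])
        # Pad with empty strings until slots[idx] exists (no-op if long enough).
        slots.extend([''] * (idx + 1 - len(slots)))
        slots[idx] = dat
    return output

for var, values in regroup(io.StringIO(SAMPLE)).items():
    print('{} {}'.format(var, ' '.join(values)))
```

Because the slot index comes from the suffix rather than from the position in the file, the numbers land in val1, val2 order regardless of how the input lines are arranged.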