Best unique hit

Time: 2017-01-24 14:26:20

Tags: python

Sample input file (tab-separated)

name1 name1 100
name1 name2 99.4
name1 name3 67.8
name1 name4 40.2
name2 name2 100
name2 name1 98

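I want to 1) group by column 1, 2) compare the names in columns 1 and 2 and skip the row if they are the same, and 3) print the row with the highest value. So my output should be: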

name1 name2 99.4
name2 name1 98

My attempt is below. If I use sort instead of max, my best hit disappears.

import csv
from itertools import groupby
from operator import itemgetter
with open('input.txt','rb') as f1:
    with open('output.txt', 'wb') as f2:
        reader = csv.reader(f1, delimiter='\t')
        writer1 = csv.writer(f2, delimiter='\t')
        for group, rows in groupby(reader, itemgetter(0)):
            for line in rows:
                if line[0] == line[1]:
                    continue
                else:
                    best = max(rows, key=lambda r: (float(r[2])))
                    writer1.writerow(best)

2 answers:

Answer 0 (score: 3)

filter out the rows you don't need, then group by the first column and take the max of each group:

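import csv
from itertools import groupby
from operator import itemgetter

# Python 3: open CSV files in text mode with newline=''
with open('input.txt', newline='') as f1:
    with open('output.txt', 'w', newline='') as f2:
        reader = csv.reader(f1, delimiter='\t')
        writer1 = csv.writer(f2, delimiter='\t')
        # drop self-hits, group by column 1, keep the highest-scoring row per group
        out_rows = [
            max(g, key=lambda x: float(x[2])) for k, g in groupby(
                filter(lambda x: x[0] != x[1], reader), key=itemgetter(0)
            )
        ]
        writer1.writerows(out_rows)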
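To check the filter/groupby/max pipeline without files, here is a minimal sketch that feeds the question's sample rows from an in-memory io.StringIO instead of input.txt. It assumes the input is tab-separated, and best_hits is a hypothetical helper name, not part of the answer.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

# The question's sample rows, assumed to be tab-separated
SAMPLE = (
    "name1\tname1\t100\n"
    "name1\tname2\t99.4\n"
    "name1\tname3\t67.8\n"
    "name1\tname4\t40.2\n"
    "name2\tname2\t100\n"
    "name2\tname1\t98\n"
)


def best_hits(lines):
    """Hypothetical helper: best non-self row for each column-1 name."""
    reader = csv.reader(lines, delimiter="\t")
    # Drop self-hits (column 1 == column 2) before grouping
    non_self = filter(lambda row: row[0] != row[1], reader)
    # groupby() only merges consecutive rows, so input must already be grouped by column 1
    return [max(group, key=lambda row: float(row[2]))
            for _, group in groupby(non_self, key=itemgetter(0))]


print(best_hits(io.StringIO(SAMPLE)))
# [['name1', 'name2', '99.4'], ['name2', 'name1', '98']]
```

Because groupby() only groups adjacent rows, a file that is not already ordered by column 1 would need sorted(non_self, key=itemgetter(0)) before grouping.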

Answer 1 (score: 1)

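The rows iterator returned by groupby() is iterated twice: once in for line in rows: and again in max(rows). Both loops consume the same iterator, so max() only sees the rows the for loop has not reached yet. With your data, the for loop has already consumed name1 name2 99.4 when max() runs, so max() returns name1 name3 67.8; in the name2 group nothing is left after name2 name1, so max() is called on an exhausted iterator and raises ValueError.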
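The exhaustion can be reproduced with a small sketch that mimics the question's loop on the sample rows, stored here as tuples rather than read from a file. buggy_best is a hypothetical name, and it records None where max() raised ValueError instead of letting the error propagate.

```python
from itertools import groupby
from operator import itemgetter

# The question's sample rows as (name, name, score) tuples
DATA = [
    ("name1", "name1", 100.0),
    ("name1", "name2", 99.4),
    ("name1", "name3", 67.8),
    ("name1", "name4", 40.2),
    ("name2", "name2", 100.0),
    ("name2", "name1", 98.0),
]


def buggy_best(data):
    """Mimics the question's loop; None marks a group where max() raised ValueError."""
    picked = []
    for _, rows in groupby(data, key=itemgetter(0)):
        for line in rows:  # this loop and max() below share one iterator
            if line[0] == line[1]:
                continue
            try:
                # max() only sees rows the for loop has not consumed yet
                picked.append(max(rows, key=itemgetter(2)))
            except ValueError:
                # nothing was left in the group iterator
                picked.append(None)
    return picked


print(buggy_best(DATA))
# [('name1', 'name3', 67.8), None]
```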

Create a list from the rows iterator first; then you can iterate over it as many times as you need.
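A minimal sketch of that fix applied to the question's loop, again reading the sample from an in-memory io.StringIO (best_hits_listed is a hypothetical name). Turning the group into a list alone is not enough: max() over the whole list would pick the self-hit name1 name1 100, so the sketch also filters out rows where column 1 equals column 2 before calling max().

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

# The question's sample rows, assumed to be tab-separated
SAMPLE = (
    "name1\tname1\t100\nname1\tname2\t99.4\nname1\tname3\t67.8\n"
    "name1\tname4\t40.2\nname2\tname2\t100\nname2\tname1\t98\n"
)


def best_hits_listed(f):
    """Hypothetical helper: same loop as the question, but over a materialized list."""
    reader = csv.reader(f, delimiter="\t")
    best = []
    for _, rows in groupby(reader, itemgetter(0)):
        rows = list(rows)  # a list can be iterated any number of times
        hits = [line for line in rows if line[0] != line[1]]  # drop self-hits
        if hits:  # skip groups that contain only self-hits
            best.append(max(hits, key=lambda r: float(r[2])))
    return best


print(best_hits_listed(io.StringIO(SAMPLE)))
# [['name1', 'name2', '99.4'], ['name2', 'name1', '98']]
```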