Suppose my tab-delimited file is:
a b 77.8
a d 77.8
e f 56.7
e r 40.0
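For each distinct value of row[0], I want to print the row with the maximum value in row[2]. When several rows tie for that maximum, I want to print all of them. How should I modify the code below?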
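import csv
from itertools import groupby
from operator import itemgetter

# 'rb'/'wb' are the Python 2 csv file modes; on Python 3 open the files
# in text mode with newline='' instead.
with open('input.txt', 'rb') as f1:
    with open('out.txt', 'wb') as f2:
        reader = csv.reader(f1, delimiter='\t')
        writer1 = csv.writer(f2, delimiter='\t')
        # Skip rows whose first two fields are equal, then group by the first field.
        for group, rows in groupby(filter(lambda x: x[0] != x[1], reader), key=itemgetter(0)):
            # max() returns only one row, so tied rows are lost here.
            best = max(rows, key=lambda r: float(r[2]))
            writer1.writerow(best)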
So my output should look like this:
a b 77.8
a d 77.8
e f 56.7
Answer 0 (score: 1)
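Instead of writing the single max item from rows, sort the rows of each group in descending order by their third value, group them again by that third value, and write every item in the first group: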
import csv
from itertools import groupby
from operator import itemgetter

# 'rb'/'wb' are the Python 2 csv file modes; on Python 3 open the files
# in text mode with newline='' instead.
with open('input.txt', 'rb') as f_in, open('out.txt', 'wb') as f_out:
    reader = csv.reader(f_in, delimiter='\t')
    writer1 = csv.writer(f_out, delimiter='\t')
    for group, rows in groupby(filter(lambda x: x[0] != x[1], reader), key=itemgetter(0)):
        # Largest third value first.
        rows = sorted(rows, key=lambda r: float(r[2]), reverse=True)
        # The first sub-group holds every row tied for the maximum.
        _, best = next(groupby(rows, key=itemgetter(2)))
        writer1.writerows(best)
Output in out.txt:
a b 77.8
a d 77.8
e f 56.7
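The same idea can be sketched for Python 3 without the second sort: find the maximum of each group, then keep every row that equals it. This is a minimal sketch, not the answerer's code. The helper name top_rows_per_group and the in-memory sample are assumptions made for illustration, and the input is assumed to be already ordered by its first column, as in the sample file.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter


def top_rows_per_group(rows):
    """Yield, for each run of rows sharing row[0], every row tied for the max row[2]."""
    # Skip rows whose first two fields are equal, as the original filter does.
    kept = (r for r in rows if r[0] != r[1])
    # groupby only merges adjacent rows, so the input is assumed to be
    # already ordered by the first column.
    for _, group in groupby(kept, key=itemgetter(0)):
        group = list(group)
        best = max(float(r[2]) for r in group)
        # Compare as floats so that "77.8" and "77.80" count as a tie.
        for r in group:
            if float(r[2]) == best:
                yield r


# Hypothetical in-memory stand-in for input.txt.
sample = "a\tb\t77.8\na\td\t77.8\ne\tf\t56.7\ne\tr\t40.0\n"
reader = csv.reader(io.StringIO(sample), delimiter="\t")
result = list(top_rows_per_group(reader))

out = io.StringIO()
csv.writer(out, delimiter="\t", lineterminator="\n").writerows(result)
print(out.getvalue(), end="")
```

To work on real files in Python 3, the two StringIO objects would be replaced by open('input.txt', newline='') and open('out.txt', 'w', newline='').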
Answer 1 (score: 1)
An alternative using pandas (which handles the file reading and writing more conveniently):
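import pandas as pd

# The file in the question is tab-delimited, so split on tabs.
df = pd.read_table('eg.txt', header=None, sep='\t')
with open('output.txt', 'wb') as f:
    # unique() keeps first-appearance order; iterating a set would not.
    for c in df[0].unique():
        # Sort this group's rows by the third column, largest first.
        d = df[df[0] == c].sort_values(by=[2], ascending=False)
        # Keep every row that ties with the top value.
        d = d[d[2] == d[2].iloc[0]]
        d.to_csv(f, index=False, sep='\t', header=False)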
This gives the output:
a b 77.8
a d 77.8
e f 56.7
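The per-group loop above can probably be collapsed into a single vectorized filter with groupby and transform. The following is a minimal sketch under the assumption that the data matches the sample in the question; the in-memory sample string stands in for the file, and, like the answer above, it does not apply the x[0] != x[1] filter from the question.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-in for the tab-delimited file.
sample = "a\tb\t77.8\na\td\t77.8\ne\tf\t56.7\ne\tr\t40.0\n"
df = pd.read_csv(io.StringIO(sample), sep="\t", header=None)

# For every row, look up the maximum of column 2 within its column-0 group.
group_max = df.groupby(0)[2].transform("max")

# Keep the rows whose value equals that maximum; ties and row order survive.
best = df[df[2] == group_max]

# With no path argument, to_csv returns the text instead of writing a file.
text = best.to_csv(sep="\t", index=False, header=False)
print(text, end="")
```

Because the filter is a boolean mask over the original frame, the rows keep their original order and no sorting is needed.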