How to print the 3 rows with the highest values

Time: 2015-09-23 13:34:45

Tags: python

I have an input file:

10N06_64  sc635516  93.93   100.0
10N06_64  sc711028  93.99   100.0
10N06_64  sc255425  93.46   95.8
10N06_64  sc115511  87.5    93.0
116F19_238  sc121016    91.30   12.1
116F19_238  sc1132492   90.94   6.1
116F19_238  sc513573    87.38   6.1
116F19_238  sc68511 75.93   10.5

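I need to group the rows by row[0], iterate within each group, and print 3 rows per group, picking the rows with the highest values in row[3] and then row[2], so that my output file looks like this: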

10N06_64  sc635516  93.93   100.0
10N06_64  sc711028  93.99   100.0
10N06_64  sc255425  93.46   95.8
116F19_238  sc121016    91.30   12.1
116F19_238  sc68511 75.93   10.5
116F19_238  sc1132492   90.94   6.1

Here is my attempt, but it only prints the single best row per group. How can I modify it to print the 3 best hits?

import csv
from itertools import groupby
from operator import itemgetter
with open('myfile','rb') as f1:
    with open('outfile', 'wb') as f2:
        reader = csv.reader(f1, delimiter='\t')
        writer1 = csv.writer(f2, delimiter='\t')
        for group, rows in groupby(reader, itemgetter(0)):
            best = max(rows, key=lambda r: (float(r[3]), float(r[2])))
            writer1.writerow(best)

4 answers:

Answer 0 (score: 3)

You can use heapq.nlargest() to get the rows with the highest values:

#!/usr/bin/env python
import csv
import sys
from heapq import nlargest
from itertools import groupby

writerows = csv.writer(sys.stdout, delimiter='\t').writerows
for _, rows in groupby(csv.reader(sys.stdin, delimiter='\t'), key=lambda r: r[0]):
    writerows(nlargest(3, rows, key=lambda row: (float(row[3]), float(row[2]))))

Example:

$ <input.csv ./your-script >output.csv

Output:

10N06_64    sc711028    93.99   100.0
10N06_64    sc635516    93.93   100.0
10N06_64    sc255425    93.46   95.8
116F19_238  sc121016    91.30   12.1
116F19_238  sc68511 75.93   10.5
116F19_238  sc1132492   90.94   6.1

nlargest() avoids loading a whole input group into memory at once. If the number of rows is always small, you can also use sorted(iterable, key=key, reverse=True)[:n].
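To make the comparison concrete, here is a minimal Python 3 sketch that runs both approaches on the second group from the question's sample input. The helper name `score` and the in-memory `rows` list are assumptions made for illustration; they are not part of the original answer.

```python
from heapq import nlargest

# One group from the question's sample input, already split into fields.
rows = [
    ["116F19_238", "sc121016", "91.30", "12.1"],
    ["116F19_238", "sc1132492", "90.94", "6.1"],
    ["116F19_238", "sc513573", "87.38", "6.1"],
    ["116F19_238", "sc68511", "75.93", "10.5"],
]


def score(row):
    # Hypothetical helper: rank on column 3 first, column 2 breaks ties.
    return (float(row[3]), float(row[2]))


# nlargest keeps only the 3 current best rows while it scans the input.
top_heap = nlargest(3, rows, key=score)

# sorted builds a fully ordered copy of the group, then slices it.
top_sorted = sorted(rows, key=score, reverse=True)[:3]

for row in top_heap:
    print("\t".join(row))
```

Both calls should select the same three rows in the same order; the difference is only in how much of the group has to be held in memory at once.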

Answer 1 (score: 2)

Use the sorted() function.

Input:

10N06_64    sc635516    93.93   100.0
10N06_64    sc711028    93.99   100.0
10N06_64    sc255425    93.46   95.8
10N06_64    sc115511    87.5    93.0
116F19_238  sc121016    91.30   12.1
116F19_238  sc1132492   90.94   6.1
116F19_238  sc513573    87.38   6.1
116F19_238  sc68511 75.93   10.5

Code:

# Python 2 code (print statement, binary mode for the csv file)
import csv
from itertools import groupby
from operator import itemgetter
with open('word.txt','rb') as f1:
    reader = csv.reader(f1, delimiter='\t')
    for group, rows in groupby(reader, itemgetter(0)):
        # sort each group by (column 3, column 2), highest first, and keep 3 rows
        best = sorted(rows, key=lambda r: (float(r[3]), float(r[2])),reverse=True)[:3]
        for a in best:
            print a
        print "\n"

Output:

['10N06_64', 'sc711028', '93.99', '100.0']
['10N06_64', 'sc635516', '93.93', '100.0']
['10N06_64', 'sc255425', '93.46', '95.8']


['116F19_238', 'sc121016', '91.30', '12.1']
['116F19_238', 'sc68511', '75.93', '10.5']
['116F19_238', 'sc1132492', '90.94', '6.1']

Answer 2 (score: 2)

You can try this:

import csv
from itertools import groupby
from operator import itemgetter

take = 3

with open('myfile','rb') as f1:
    with open('outfile', 'wb') as f2:
        reader = csv.reader(f1, delimiter='\t')
        writer1 = csv.writer(f2, delimiter='\t')
        for group, rows in groupby(reader, itemgetter(0)):
            sorted_items = sorted(rows, key=lambda r: (float(r[3]), float(r[2])), reverse=True)
            for item in sorted_items[:take]:
                writer1.writerow(item)

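The sorted function works like max with the key you give it, except that it returns all the items in order rather than only the largest one.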
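For readers on Python 3, the same idea can be sketched as follows. This is an illustrative port, not the answerer's code: the function name `top_rows_per_group`, the `SAMPLE` string, and the use of `io.StringIO` in place of real files are assumptions made so the example is self-contained.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

# The question's sample input, tab-separated (stand-in for 'myfile').
SAMPLE = (
    "10N06_64\tsc635516\t93.93\t100.0\n"
    "10N06_64\tsc711028\t93.99\t100.0\n"
    "10N06_64\tsc255425\t93.46\t95.8\n"
    "10N06_64\tsc115511\t87.5\t93.0\n"
    "116F19_238\tsc121016\t91.30\t12.1\n"
    "116F19_238\tsc1132492\t90.94\t6.1\n"
    "116F19_238\tsc513573\t87.38\t6.1\n"
    "116F19_238\tsc68511\t75.93\t10.5\n"
)


def top_rows_per_group(infile, outfile, take=3):
    """Write the `take` best rows of each group (hypothetical helper)."""
    reader = csv.reader(infile, delimiter="\t")
    writer = csv.writer(outfile, delimiter="\t", lineterminator="\n")
    # groupby only merges consecutive rows, so the input must already
    # be grouped by column 0, as it is in the sample.
    for _, rows in groupby(reader, key=itemgetter(0)):
        ranked = sorted(
            rows, key=lambda r: (float(r[3]), float(r[2])), reverse=True
        )
        writer.writerows(ranked[:take])


out = io.StringIO()
top_rows_per_group(io.StringIO(SAMPLE), out)
result = out.getvalue()
print(result, end="")
```

With real files in Python 3, the csv module expects text mode, for example open('myfile', newline='') and open('outfile', 'w', newline=''), instead of the 'rb' and 'wb' modes used in the Python 2 code.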

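Answer 3 (score: 1)

You need to use if statements to keep track of the 3 best matches, for example:

number1 = number2 = number3 = float('-inf')
for x in table:
    if x > number1:
        # new best value: the old best and second best each move down one slot
        number1, number2, number3 = x, number1, number2
    elif x > number2:
        number2, number3 = x, number2
    elif x > number3:
        number3 = x

print(number1, number2, number3)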
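One detail that matters with manual tracking is that when a new value takes a higher slot, the previous holders have to move down rather than be overwritten. Below is a rough Python 3 sketch of the same idea applied to whole rows instead of bare numbers. The function name `top_three` and its `key` parameter are hypothetical, and a short list stands in for the three separate variables.

```python
def top_three(items, key):
    """Return up to three items with the largest key, best first."""
    best = []  # at most three items, kept in descending key order
    for item in items:
        # Find the slot where this item belongs among the current best.
        pos = 0
        while pos < len(best) and key(best[pos]) >= key(item):
            pos += 1
        if pos < 3:
            best.insert(pos, item)  # later entries shift down one slot
            del best[3:]            # drop whatever fell off the end
    return best


# First group from the question's sample input.
rows = [
    ["10N06_64", "sc635516", "93.93", "100.0"],
    ["10N06_64", "sc711028", "93.99", "100.0"],
    ["10N06_64", "sc255425", "93.46", "95.8"],
    ["10N06_64", "sc115511", "87.5", "93.0"],
]

top = top_three(rows, key=lambda r: (float(r[3]), float(r[2])))
for row in top:
    print("\t".join(row))
```

This does by hand what heapq.nlargest() and sorted(...)[:3] do in the other answers, so it is mainly useful for seeing the mechanics.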