I have a tab-separated input file:
10N06_64 sc635516 93.93 100.0
10N06_64 sc711028 93.99 100.0
10N06_64 sc255425 93.46 95.8
10N06_64 sc115511 87.5 93.0
116F19_238 sc121016 91.30 12.1
116F19_238 sc1132492 90.94 6.1
116F19_238 sc513573 87.38 6.1
116F19_238 sc68511 75.93 10.5
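I need to group the rows by row[0], iterate within each group, and print the 3 rows with the highest values in row[3] (using row[2] to break ties), so that my output file looks like this: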
10N06_64 sc635516 93.93 100.0
10N06_64 sc711028 93.99 100.0
10N06_64 sc255425 93.46 95.8
116F19_238 sc121016 91.30 12.1
116F19_238 sc68511 75.93 10.5
116F19_238 sc1132492 90.94 6.1
This is my attempt, but it prints only the single best row of each group. How can I modify it to print the 3 best hits?
import csv
from itertools import groupby
from operator import itemgetter

# Python 2: the csv module expects files opened in binary mode
with open('myfile','rb') as f1:
    with open('outfile', 'wb') as f2:
        reader = csv.reader(f1, delimiter='\t')
        writer1 = csv.writer(f2, delimiter='\t')
        for group, rows in groupby(reader, itemgetter(0)):
            # max() returns only one row per group
            best = max(rows, key=lambda r: (float(r[3]), float(r[2])))
            writer1.writerow(best)
Answer 0 (score: 3)
You can use heapq.nlargest() to get the rows with the highest values:
#!/usr/bin/env python
import csv
import sys
from heapq import nlargest
from itertools import groupby
writerows = csv.writer(sys.stdout, delimiter='\t').writerows
for _, rows in groupby(csv.reader(sys.stdin, delimiter='\t'), key=lambda r: r[0]):
    writerows(nlargest(3, rows, key=lambda row: (float(row[3]), float(row[2]))))
Example:
$ <input.csv ./your-script >output.csv
10N06_64 sc711028 93.99 100.0
10N06_64 sc635516 93.93 100.0
10N06_64 sc255425 93.46 95.8
116F19_238 sc121016 91.30 12.1
116F19_238 sc68511 75.93 10.5
116F19_238 sc1132492 90.94 6.1
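nlargest() avoids loading a whole input group into memory, since it keeps only the n best rows seen so far. If the number of rows is always small, you can also use sorted(iterable, key=key, reverse=True)[:n].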
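As a rough check of the equivalence mentioned above, here is a minimal Python 3 sketch that runs both variants on the sample rows from the question. The names SAMPLE, score, and top_rows are made up for this illustration, and the data is read from an in-memory string rather than a real file.

```python
import csv
import io
from heapq import nlargest
from itertools import groupby

# Sample rows from the question, tab-separated, held in memory for the demo.
SAMPLE = (
    "10N06_64\tsc635516\t93.93\t100.0\n"
    "10N06_64\tsc711028\t93.99\t100.0\n"
    "10N06_64\tsc255425\t93.46\t95.8\n"
    "10N06_64\tsc115511\t87.5\t93.0\n"
    "116F19_238\tsc121016\t91.30\t12.1\n"
    "116F19_238\tsc1132492\t90.94\t6.1\n"
    "116F19_238\tsc513573\t87.38\t6.1\n"
    "116F19_238\tsc68511\t75.93\t10.5\n"
)


def score(row):
    # Rank by column 3 first, then by column 2 as the tie-breaker.
    return (float(row[3]), float(row[2]))


def top_rows(lines, n=3, use_heap=True):
    """Return the n best rows of each consecutive group of column 0."""
    result = []
    reader = csv.reader(lines, delimiter="\t")
    for _, rows in groupby(reader, key=lambda r: r[0]):
        if use_heap:
            result.extend(nlargest(n, rows, key=score))
        else:
            result.extend(sorted(rows, key=score, reverse=True)[:n])
    return result


with_heap = top_rows(io.StringIO(SAMPLE), use_heap=True)
with_sort = top_rows(io.StringIO(SAMPLE), use_heap=False)

print(with_heap == with_sort)
for row in with_heap:
    print("\t".join(row))
```

Both variants select the same six rows here; the heap version only matters when groups are large.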
Answer 1 (score: 2)
Use the sorted() function.
Input:
10N06_64 sc635516 93.93 100.0
10N06_64 sc711028 93.99 100.0
10N06_64 sc255425 93.46 95.8
10N06_64 sc115511 87.5 93.0
116F19_238 sc121016 91.30 12.1
116F19_238 sc1132492 90.94 6.1
116F19_238 sc513573 87.38 6.1
116F19_238 sc68511 75.93 10.5
Code:
import csv
from itertools import groupby
from operator import itemgetter

with open('word.txt','rb') as f1:
    reader = csv.reader(f1, delimiter='\t')
    for group, rows in groupby(reader, itemgetter(0)):
        # sort each group in descending order and keep the first 3 rows
        best = sorted(rows, key=lambda r: (float(r[3]), float(r[2])),reverse=True)[:3]
        for a in best:
            print a
        print "\n"
Output:
['10N06_64', 'sc711028', '93.99', '100.0']
['10N06_64', 'sc635516', '93.93', '100.0']
['10N06_64', 'sc255425', '93.46', '95.8']
['116F19_238', 'sc121016', '91.30', '12.1']
['116F19_238', 'sc68511', '75.93', '10.5']
['116F19_238', 'sc1132492', '90.94', '6.1']
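One thing the groupby-based code relies on is that rows with the same ID are adjacent, as they are in the sample input. itertools.groupby starts a new group whenever the key changes, so interleaved IDs would be split into several groups. The sketch below (Python 3, with made-up interleaved rows) shows the difference and one possible fix, sorting by the ID before grouping.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows where the two IDs are interleaved rather than contiguous.
rows = [
    ["10N06_64", "sc635516", "93.93", "100.0"],
    ["116F19_238", "sc121016", "91.30", "12.1"],
    ["10N06_64", "sc711028", "93.99", "100.0"],
    ["116F19_238", "sc68511", "75.93", "10.5"],
]

by_id = itemgetter(0)

# Without sorting, groupby starts a new group every time the ID changes.
unsorted_groups = [key for key, _ in groupby(rows, key=by_id)]

# Sorting by the ID first makes each ID form exactly one group.
sorted_groups = [key for key, _ in groupby(sorted(rows, key=by_id), key=by_id)]

print(unsorted_groups)  # four groups, each ID appears twice
print(sorted_groups)    # two groups, one per ID
```

Sorting loads all rows into memory, so this only suits inputs that fit there.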
Answer 2 (score: 2)
You can try this:
import csv
from itertools import groupby
from operator import itemgetter

take = 3  # number of rows to keep per group
with open('myfile','rb') as f1:
    with open('outfile', 'wb') as f2:
        reader = csv.reader(f1, delimiter='\t')
        writer1 = csv.writer(f2, delimiter='\t')
        for group, rows in groupby(reader, itemgetter(0)):
            sorted_items = sorted(rows, key=lambda r: (float(r[3]), float(r[2])), reverse=True)
            for item in sorted_items[:take]:
                writer1.writerow(item)
The sorted function accepts the same kind of key as max, but instead of returning only the largest item it orders all the items by that key.
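To make the relationship between max and sorted concrete, here is a small Python 3 sketch using the first group from the question. The helper name key is just for illustration. It also shows why the columns are converted with float(): compared as strings, "95.8" ranks above "100.0".

```python
rows = [
    ["10N06_64", "sc635516", "93.93", "100.0"],
    ["10N06_64", "sc711028", "93.99", "100.0"],
    ["10N06_64", "sc255425", "93.46", "95.8"],
    ["10N06_64", "sc115511", "87.5", "93.0"],
]


def key(r):
    # Tuples compare element by element: column 3 first, then column 2.
    return (float(r[3]), float(r[2]))


best = max(rows, key=key)                     # only the single best row
ranked = sorted(rows, key=key, reverse=True)  # every row, best first
top3 = ranked[:3]

# Without float(), the comparison is lexicographic and picks the wrong row.
wrong = sorted(rows, key=lambda r: r[3], reverse=True)[0]

print(best[1])
print([r[1] for r in top3])
print(wrong[1])
```

The first element of the sorted list is the same row that max returns, so slicing with [:3] is a natural extension of the original attempt.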
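Answer 3 (score: 1)
# You need to use if/elif to keep track of the 3 best matches, for example:
number1 = number2 = number3 = float('-inf')
for x in table:
    if x > number1:
        # new best value: shift the previous best values down one slot
        number1, number2, number3 = x, number1, number2
    elif x > number2:
        number2, number3 = x, number2
    elif x > number3:
        number3 = x
print number1, number2, number3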
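A runnable Python 3 sketch of this manual approach, wrapped in a hypothetical function top_three and checked against sorted on the column 3 values from the question. The detail that is easy to miss is that when a new best value arrives, the previous values have to shift down one slot, otherwise they are lost.

```python
def top_three(values):
    """Return the three largest values, best first, in a single pass."""
    number1 = number2 = number3 = float("-inf")
    for x in values:
        if x > number1:
            # New best: the old best and second best each move down a slot.
            number1, number2, number3 = x, number1, number2
        elif x > number2:
            number2, number3 = x, number2
        elif x > number3:
            number3 = x
    return number1, number2, number3


# Column 3 values of the two groups in the question.
group_a = [100.0, 100.0, 95.8, 93.0]
group_b = [12.1, 6.1, 6.1, 10.5]

for values in (group_a, group_b):
    manual = top_three(values)
    reference = tuple(sorted(values, reverse=True)[:3])
    print(manual, manual == reference)
```

This tracks plain numbers only; to rank whole rows, as the question asks, the comparisons would need to use the (row[3], row[2]) key, at which point nlargest or sorted is simpler.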