I would like to know how to process the following simple file to solve a problem:
Query1
Hit id 1a score 5
Hit id 2a score 3
Hit id 3a score 2
Query2
Hit id 2a score 1
Hit id 2b score 2
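The problem I want to solve is finding, for each query, the Hit whose score is the highest among the Hits listed under that same query. I would like to get the following output: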
Query1 # print the title
Hit id 1a score 5 # print the Hit line with highest score value
Query2
Hit id 2b score 2
I have been trying to iterate over the file:
for l in file:
    if l.startswith("Query"):
        print(l)
    elif l.startswith("Hit"):
        l = l.split()  # split each Hit line on whitespace so I can
                       # operate on the score value
Any idea how to parse and compare the score values and print the highest-scoring Hit?
Answer 0 (score: 1)
You can do it like this:
>>> s = '''Query1
Hit id 1a score 5
Hit id 2a score 3
Hit id 3a score 2
Query2
Hit id 2a score 1
Hit id 2b score 2'''
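>>> import re
>>> # Split just before every line that starts with "Query",
>>> # so that each chunk holds one query and its hits.
>>> q = re.split(r'\n(?=Query)', s)
>>> for i in q:
...     j = i.split('\n')
...     print(j[0])
...     # Highest score among this query's hits (score is the last field).
...     h = max(int(k.split()[-1]) for k in j[1:])
...     for y in j[1:]:
...         if h == int(y.split()[-1]):
...             print(y)
...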
Query1
Hit id 1a score 5
Query2
Hit id 2b score 2
>>>
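Answer 1 (score: 1)
# Continuing from the code in the question, here is one solution.
maxl = None  # best Hit line seen so far for the current query
maxv = None  # score of that line
with open('20151122g.dat', 'r') as file:
    for l in file:
        l = l.rstrip('\n')  # avoid printing a doubled newline
        if l.startswith("Query"):
            # A new query begins: print the best hit of the previous one.
            if maxl is not None:
                print(maxl)
            maxl = None
            maxv = None
            print(l)
        elif l.startswith("Hit"):
            col = l.split()
            val = int(col[4])  # the score is the fifth field
            if maxv is None or maxv < val:
                maxv = val
                maxl = l
# Print the best hit of the last query.
if maxl is not None:
    print(maxl)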
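The same single-pass idea can also be wrapped in a function so the result can be reused instead of printed straight away. This is only a minimal sketch, assuming every Hit line ends with an integer score as in the sample file; the names `best_hits` and `score_of` are hypothetical and do not come from the posts above.

```python
def score_of(hit_line):
    """Return the integer score, assumed to be the last field of a Hit line."""
    return int(hit_line.split()[-1])


def best_hits(lines):
    """Return a list of (query, best_hit) tuples in file order.

    best_hit is None for a query without any Hit lines. On ties, the
    first hit with the highest score is kept.
    """
    results = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("Query"):
            # Start a new group; no best hit is known yet.
            results.append([line, None])
        elif line.startswith("Hit") and results:
            best = results[-1][1]
            if best is None or score_of(line) > score_of(best):
                results[-1][1] = line
    return [tuple(pair) for pair in results]


# Sample data copied from the question.
sample = """Query1
Hit id 1a score 5
Hit id 2a score 3
Hit id 3a score 2
Query2
Hit id 2a score 1
Hit id 2b score 2""".splitlines()

for query, hit in best_hits(sample):
    print(query)
    if hit is not None:
        print(hit)
```

To read from a file instead of the inline sample, an open file object can be passed directly, since the function only needs an iterable of lines.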