Question

I would like to know how to process the following simple file:

Query1 
Hit id 1a score 5
Hit id 2a score 3
Hit id 3a score 2

Query2
Hit id 2a score 1
Hit id 2b score 2

For each query, I want to find the Hit whose score is the highest among that query's Hits. I would like to get the following output:

Query1            # print the title
Hit id 1a score 5 # print the Hit line with highest score value

Query2
Hit id 2b score 2

I have been trying to iterate over the file:

for l in file:
    if l.startswith("Query"):
        print(l)
    elif l.startswith("Hit"):
        # split each Hit line on whitespace so I can work with the score value
        l = l.split()

Any idea how to compare the parsed score values and print the Hit with the highest score?

Answer 1

You can do it like this:

>>> s = '''Query1 
Hit id 1a score 5
Hit id 2a score 3
Hit id 3a score 2

Query2
Hit id 2a score 1
Hit id 2b score 2'''
>>> q = s.split('\n\n')
>>> for i in q:
    j = i.split('\n')
    print(j[0])
    h = max(int(x.split()[-1]) for x in j[1:])
    for y in j[1:]:
        if h == int(y.split()[-1]):
            print(y)


Query1 
Hit id 1a score 5
Query2
Hit id 2b score 2
>>>
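
The second loop can be avoided by letting max() return the line itself through a key function. This is only a sketch, assuming Python 3, that blocks are separated by exactly one empty line, and that the score is always the last field. The name best_hits is made up for illustration.

```python
def best_hits(text):
    """Return [(query_title, best_hit_line), ...] for blank-line-separated blocks."""
    results = []
    for block in text.strip().split('\n\n'):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        title, hits = lines[0], lines[1:]
        if hits:
            # max() with a key returns the whole line whose last field is largest
            best = max(hits, key=lambda ln: int(ln.split()[-1]))
        else:
            best = None  # a query with no Hit lines
        results.append((title, best))
    return results


sample = """Query1
Hit id 1a score 5
Hit id 2a score 3
Hit id 3a score 2

Query2
Hit id 2a score 1
Hit id 2b score 2
"""

for title, best in best_hits(sample):
    print(title)
    if best is not None:
        print(best)
```

If two Hits tie for the highest score, max() returns the first one, whereas the loop above prints every Hit that has the highest score.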

Answer 2

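# Continuing from the code in the question, here is one solution.
maxl = None  # best Hit line seen so far for the current query
maxv = None  # its score; None until the first Hit of a query is read
with open('20151122g.dat', 'r') as file:
    for l in file:
        l = l.rstrip()  # drop the trailing newline so print() does not double it
        if l.startswith("Query"):
            if maxl is not None:
                print(maxl)  # best Hit of the previous query
                maxl = None
                maxv = None
            print(l)
        elif l.startswith("Hit"):
            col = l.split()
            val = int(col[4])  # the fifth field is the score
            if maxv is None or maxv < val:
                maxv = val
                maxl = l
if maxl is not None:
    print(maxl)  # best Hit of the last query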
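
The same single-pass idea can be wrapped in a generator so that the parsing is separate from the printing. A rough sketch, assuming Python 3 and that every Hit line contains the word "score" followed by an integer. The names iter_best_hits and SCORE_RE are invented here, not taken from the question.

```python
import re

# Assumed line format: "... score <integer>"; negative scores are allowed.
SCORE_RE = re.compile(r"score\s+(-?\d+)")


def iter_best_hits(lines):
    """Yield (query_title, best_hit_line) pairs, reading one line at a time."""
    title, best_line, best_score = None, None, None
    for raw in lines:
        line = raw.strip()
        if line.startswith("Query"):
            if title is not None:
                yield title, best_line  # finished the previous query
            title, best_line, best_score = line, None, None
        elif line.startswith("Hit"):
            match = SCORE_RE.search(line)
            if match is None:
                continue  # skip Hit lines without a parsable score
            score = int(match.group(1))
            if best_score is None or score > best_score:
                best_line, best_score = line, score
    if title is not None:
        yield title, best_line  # the last query in the input


demo = [
    "Query1 \n",
    "Hit id 1a score 5\n",
    "Hit id 2a score 3\n",
    "\n",
    "Query2\n",
    "Hit id 2a score -4\n",
    "Hit id 2b score -2\n",
]

for title, best in iter_best_hits(demo):
    print(title)
    print(best)
```

Because the function only needs an iterable of lines, an open file object can be passed in place of the demo list. Looking the score up by the word "score" also avoids depending on it being the fifth field.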
