Question

I have just started learning Python, and I would appreciate any help, tips, or solutions.

I have a file that looks like this:

    2  C00000002 score:  -48.649 nathvy =  49 nconfs =         3878
    3  C00000001 score:  -44.988 nathvy =  41 nconfs =         1988
    4  C00000002 score:  -42.674 nathvy =  49 nconfs =         6740
    5  C00000002 score:  -42.453 nathvy =  49 nconfs =         4553
    6  C00000002 score:  -41.829 nathvy =  49 nconfs =         7559
    7  C00000002 score:  -41.156 nathvy =  49 nconfs =         2251
    8  C00000002 score:  -39.520 nathvy =  49 nconfs =         3129
    9  C00000004 score:  -38.928 nathvy =  24 nconfs =          150
   10  C00000002 score:  -38.454 nathvy =  49 nconfs =         9473
   11  C00000004 score:  -37.704 nathvy =  24 nconfs =          156
   12  C00000001 score:  -37.558 nathvy =  41 nconfs =           51

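The second column contains IDs that are not sorted here, and some of them repeat (for example, C00000001). Each line assigns a different number after score: (the number usually starts with -).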

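What I want to do is:
1) Read the second column (the unsorted IDs) and always keep the first occurrence of each ID. So for C00000001, the line with score: -44.988 would be selected.

2) Once I have the unique values, I want to sort them by the number after score:, so that the most negative number comes first and the most positive number comes last.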

Answer 1

You can do this with plain Python. Python lists have a built-in sort method:

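    with open("in_file") as handle:
        already_present = set()
        rows = []
        for line in handle:
            line_parts = line.split()
            if not line_parts:
                continue  # skip blank lines
            key = line_parts[1]
            # Keep only the first line seen for each ID
            if key in already_present:
                continue
            already_present.add(key)
            rows.append(line_parts)

    # Sort ascending by score, so the most negative score comes first
    rows.sort(key=lambda x: float(x[3]))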
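As a quick check of this approach, here is a minimal self-contained sketch that runs on a few of the sample lines from the question instead of reading a file. The function name `first_scores_sorted` and the inline `sample` list are assumptions made for illustration only; they do not appear in the original answer.

```python
def first_scores_sorted(lines):
    """Return (id, score) pairs: first occurrence per ID, most negative first."""
    seen = set()
    rows = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        key = parts[1]
        if key in seen:
            continue  # a line for this ID was already kept
        seen.add(key)
        rows.append((key, float(parts[3])))
    # Ascending numeric sort puts the most negative score first
    rows.sort(key=lambda row: row[1])
    return rows


# A few lines copied from the question's sample file
sample = [
    "    2  C00000002 score:  -48.649 nathvy =  49 nconfs =         3878",
    "    3  C00000001 score:  -44.988 nathvy =  41 nconfs =         1988",
    "    4  C00000002 score:  -42.674 nathvy =  49 nconfs =         6740",
    "    9  C00000004 score:  -38.928 nathvy =  24 nconfs =          150",
    "   12  C00000001 score:  -37.558 nathvy =  41 nconfs =           51",
]

for key, score in first_scores_sorted(sample):
    print(key, score)
```

On these five lines it prints C00000002 with -48.649, then C00000001 with -44.988, then C00000004 with -38.928. Converting the score with float() matters: sorting the raw strings would compare them character by character rather than numerically.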

Answer 2

Hello, here is one possible solution:

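    def readLine(line, acc):
        result = line.split()
        if not result:
            return  # skip blank lines
        key = result[1]
        # Convert to float so the sort below is numeric, not alphabetical
        value = float(result[3])
        # Keep only the first score seen for each ID
        if key not in acc:
            acc[key] = value

    def main():
        filepath = 'myfile.csv'
        acc = {}
        with open(filepath) as fp:
            for line in fp:
                readLine(line, acc)
        # Sort by score (most negative first), then by ID to break ties
        for key, value in sorted(acc.items(), key=lambda kv: (kv[1], kv[0])):
            print("%s: %s" % (key, value))

    if __name__ == "__main__":
        main()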

Answer 3

Here is one possibility:

    import pprint

    scores = {}
    with open('/tmp/data.txt') as handle:
        for line in handle:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            code, score = fields[1], float(fields[3])
            # Keep only the first score seen for each code
            if code not in scores:
                scores[code] = score

    pprint.pprint(scores)

    # Sort ascending by numeric score, so the most negative score comes first
    sorted_by_score = sorted(scores.items(), key=lambda v: v[1])

    pprint.pprint(sorted_by_score)

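The first part could use a list of tuples instead of a dictionary, but it would be slower, because checking whether a code is already present means scanning the list.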
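To illustrate that remark, here is a rough sketch comparing the two containers. The function names and the `pairs` sample are hypothetical and used only for this comparison. The list version has to scan its contents for every new line, while the dictionary version does an average constant-time hash lookup.

```python
def first_seen_with_list(pairs):
    """Keep the first score per code using a list of tuples (linear lookup)."""
    kept = []
    for code, score in pairs:
        # any(...) may scan the whole list before deciding the code is new
        if not any(existing == code for existing, _ in kept):
            kept.append((code, score))
    return kept


def first_seen_with_dict(pairs):
    """Keep the first score per code using a dict (hash lookup)."""
    kept = {}
    for code, score in pairs:
        if code not in kept:
            kept[code] = score
    # Dicts preserve insertion order in Python 3.7+
    return list(kept.items())


pairs = [("C00000002", -48.649), ("C00000001", -44.988), ("C00000002", -42.674)]
print(first_seen_with_list(pairs) == first_seen_with_dict(pairs))
```

Both versions return the same pairs; for a file with only a handful of distinct IDs the difference in speed is unlikely to be noticeable.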

Sort values and get the best score (highest score)
