Deduplicating a list of tuples, preferring certain tuples

Time: 2014-09-04 13:13:17

Tags: python list duplicates tuples

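I have a list of three-item tuples. The first two items (GPS coordinates) are frequently duplicated, while the last item is a score (signal strength):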

[(62.45807, -114.41026, 8),
(62.45807, -114.41026, 11),
(62.45807, -114.41026, 18),
(62.45807, -114.41026, 16),
(62.45807, -114.41026, 9),
(62.45785, -114.41003, 23),
(62.45785, -114.41003, 19),
(62.45785, -114.41003, 11),
(62.45785, -114.41003, 17),
(62.45785, -114.41003, 14),
(62.45785, -114.41003, 11),
(62.45785, -114.41003, 15),
(62.45765, -114.40978, 28),
(62.45765, -114.40978, 16),
(62.45765, -114.40978, 10),
(62.45765, -114.40978, 15),
(62.45765, -114.40978, 25)]

I'd like to know how to remove the duplicate GPS coordinates while keeping the highest score for each, ending up with this:

[(62.45807, -114.41026, 18),
(62.45785, -114.41003, 23),
(62.45765, -114.40978, 28)]

And how could I do the same thing, but average the scores instead, ending up with something like this:

[(62.45807, -114.41026, 12),
(62.45785, -114.41003, 16),
(62.45765, -114.40978, 19)]

2 answers:

Answer 0 (score: 2)

Sounds like a job for itertools.groupby:

>>> from itertools import groupby

Maximum:

>>> [max(g, key=lambda x:x[-1]) for k, g in groupby(data, key= lambda x:x[:2])]
[(62.45807, -114.41026, 18),
 (62.45785, -114.41003, 23),
 (62.45765, -114.40978, 28)]

Average:

>>> [a + (round(sum(c for _, _, c in b)/float(len(b))),) 
                        for a, b in ((k, list(g)) for k, g in 
                                           groupby(data, key= lambda x:x[:2]))]
[(62.45807, -114.41026, 12.0),
 (62.45785, -114.41003, 16.0),
 (62.45765, -114.40978, 19.0)]
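
One caveat: groupby only merges consecutive items that share a key, so the expressions above rely on the list already being ordered by coordinate, as it is in the question. The following is a minimal sketch for input that may arrive unordered, assuming Python 3 (where `/` is true division and one-argument `round` returns an int). The helper names `coord`, `best_per_location` and `mean_per_location` are illustrative and not part of the answer.

```python
from itertools import groupby


def coord(row):
    """Grouping key: the (lat, lon) pair of a (lat, lon, score) tuple."""
    return row[:2]


def best_per_location(rows):
    """Keep the highest-scoring tuple for each coordinate pair."""
    ordered = sorted(rows, key=coord)  # groupby needs equal keys to be adjacent
    return [max(group, key=lambda r: r[2])
            for _, group in groupby(ordered, key=coord)]


def mean_per_location(rows):
    """Collapse each coordinate pair to one tuple carrying the rounded mean score."""
    ordered = sorted(rows, key=coord)
    result = []
    for key, group in groupby(ordered, key=coord):
        scores = [r[2] for r in group]
        result.append(key + (round(sum(scores) / len(scores)),))
    return result


# Deliberately interleaved sample, so the sort step matters.
data = [
    (62.45785, -114.41003, 23),
    (62.45807, -114.41026, 8),
    (62.45785, -114.41003, 11),
    (62.45807, -114.41026, 18),
]
print(best_per_location(data))
print(mean_per_location(data))
```

Because of the sort, the results come back ordered by coordinate rather than in the order the points first appeared.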

Answer 1 (score: 0)

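You can write a function that builds a dictionary keyed by GPS coordinate, where each value is the list of scores recorded at that coordinate: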

def create_gps_score_dict(gps_score_list):
    gps_score_dict = {}
    for gps_score in gps_score_list:
        if (gps_score[0], gps_score[1]) in gps_score_dict.keys():
            gps_score_dict[(gps_score[0], gps_score[1])].append(gps_score[2])
        else:
            gps_score_dict[(gps_score[0], gps_score[1])] = [gps_score[2]]
    return gps_score_dict
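
The same grouping can probably be written more compactly with collections.defaultdict, which creates the empty list on first access to a key. This is only a sketch of an alternative; the name `group_scores_by_gps` is hypothetical and not from the answer.

```python
from collections import defaultdict


def group_scores_by_gps(gps_score_list):
    """Map each (lat, lon) pair to the list of scores seen at that point."""
    scores_by_gps = defaultdict(list)  # a missing key starts as an empty list
    for lat, lon, score in gps_score_list:
        scores_by_gps[(lat, lon)].append(score)
    return dict(scores_by_gps)  # plain dict, so later lookups do not add keys


sample = [
    (62.45807, -114.41026, 8),
    (62.45807, -114.41026, 11),
    (62.45785, -114.41003, 23),
]
print(group_scores_by_gps(sample))
```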

Now you can produce the results by walking this simple dictionary.

def max_gps_scores(gps_score_dict):
    gps_score_list = []
    for gps, scores in gps_score_dict.items():
        # keep only the highest score recorded at each coordinate
        gps_score_list.append((gps[0], gps[1], max(scores)))
    return gps_score_list

Example:

>>> gps_score_list=[(62.45807, -114.41026, 8),
    (62.45807, -114.41026, 11),
    (62.45807, -114.41026, 18),
    (62.45807, -114.41026, 16),
    (62.45807, -114.41026, 9),
    (62.45785, -114.41003, 23),
    (62.45785, -114.41003, 19),
    (62.45785, -114.41003, 11),
    (62.45785, -114.41003, 17),
    (62.45785, -114.41003, 14),
    (62.45785, -114.41003, 11),
    (62.45785, -114.41003, 15),
    (62.45765, -114.40978, 28),
    (62.45765, -114.40978, 16),
    (62.45765, -114.40978, 10),
    (62.45765, -114.40978, 15),
    (62.45765, -114.40978, 25)]

>>> max_gps_scores(create_gps_score_dict(gps_score_list))
[(62.45807, -114.41026, 18), (62.45765, -114.40978, 28), (62.45785, -114.41003, 23)]

I'll leave the average to you!
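
One possible shape for that averaging step, as a minimal sketch: it assumes Python 3 and a dictionary that maps each (lat, lon) pair to a non-empty list of scores. The function name `mean_gps_scores` and the sample dictionary are illustrative only.

```python
def mean_gps_scores(gps_score_dict):
    """Return (lat, lon, rounded mean score) tuples from a {(lat, lon): [scores]} dict."""
    return [
        (lat, lon, round(sum(scores) / len(scores)))
        for (lat, lon), scores in gps_score_dict.items()
    ]


# Scores taken from two of the coordinates in the question.
scores_by_gps = {
    (62.45807, -114.41026): [8, 11, 18, 16, 9],
    (62.45765, -114.40978): [28, 16, 10, 15, 25],
}
print(mean_gps_scores(scores_by_gps))
```

Dropping the `round` call keeps the exact mean if fractional signal strengths are acceptable.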