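How do I remove duplicate dictionaries from a list?

Time: 2018-02-19 02:08:16

Tags: python

With dynamic values, some entries sometimes end up repeated, as in this variable: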

table = [
    {'man':'tim','age':'2','h':'5','w':'40'},
    {'man':'jim','age':'4','h':'3','w':'20'},
    {'man':'jon','age':'24','h':'5','w':'80'}, 
    {'man':'tim','age':'2','h':'5','w':'40'},
    {'man':'tto','age':'7','h':'4','w':'49'}    
]

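Here the dictionary {'man':'tim','age':'2','h':'5','w':'40'} appears twice; all of these are dynamic values.

How can I prevent this repetition, so that the list contains no duplicate dictionaries before it is rendered to the template?

Edit: actual data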

[{'scorecardid': 1, 'progress2': 'preview', 'series2': 'Afghanistan v Zimbabwe in UAE, 2018', 'Commentary1': '/Commentary1', 'commentaryid': 1, 'matchid2': '10', 'matchno2': '5th ODI', 'teams2': 'AFG vs ZIM', 'matchtype2': 'ODI', 'Scorecard1': '/Scorecard1', 'status2': 'Starts on Feb 19 at 10:30 GMT'}, {'six2': '0', 'scorecardid': 2, 'overs5': '4', 'fours1': '0', 'overs10': '20', 'Batting_team_img': 'images/RSA.png', 'wickets20': '5', 'wickets6': '1', 'Bowling_team_img': 'images/IND.png', 'maidens6': '0', 'Batting team': 'RSA', 'matchid2': '9', 'name6': 'Unadkat', 'teams2': 'RSA vs IND', 'wickets10': '9', 'desc10': 'Inns', 'runs5': '32', 'matchtype2': 'T20', 'Scorecard1': '/Scorecard2', 'runs1': '2', 'wickets5': '0', 'runs6': '33', 'runs2': '0', 'maidens5': '0', 'runs20': '203', 'name5': 'Bumrah*', 'progress2': 'complete', 'Commentary1': '/Commentary2', 'fours2': '0', 'series2': 'India tour of South Africa, 2017-18', 'name1': 'Junior Dala*', 'commentaryid': 2, 'matchno2': '1st T20I', 'six1': '0', 'overs6': '4', 'Bowling team': 'IND', 'balls2': '2', 'balls1': '3', 'name2': 'Shamsi', 'overs20': '20', 'runs10': '175', 'desc20': 'Inns', 'status2': 'Ind won by 28 runs'}, {'scorecardid': 3, 'overs5': '0.4', 'fours1': '0', 'overs10': '18.4', 'Batting_team_img': 'images/BAN.png', 'wickets20': '4', 'wickets6': '1', 'Bowling_team_img': 'images/SL.png', 'Batting team': 'BAN', 'matchid2': '6', 'name6': 'Shanaka', 'teams2': 'BAN vs SL', 'wickets10': '10', 'desc10': 'Inns', 'runs5': '3', 'matchtype2': 'T20', 'Scorecard1': '/Scorecard3', 'runs1': '1', 'wickets5': '2', 'runs6': '5', 'maidens5': '0', 'runs20': '210', 'progress2': 'complete', 'Commentary1': '/Commentary3', 'name5': 'Gunathilaka*', 'series2': 'Sri Lanka tour of Bangladesh, 2018', 'name1': 'Nazmul Islam', 'commentaryid': 3, 'matchno2': '2nd T20I', 'six1': '0', 'overs6': '1.5', 'Bowling team': 'SL', 'maidens6': '0', 'balls1': '1', 'overs20': '20', 'runs10': '135', 'desc20': 'Inns', 'status2': 'SL won by 75 runs'}, 
{'six2': '2', 'scorecardid': 4, 'overs5': '4', 'fours1': '1', 'overs10': '20', 'Batting_team_img': 'images/NZ.png', 'wickets20': '7', 'wickets6': '1', 'Bowling_team_img': 'images/ENG.png', 'maidens6': '0', 'Batting team': 'NZ', 'matchid2': '4', 'name6': 'Tom Curran', 'teams2': 'NZ vs ENG', 'wickets10': '4', 'desc10': 'Inns', 'runs5': '41', 'matchtype2': 'T20', 'Scorecard1': '/Scorecard4', 'runs1': '7', 'wickets5': '0', 'runs6': '32', 'runs2': '37', 'maidens5': '0', 'runs20': '194', 'name5': 'Chris Jordan*', 'progress2': 'complete', 'Commentary1': '/Commentary4', 'fours2': '2', 'series2': 'England, Australia, New Zealand T20I Tri-Series, 2018', 'name1': 'de Grandhomme*', 'commentaryid': 4, 'matchno2': '6th Match', 'six1': '0', 'overs6': '3', 'Bowling team': 'ENG', 'balls2': '30', 'balls1': '5', 'name2': 'Chapman', 'overs20': '20', 'runs10': '192', 'desc20': 'Inns', 'status2': 'Eng won by 2 runs'}, {'scorecardid': 5, 'overs5': '7.4', 'fours1': '3', 'runs20': '213', 'six2': '0', 'commentaryid': 5, 'Batting team': 'SAUS', 'matchid2': '18770', 'matchno2': '21st Match', 'wickets10': '3', 'overs10': '49.4', 'matchtype2': 'TEST', 'runs1': '26', 'overs6': '8', 'runs6': '39', 'runs2': '49', 'name1': 'Mennie*', 'name5': 'Daniel Fallins*', 'series2': 'Sheffield Shield, 2017-18', 'Commentary1': '/Commentary5', 'wickets6': '1', 'runs11': '281', 'six1': '0', 'runs10': '192', 'balls1': '58', 'overs11': '74.1', 'maidens5': '1', 'desc21': '1st Inns', 'status2': 'South Aus won by 7 wkts', 'runs5': '51', 'wickets11': '10', 'desc11': '1st Inns', 'desc20': '2nd Inns', 'wickets20': '10', 'wickets21': '10', 'teams2': 'NSW vs SAUS', 'balls2': '85', 'Scorecard1': '/Scorecard5', 'wickets5': '1', 'progress2': 'Result', 'runs21': '256', 'fours2': '6', 'desc10': '2nd Inns', 'name6': 'Stobo', 'maidens6': '1', 'Bowling team': 'NSW', 'name2': 'Ferguson', 'overs20': '68.4', 'overs21': '90.4'}, {'six2': '0', 'scorecardid': 6, 'overs5': '4', 'fours1': '0', 'overs10': '20', 'Batting_team_img': 
'images/RSA.png', 'wickets20': '5', 'wickets6': '1', 'Bowling_team_img': 'images/IND.png', 'maidens6': '0', 'Batting team': 'RSA', 'matchid2': '19166', 'name6': 'Unadkat', 'teams2': 'RSA vs IND', 'wickets10': '9', 'desc10': 'Inns', 'runs5': '32', 'matchtype2': 'T20', 'Scorecard1': '/Scorecard6', 'runs1': '2', 'wickets5': '0', 'runs6': '33', 'runs2': '0', 'maidens5': '0', 'runs20': '203', 'name5': 'Bumrah*', 'progress2': 'Result', 'Commentary1': '/Commentary6', 'fours2': '0', 'series2': 'India tour of South Africa, 2017-18', 'name1': 'Junior Dala*', 'commentaryid': 6, 'matchno2': '1st T20I', 'six1': '0', 'overs6': '4', 'Bowling team': 'IND', 'balls2': '2', 'balls1': '3', 'name2': 'Shamsi', 'overs20': '20', 'runs10': '175', 'desc20': 'Inns', 'status2': 'Ind won by 28 runs'}]

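4 answers:

Answer 0: (score: 4)

Since your records don't appear to have a unique identifier that distinguishes one from another, you will need to hash all of the key-value pairs. This approach works as long as your dictionaries contain no nested mutable objects.

I would use an OrderedDict here to preserve order.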

from collections import OrderedDict
list(
     map(
         dict, 
         OrderedDict.fromkeys(
             map(frozenset, map(dict.items, table)), None
         )
     )
)

[{'age': '2', 'h': '5', 'man': 'tim', 'w': '40'},
 {'age': '4', 'h': '3', 'man': 'jim', 'w': '20'},
 {'age': '24', 'h': '5', 'man': 'jon', 'w': '80'},
 {'age': '7', 'h': '4', 'man': 'tto', 'w': '49'}]

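Here's what's going on:

  1. Convert each dictionary to a frozenset of (key, value) tuples. A frozenset is hashable.
  2. Insert each frozenset as a key into the OrderedDict. Duplicates are dropped automatically.
  3. Retrieve the keys and convert them back into a list of dictionaries.

There are many ways to reproduce the algorithm above. I used the functional programming tool that Python provides: map.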
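The steps above can also be spelled out with intermediate variables, which may be easier to follow than the nested map calls. This is only a sketch on a shortened copy of the sample table; the names frozen_rows, unique_rows and deduped are mine, not part of the answer.

```python
from collections import OrderedDict

table = [
    {'man': 'tim', 'age': '2', 'h': '5', 'w': '40'},
    {'man': 'jim', 'age': '4', 'h': '3', 'w': '20'},
    {'man': 'tim', 'age': '2', 'h': '5', 'w': '40'},
]

# Step 1: each dict becomes a hashable frozenset of (key, value) tuples.
frozen_rows = [frozenset(row.items()) for row in table]

# Step 2: use the frozensets as OrderedDict keys. Repeated keys collapse,
# and the first occurrence keeps its position.
unique_rows = OrderedDict.fromkeys(frozen_rows)

# Step 3: turn each key back into a dict.
deduped = [dict(row) for row in unique_rows]
print(deduped)
```

Note that a frozenset has no defined order, so the keys inside each rebuilt dictionary may come back in a different order than in the input; the dictionaries still compare equal to the originals.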

Answer 1: (score: 2)

If you can hash them into a set, you can find and remove the duplicates. One way to do it:

Code:

def remove_dupes(a_list):
    already_have = set()
    new_table = []
    for row in a_list:
        row_hashable = tuple(sorted(row.items()))
        if row_hashable not in already_have:
            new_table.append(row)
            already_have.add(row_hashable)
    return new_table

Test code:

table = [
    {'man': 'tim', 'age': '2', 'h': '5', 'w': '40'},
    {'man': 'jim', 'age': '4', 'h': '3', 'w': '20'},
    {'man': 'jon', 'age': '24', 'h': '5', 'w': '80'},
    {'man': 'tim', 'age': '2', 'h': '5', 'w': '40'},
    {'man': 'tto', 'age': '7', 'h': '4', 'w': '49'}
]

print(remove_dupes(table))

Results:

[    
    {'man': 'tim', 'age': '2', 'h': '5', 'w': '40'}, 
    {'man': 'jim', 'age': '4', 'h': '3', 'w': '20'}, 
    {'man': 'jon', 'age': '24', 'h': '5', 'w': '80'},
    {'man': 'tto', 'age': '7', 'h': '4', 'w': '49'}
]
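The "if you can hash them" condition matters: a row whose values include a list or another dict cannot be turned into a set member directly. One possible workaround, sketched below, is to build a hashable key recursively. The helper make_hashable and the function remove_dupes_nested are hypothetical names introduced for illustration, and the sample rows are made up.

```python
def make_hashable(value):
    """Hypothetical helper: recursively convert containers to hashable forms."""
    if isinstance(value, dict):
        return frozenset((k, make_hashable(v)) for k, v in value.items())
    if isinstance(value, (list, tuple)):
        return tuple(make_hashable(v) for v in value)
    if isinstance(value, set):
        return frozenset(make_hashable(v) for v in value)
    return value  # assumed to be hashable already (str, int, ...)


def remove_dupes_nested(a_list):
    """Like remove_dupes, but tolerates nested lists/dicts/sets as values."""
    already_have = set()
    new_table = []
    for row in a_list:
        key = make_hashable(row)
        if key not in already_have:
            new_table.append(row)
            already_have.add(key)
    return new_table


rows = [
    {'man': 'tim', 'scores': [1, 2]},
    {'man': 'tim', 'scores': [1, 2]},
    {'man': 'jim', 'scores': [3]},
]

# The plain approach fails here because a list is not hashable.
try:
    frozenset(rows[0].items())
    plain_hash_failed = False
except TypeError:
    plain_hash_failed = True

deduped_nested = remove_dupes_nested(rows)
print(plain_hash_failed, deduped_nested)
```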

Answer 2: (score: 2)

list(map(dict, {tuple(sorted(t.items())):1 for t in table}.keys()))

Or, using a set:

list(map(dict, set(tuple(sorted(t.items())) for t in table)))

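As @cᴏʟᴅsᴘᴇᴇᴅ pointed out, the solutions above do not preserve order in Python < 3.6. (The set-based version does not guarantee order in any Python version.)

Here is a solution that preserves order:

seen = set()   # hashable forms of the rows already kept
singlev = []
for t in table:
    v = tuple(sorted(t.items()))
    if v not in seen:
        seen.add(v)
        singlev.append(t)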
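On Python 3.7 and later, where plain dicts preserve insertion order as a language guarantee, the dictionary-based one-liner can arguably be written with dict.fromkeys and still keep the first occurrence of each row in place. A minimal sketch, assuming Python 3.7+ and the sample table from the question:

```python
table = [
    {'man': 'tim', 'age': '2', 'h': '5', 'w': '40'},
    {'man': 'jim', 'age': '4', 'h': '3', 'w': '20'},
    {'man': 'jon', 'age': '24', 'h': '5', 'w': '80'},
    {'man': 'tim', 'age': '2', 'h': '5', 'w': '40'},
    {'man': 'tto', 'age': '7', 'h': '4', 'w': '49'},
]

# dict.fromkeys keeps the first occurrence of each key, in insertion order
# (guaranteed from Python 3.7 onward).
unique_keys = dict.fromkeys(tuple(sorted(t.items())) for t in table)
deduped = [dict(key) for key in unique_keys]
print([d['man'] for d in deduped])
```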

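Answer 3: (score: 0)

Since your values are all hashable, you can convert each dictionary to a "tuple of tuples", remove the duplicates while preserving order, and then convert back to dictionaries.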

def uniqifier(seq):
    seen = set()
    seen_add = seen.add
    return (x for x in seq if not (x in seen or seen_add(x)))

[dict(i) for i in uniqifier(tuple(i.items()) for i in table)]

The uniqifier function is courtesy of @MarkusJarderot. The only modification I made is to have it return a generator instead of a list.
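One caveat worth checking: tuple(i.items()) depends on the order in which keys were inserted, so two dictionaries with the same content but different key order produce different tuples. The sketch below, which assumes Python 3.7+ and uses two made-up rows, compares the unsorted key with a sorted one.

```python
def uniqifier(seq):
    seen = set()
    seen_add = seen.add
    return (x for x in seq if not (x in seen or seen_add(x)))


rows = [
    {'man': 'tim', 'age': '2'},
    {'age': '2', 'man': 'tim'},  # same content, different insertion order
]

# Unsorted items: the two rows yield different tuples, so both survive.
unsorted_result = [dict(i) for i in uniqifier(tuple(r.items()) for r in rows)]

# Sorted items: both rows yield the same tuple, so only one survives.
sorted_result = [dict(i) for i in uniqifier(tuple(sorted(r.items())) for r in rows)]

print(len(unsorted_result), len(sorted_result))
```

If every row is built with its keys in the same order, the unsorted version is fine; sorting only matters when the key order can vary between rows.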