如何从字典中的字典中删除重复的条目?

时间:2018-06-17 21:29:03

标签: python python-3.x

我的代码的目标是,只有一个具有相同名称和出生日期的个人应该出现在我正在解析的文件中。

这是我在字典中的字典,名为ind:

{I19: {'BIRT': '13 FEB 1981', 'sex': 'M', 'id': 'I19', 'family': 'F23', 'name': 'Dick /Smith/'}}
{I32: {'BIRT': '27 MAY 1991', 'sex': 'M', 'id': 'I32', 'family': 'F16', 'name': 'Nick /Tary/'}}
{I30: {'BIRT': '3 SEP 1993', 'sex': 'F', 'id': 'I30', 'family': 'F16', 'name': 'Mary /Test/'}}
{I26: {'BIRT': '2 JUN 1983', 'sex': 'F', 'id': 'I26', 'family': 'F23', 'name': 'Jane /Smith/'}}
{I01: {'name': 'Joe /Smith/', 'family': 'F23', 'BIRT': '15 JUL 1960', 'sex': 'M', 'id': 'I01', 'DEAT': '31 DEC 2013'}}
{I07: {'BIRT': '23 SEP 1960', 'sex': 'F', 'id': 'I07', 'family': 'F23', 'name': 'Jennifer /Smith/'}}
{I19: {'BIRT': '13 FEB 1981', 'sex': 'M', 'id': 'I19', 'family': 'F23', 'name': 'Dick /Smith/'}}

我的代码应该摆脱Dick Smith的一个条目,因为它有2个。

到目前为止,这是我此部分的代码(不删除重复的内容):

for individual in ind:
    name1 = ind[individual]['name']
    bdate1 = ind[individual]['BIRT']
    for individual_2 in ind:
        name2 = ind[individual]['name']
        bdate2 = ind[individual]['BIRT']
        if name1 == name2 and bdate1 == bdate2:
            print("{} already exists. Removing duplicate entry.".format(name1))

但是这给了我:

Dick /Smith/ already exists. Removing duplicate entry.
Dick /Smith/ already exists. Removing duplicate entry.
Dick /Smith/ already exists. Removing duplicate entry.
Dick /Smith/ already exists. Removing duplicate entry.
Dick /Smith/ already exists. Removing duplicate entry.
Dick /Smith/ already exists. Removing duplicate entry.
Nick /Tary/ already exists. Removing duplicate entry.
Nick /Tary/ already exists. Removing duplicate entry.
Nick /Tary/ already exists. Removing duplicate entry.
Nick /Tary/ already exists. Removing duplicate entry.
Nick /Tary/ already exists. Removing duplicate entry.
Nick /Tary/ already exists. Removing duplicate entry.
Mary /Test/ already exists. Removing duplicate entry.
Mary /Test/ already exists. Removing duplicate entry.
Mary /Test/ already exists. Removing duplicate entry.
Mary /Test/ already exists. Removing duplicate entry.
Mary /Test/ already exists. Removing duplicate entry.
Mary /Test/ already exists. Removing duplicate entry.
Jane /Smith/ already exists. Removing duplicate entry.
Jane /Smith/ already exists. Removing duplicate entry.
Jane /Smith/ already exists. Removing duplicate entry.
Jane /Smith/ already exists. Removing duplicate entry.
Jane /Smith/ already exists. Removing duplicate entry.
Jane /Smith/ already exists. Removing duplicate entry.
Joe /Smith/ already exists. Removing duplicate entry.
Joe /Smith/ already exists. Removing duplicate entry.
Joe /Smith/ already exists. Removing duplicate entry.
Joe /Smith/ already exists. Removing duplicate entry.
Joe /Smith/ already exists. Removing duplicate entry.
Joe /Smith/ already exists. Removing duplicate entry.
Jennifer /Smith/ already exists. Removing duplicate entry.
Jennifer /Smith/ already exists. Removing duplicate entry.
Jennifer /Smith/ already exists. Removing duplicate entry.
Jennifer /Smith/ already exists. Removing duplicate entry.
Jennifer /Smith/ already exists. Removing duplicate entry.
Jennifer /Smith/ already exists. Removing duplicate entry.

如果问题看起来很容易道歉 - 我是新手。非常感谢任何见解。

3 个答案:

答案 0 :(得分:1)

list_of_dict = [{'I19': {'BIRT': '13 FEB 1981', 'sex': 'M', 'id': 'I19', 'family': 'F23', 'name': 'Dick /Smith/'}},
     {'I32': {'BIRT': '27 MAY 1991', 'sex': 'M', 'id': 'I32', 'family': 'F16', 'name': 'Nick /Tary/'}}
,{'I30': {'BIRT': '3 SEP 1993', 'sex': 'F', 'id': 'I30', 'family': 'F16', 'name': 'Mary /Test/'}}
,{'I26': {'BIRT': '2 JUN 1983', 'sex': 'F', 'id': 'I26', 'family': 'F23', 'name': 'Jane /Smith/'}}
,{'I01': {'name': 'Joe /Smith/', 'family': 'F23', 'BIRT': '15 JUL 1960', 'sex': 'M', 'id': 'I01', 'DEAT': '31 DEC 2013'}}
,{'I07': {'BIRT': '23 SEP 1960', 'sex': 'F', 'id': 'I07', 'family': 'F23', 'name': 'Jennifer /Smith/'}}
,{'I19': {'BIRT': '13 FEB 1981', 'sex': 'M', 'id': 'I19', 'family': 'F23', 'name': 'Dick /Smith/'}}]

new_d = {v['name'] : {k : v} for d in list_of_dict for k,v in d.items()}

for v in new_d.values():
    print(v)

输出

{'I19': {'BIRT': '13 FEB 1981', 'sex': 'M', 'id': 'I19', 'family': 'F23', 'name': 'Dick /Smith/'}}
{'I32': {'BIRT': '27 MAY 1991', 'sex': 'M', 'id': 'I32', 'family': 'F16', 'name': 'Nick /Tary/'}}
{'I30': {'BIRT': '3 SEP 1993', 'sex': 'F', 'id': 'I30', 'family': 'F16', 'name': 'Mary /Test/'}}
{'I26': {'BIRT': '2 JUN 1983', 'sex': 'F', 'id': 'I26', 'family': 'F23', 'name': 'Jane /Smith/'}}
{'I01': {'name': 'Joe /Smith/', 'family': 'F23', 'BIRT': '15 JUL 1960', 'sex': 'M', 'id': 'I01', 'DEAT': '31 DEC 2013'}}
{'I07': {'BIRT': '23 SEP 1960', 'sex': 'F', 'id': 'I07', 'family': 'F23', 'name': 'Jennifer /Smith/'}}

请注意,在此impl中,只有在复制

的情况下才会保存姓氏

答案 1 :(得分:1)

一种方法是使用标准库中提供的itertools食谱unique_everseen。如果您有权访问第三方toolz库,则可以使用toolz.unique

我们定义一个函数来确定字典是否唯一。在这种情况下,我们只需要检查每个字典的name键。

使用此技术,为每个唯一名称存储仅首次出现

from toolz import unique

res = list(unique(ind, lambda x: next(iter(x.items()))[1]['name']))

<强>设置

ind = [{'I19': {'BIRT': '13 FEB 1981', 'sex': 'M', 'id': 'I19', 'family': 'F23', 'name': 'Dick /Smith/'}},
       {'I32': {'BIRT': '27 MAY 1991', 'sex': 'M', 'id': 'I32', 'family': 'F16', 'name': 'Nick /Tary/'}},
       {'I30': {'BIRT': '3 SEP 1993', 'sex': 'F', 'id': 'I30', 'family': 'F16', 'name': 'Mary /Test/'}},
       {'I26': {'BIRT': '2 JUN 1983', 'sex': 'F', 'id': 'I26', 'family': 'F23', 'name': 'Jane /Smith/'}},
       {'I01': {'name': 'Joe /Smith/', 'family': 'F23', 'BIRT': '15 JUL 1960', 'sex': 'M', 'id': 'I01', 'DEAT': '31 DEC 2013'}},
       {'I07': {'BIRT': '23 SEP 1960', 'sex': 'F', 'id': 'I07', 'family': 'F23', 'name': 'Jennifer /Smith/'}},
       {'I19': {'BIRT': '13 FEB 1981', 'sex': 'M', 'id': 'I19', 'family': 'F23', 'name': 'Dick /Smith/'}}]

<强>结果

[{'I19': {'BIRT': '13 FEB 1981', 'sex': 'M', 'id': 'I19', 'family': 'F23', 'name': 'Dick /Smith/'}},
 {'I32': {'BIRT': '27 MAY 1991', 'sex': 'M', 'id': 'I32', 'family': 'F16', 'name': 'Nick /Tary/'}},
 {'I30': {'BIRT': '3 SEP 1993', 'sex': 'F', 'id': 'I30', 'family': 'F16', 'name': 'Mary /Test/'}},
 {'I26': {'BIRT': '2 JUN 1983', 'sex': 'F', 'id': 'I26', 'family': 'F23', 'name': 'Jane /Smith/'}},
 {'I01': {'name': 'Joe /Smith/', 'family': 'F23', 'BIRT': '15 JUL 1960', 'sex': 'M', 'id': 'I01', 'DEAT': '31 DEC 2013'}},
 {'I07': {'BIRT': '23 SEP 1960', 'sex': 'F', 'id': 'I07', 'family': 'F23', 'name': 'Jennifer /Smith/'}}]

答案 2 :(得分:0)

如果您的输入已经是整个字典,那么自从numStudents = len(maxStudents) - 1 print(("{} had the maximum attendance with, {} students").format(day, numStudents)) file.close() 出现两次后,将删除重复项。但是,如果您的数据是字典列表,则可以使用'I19'

itertools.groupby

输出:

import itertools
def depth_key(x):
  [[_, c]] = list(x.items())
  return [c['name'], c['BIRT']]

d = [{'I19': {'BIRT': '13 FEB 1981', 'sex': 'M', 'id': 'I19', 'family': 'F23', 'name': 'Dick /Smith/'}}, {'I32': {'BIRT': '27 MAY 1991', 'sex': 'M', 'id': 'I32', 'family': 'F16', 'name': 'Nick /Tary/'}}, {'I30': {'BIRT': '3 SEP 1993', 'sex': 'F', 'id': 'I30', 'family': 'F16', 'name': 'Mary /Test/'}}, {'I26': {'BIRT': '2 JUN 1983', 'sex': 'F', 'id': 'I26', 'family': 'F23', 'name': 'Jane /Smith/'}}, {'I01': {'name': 'Joe /Smith/', 'family': 'F23', 'BIRT': '15 JUL 1960', 'sex': 'M', 'id': 'I01', 'DEAT': '31 DEC 2013'}}, {'I07': {'BIRT': '23 SEP 1960', 'sex': 'F', 'id': 'I07', 'family': 'F23', 'name': 'Jennifer /Smith/'}}, {'I19': {'BIRT': '13 FEB 1981', 'sex': 'M', 'id': 'I19', 'family': 'F23', 'name': 'Dick /Smith/'}}]
new_d = [[a, list(b)] for a, b in itertools.groupby(sorted(d, key=depth_key), key=depth_key)]
final_d = [b for _, [b, *c] in new_d]