Merge two dictionaries and remove duplicates in Python

Time: 2012-03-27 13:22:02

Tags: python merge

Hi, I have two different dictionaries. I am trying to merge the two and remove the duplicates. These are the 2 lists:

x = [{'relevance': 0.722, 'type': 'Company', 'name': 'Dell'}, {'relevance': 0.314, 'type': 'OperatingSystem', 'name': 'VMs'}, {'relevance': 0.122, 'type': 'Technology', 'name': 'iSCSI'}, {'relevance': 0.266, 'type': 'Company', 'name': 'Force10'}, {'relevance': 0.327, 'type': 'Person', 'name': 'Greg Althaus'}, {'relevance': 0.085, 'type': 'URL', 'name': 'http://Dell.com/OpenStack'}, {'relevance': 0.174, 'type': 'Company', 'name': 'Storage Hardware'}]
y = [{'relevance': u'0.874065', 'type': u'Company', 'name': u'Dell'}, {'relevance': u'0.522169', 'type': u'OperatingSystem', 'name': u'VMs'}, {'relevance': u'0.444586', 'type': u'Person', 'name': u'Rob Hirschfeld'}, {'relevance': u'0.413988', 'type': u'Person', 'name': u'Greg Althaus'}, {'relevance': u'0.376489', 'type': u'FieldTerminology', 'name': u'iSCSI'}, {'relevance': u'0.314059', 'type': u'Company', 'name': u'Force10'}]

I tried doing:

z = x.update(y)
print x

It gave me this error:

AttributeError: 'str' object has no attribute 'update'

I tried this:

z = dict(x.items() + y.items())

It gave me this error:

AttributeError: 'str' object has no attribute 'items'

Then I tried:

z = dict(x, **y)

It gave me this error:

TypeError: type object argument after ** must be a mapping, not str

Then I tried:

z = dict(chain(x.iteritems(), y.iteritems()))

It gave me this error:

AttributeError: 'str' object has no attribute 'iteritems'
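All four errors mention a 'str' object, which suggests that x and y are actually strings holding the text of the lists rather than lists (calling .update on a real list would report a 'list' object instead). A minimal sketch of how to check this and parse such a string, assuming the data arrives as text like the shortened, hypothetical raw value below. ast.literal_eval evaluates Python literals only, so it accepts the u'...' prefixes and single quotes (which json.loads would reject) without executing arbitrary code.

```python
import ast

# Hypothetical shortened input, shaped like the data in the question
raw = "[{'relevance': u'0.874065', 'type': u'Company', 'name': u'Dell'}]"

# A str has no .update/.items/.iteritems, hence the errors above
print(type(raw).__name__)

# Safely turn the text into a real list of dicts
data = ast.literal_eval(raw)
print(type(data).__name__)
print(data[0]['name'])
```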

5 Answers:

Answer 0 (score: 3)

You can convert the lists in the strings into dicts keyed by name, and then update one with the other:

import ast

x = "[{'relevance': 0.722, 'type': 'Company', 'name': 'Dell'}, {'relevance': 0.314, 'type': 'OperatingSystem', 'name': 'VMs'}, {'relevance': 0.122, 'type': 'Technology', 'name': 'iSCSI'}, {'relevance': 0.266, 'type': 'Company', 'name': 'Force10'}, {'relevance': 0.327, 'type': 'Person', 'name': 'Greg Althaus'}, {'relevance': 0.085, 'type': 'URL', 'name': 'http://Dell.com/OpenStack'}, {'relevance': 0.174, 'type': 'Company', 'name': 'Storage Hardware'}]"
y = "[{'relevance': u'0.874065', 'type': u'Company', 'name': u'Dell'}, {'relevance': u'0.522169', 'type': u'OperatingSystem', 'name': u'VMs'}, {'relevance': u'0.444586', 'type': u'Person', 'name': u'Rob Hirschfeld'}, {'relevance': u'0.413988', 'type': u'Person', 'name': u'Greg Althaus'}, {'relevance': u'0.376489', 'type': u'FieldTerminology', 'name': u'iSCSI'}, {'relevance': u'0.314059', 'type': u'Company', 'name': u'Force10'}]"

# Load each list out of its string safely, then make a dictionary
# with the names as keys, for each of the two strings
x, y = (dict((d['name'], d) for d in ast.literal_eval(lst))
        for lst in (x, y))
# Or, on Python 2.7+, use this dict comprehension INSTEAD of the line above
# (running both would fail, since x and y are no longer strings):
# x, y = ({d['name']: d for d in ast.literal_eval(lst)} for lst in (x, y))

# Combine the two dicts; on a duplicate name, the entry from y wins
x.update(y)

Then, if you want a list, it's:

list(x.values())

You mentioned sorting in your title. If you want that list sorted by name:

import operator
sorted(x.values(), key=operator.itemgetter('name'))
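Putting the steps of this answer together, here is a minimal end-to-end sketch for Python 3, assuming shortened hypothetical inputs that arrive as strings as in the question. The helper name keyed_by_name is made up for illustration.

```python
import ast
import operator

# Hypothetical shortened inputs: both arrive as strings, as in the question
x = "[{'relevance': 0.722, 'type': 'Company', 'name': 'Dell'}, {'relevance': 0.314, 'type': 'OperatingSystem', 'name': 'VMs'}]"
y = "[{'relevance': u'0.874065', 'type': u'Company', 'name': u'Dell'}, {'relevance': u'0.444586', 'type': u'Person', 'name': u'Rob Hirschfeld'}]"

def keyed_by_name(text):
    """Parse a list-of-dicts string and index its dicts by their 'name' value."""
    return {d['name']: d for d in ast.literal_eval(text)}

merged = keyed_by_name(x)
merged.update(keyed_by_name(y))  # on a duplicate name, y's dict replaces x's

# Back to a list, ordered by name
result = sorted(merged.values(), key=operator.itemgetter('name'))
for d in result:
    print(d['name'], d['relevance'])
```

Because dict keys are unique, keying by name is what removes the duplicates; the only decision left is which list's entry should win, which is controlled by the order of the update calls.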

Answer 1 (score: 3)

If you want to produce a single list of dictionaries by merging them and removing duplicates, it is simple:

def DictListUpdate(lis1, lis2):
    # Append each dict from lis1 that is not already in lis2 (modifies lis2 in place)
    for aLis1 in lis1:
        if aLis1 not in lis2:
            lis2.append(aLis1)
    return lis2

x = [{"name": "surya", "company": "dell"},
     {"name": "jobs", "company": "apple"}]

y = [{"name": "surya", "company": "dell"},
     {"name": "gates", "company": "microsoft"}]

print(DictListUpdate(x, y))

Output:

>>> 
[{'company': 'dell', 'name': 'surya'}, {'company': 'microsoft', 'name': 'gates'}, {'company': 'apple', 'name': 'jobs'}]
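Note that DictListUpdate appends into lis2, so y is modified, and a dict only counts as a duplicate when every key and value is equal. A minimal sketch of a non-mutating variant, using a hypothetical helper named merge_unique, that builds a new list with the same order as the output above:

```python
def merge_unique(first, second):
    """Return a new list: every dict from `second`, then each dict from
    `first` that is not already present (compared by full dict equality)."""
    merged = list(second)          # copy, so the caller's list is untouched
    for item in first:
        if item not in merged:     # dict == dict compares all keys and values
            merged.append(item)
    return merged

x = [{"name": "surya", "company": "dell"}, {"name": "jobs", "company": "apple"}]
y = [{"name": "surya", "company": "dell"}, {"name": "gates", "company": "microsoft"}]

z = merge_unique(x, y)
print(z)
print(len(y))  # y still has its original 2 entries
```

The "in" test scans the list each time, so this is quadratic; that is fine for short lists like these, but keying by a field (as in Answer 0) scales better.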

Answer 2 (score: 1)

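Your initial error happens because you defined your "dictionaries" as strings containing lists of dictionaries. Is there a specific reason for this?

Doing this with strings would be very difficult.

Try this: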

x = [{'relevance': 0.722, 'type': 'Company', 'name': 'Dell'}, {'relevance': 0.314, 'type': 'OperatingSystem', 'name': 'VMs'}, {'relevance': 0.122, 'type': 'Technology', 'name': 'iSCSI'}, {'relevance': 0.266, 'type': 'Company', 'name': 'Force10'}, {'relevance': 0.327, 'type': 'Person', 'name': 'Greg Althaus'}, {'relevance': 0.085, 'type': 'URL', 'name': 'http://Dell.com/OpenStack'}, {'relevance': 0.174, 'type': 'Company', 'name': 'Storage Hardware'}]
y = [{'relevance': u'0.874065', 'type': u'Company', 'name': u'Dell'}, {'relevance': u'0.522169', 'type': u'OperatingSystem', 'name': u'VMs'}, {'relevance': u'0.444586', 'type': u'Person', 'name': u'Rob Hirschfeld'}, {'relevance': u'0.413988', 'type': u'Person', 'name': u'Greg Althaus'}, {'relevance': u'0.376489', 'type': u'FieldTerminology', 'name': u'iSCSI'}, {'relevance': u'0.314059', 'type': u'Company', 'name': u'Force10'}]

z = {}
for dic in x + y:
    # Every dict has the same three keys, so each update overwrites the previous values
    z.update(dic)

print(z)

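Answer 3 (score: 1)

The first thing to note is that you don't have two different dictionaries; you have two different lists of dictionaries. Second, you don't explain exactly what counts as a duplicate. Third, you don't say what should happen to the relevance key.

I'll assume that two dictionaries with equal type and name keys are the same, and that you want to merge their relevance values into a list. Then you can average them, or whatever you need.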

def gen_key(d):
    # Two entries count as duplicates when both name and type match
    return (d['name'], d['type'])

def merge_dupes(dlist):
    # Collect every relevance score (converted to float) into one list
    relevance = [float(d['relevance']) for d in dlist]
    name, typ = dlist[0]['name'], dlist[0]['type']
    return {'name': name, 'type': typ, 'relevance': relevance}

# Group the dicts from both lists by their (name, type) key
to_merge = {}
for l in (x, y):
    for d in l:
        to_merge.setdefault(gen_key(d), []).append(d)

# if you want another list
merged_list = [merge_dupes(l) for l in to_merge.values()]

# if you'd prefer a dictionary
merged_dict = dict((k, merge_dupes(v)) for k, v in to_merge.items())

Output:

>>> pprint(merged_list)
[{'name': u'Rob Hirschfeld',
  'relevance': [0.44458599999999998],
  'type': u'Person'},
 {'name': 'VMs',
  'relevance': [0.314, 0.52216899999999999],
  'type': 'OperatingSystem'},
 {'name': 'Greg Althaus',
  'relevance': [0.32700000000000001, 0.41398800000000002],
  'type': 'Person'},
 {'name': 'Storage Hardware',
  'relevance': [0.17399999999999999],
  'type': 'Company'},
 {'name': u'iSCSI',
  'relevance': [0.37648900000000002],
  'type': u'FieldTerminology'},
 {'name': 'Force10',
  'relevance': [0.26600000000000001, 0.31405899999999998],
  'type': 'Company'},
 {'name': 'http://Dell.com/OpenStack',
  'relevance': [0.085000000000000006],
  'type': 'URL'},
 {'name': 'Dell',
  'relevance': [0.72199999999999998, 0.87406499999999998],
  'type': 'Company'},
 {'name': 'iSCSI', 'relevance': [0.122], 'type': 'Technology'}]
>>> pprint(merged_dict)
{('Dell', 'Company'): {'name': 'Dell',
                       'relevance': [0.72199999999999998,
                                     0.87406499999999998],
                       'type': 'Company'},
 ('Force10', 'Company'): {'name': 'Force10',
                          'relevance': [0.26600000000000001,
                                        0.31405899999999998],
                          'type': 'Company'},
 ('Greg Althaus', 'Person'): {'name': 'Greg Althaus',
                              'relevance': [0.32700000000000001,
                                            0.41398800000000002],
                              'type': 'Person'},
 (u'Rob Hirschfeld', u'Person'): {'name': u'Rob Hirschfeld',
                                  'relevance': [0.44458599999999998],
                                  'type': u'Person'},
 ('Storage Hardware', 'Company'): {'name': 'Storage Hardware',
                                   'relevance': [0.17399999999999999],
                                   'type': 'Company'},
 ('VMs', 'OperatingSystem'): {'name': 'VMs',
                              'relevance': [0.314, 0.52216899999999999],
                              'type': 'OperatingSystem'},
 ('http://Dell.com/OpenStack', 'URL'): {'name': 'http://Dell.com/OpenStack',
                                        'relevance': [0.085000000000000006],
                                        'type': 'URL'},
 (u'iSCSI', u'FieldTerminology'): {'name': u'iSCSI',
                                   'relevance': [0.37648900000000002],
                                   'type': u'FieldTerminology'},
 ('iSCSI', 'Technology'): {'name': 'iSCSI',
                           'relevance': [0.122],
                           'type': 'Technology'}}
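For the "average them" step, here is a minimal sketch that does the grouping with collections.defaultdict (swapped in for setdefault) and collapses each group to its mean relevance. It keeps this answer's assumption that (name, type) identifies a duplicate and that every relevance value converts with float(); the inputs are shortened, hypothetical samples in the question's format.

```python
from collections import defaultdict

# Hypothetical shortened inputs; relevance may be a float or a string
x = [{'relevance': 0.722, 'type': 'Company', 'name': 'Dell'},
     {'relevance': 0.122, 'type': 'Technology', 'name': 'iSCSI'}]
y = [{'relevance': u'0.874065', 'type': u'Company', 'name': u'Dell'},
     {'relevance': u'0.376489', 'type': u'FieldTerminology', 'name': u'iSCSI'}]

# Group every relevance score under its (name, type) key
scores = defaultdict(list)
for d in x + y:
    scores[(d['name'], d['type'])].append(float(d['relevance']))

# Collapse each group into one dict holding the mean relevance
averaged = [{'name': name, 'type': typ, 'relevance': sum(vals) / len(vals)}
            for (name, typ), vals in scores.items()]

for d in sorted(averaged, key=lambda d: (d['name'], d['type'])):
    print(d)
```

As in the output above, the two iSCSI entries stay separate because their types differ.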

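Answer 4 (score: 0)

Thanks for the answers; I was able to get a solution. Sorry I couldn't explain my requirement properly: I wanted to remove duplicates based on the 'name' key.

I tried this, and it worked: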

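import itertools
import operator

def DictListUpdate(lis1, lis2):
    # Append each dict from lis1 that is not already in lis2 (modifies lis2)
    for aLis1 in lis1:
        if aLis1 not in lis2:
            lis2.append(aLis1)
    return lis2

z = DictListUpdate(x, y)
getvals = operator.itemgetter('name')

# Sort by name so that entries sharing a name are adjacent
z.sort(key=getvals)

# Keep only the first dict of each run of equal names
result = []
for k, g in itertools.groupby(z, getvals):
    result.append(next(g))

z[:] = result
print(z)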

Output:

[{'name': 'Compute Hardware', 'relevance': '0.236', 'type': 'Company'},
 {'name': 'Dell', 'relevance': '0.874065', 'type': 'Company'},
 {'name': 'Force10', 'relevance': '0.314059', 'type': 'Company'},
 {'name': 'Greg Althaus', 'relevance': '0.413988', 'type': 'Person'},
 {'name': 'Need to administrative infrastructure',
  'relevance': '0.292',
  'type': 'IndustryTerm'},
 {'name': 'Nova Volume', 'relevance': '0.101', 'type': 'Person'},
 {'name': 'RAM', 'relevance': '0.363781', 'type': 'Technology'},
 {'name': 'Rob Hirschfeld', 'relevance': '0.444586', 'type': 'Person'},
 {'name': 'Storage Hardware', 'relevance': '0.174', 'type': 'Company'},
 {'name': 'VMs', 'relevance': '0.522169', 'type': 'OperatingSystem'},
 {'name': 'http://Dell.com/OpenStack', 'relevance': '0.085', 'type': 'URL'},
 {'name': 'http://RobHirschfeld.com', 'relevance': '0.073', 'type': 'URL'},
 {'name': 'iSCSI', 'relevance': '0.376489', 'type': 'FieldTerminology'}]
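In the version above, DictListUpdate places y's dicts first and list.sort is stable, so for a duplicate name the groupby step keeps y's entry (hence Dell's relevance of 0.874065). A minimal sketch that deduplicates on 'name' in one pass with a set of seen names, assuming the same "first occurrence wins" rule; dedupe_by_name is a hypothetical helper, and unlike the groupby version it preserves input order instead of sorting.

```python
def dedupe_by_name(dicts):
    """Keep the first dict seen for each 'name', preserving input order."""
    seen = set()
    unique = []
    for d in dicts:
        if d['name'] not in seen:
            seen.add(d['name'])
            unique.append(d)
    return unique

# Hypothetical shortened inputs in the question's format
x = [{'relevance': 0.722, 'type': 'Company', 'name': 'Dell'},
     {'relevance': 0.174, 'type': 'Company', 'name': 'Storage Hardware'}]
y = [{'relevance': u'0.874065', 'type': u'Company', 'name': u'Dell'},
     {'relevance': u'0.444586', 'type': u'Person', 'name': u'Rob Hirschfeld'}]

# Listing y first makes y's entry win for a name that appears in both lists
z = dedupe_by_name(y + x)
for d in z:
    print(d['name'], d['relevance'])
```

If sorted output is wanted as well, sorting z afterwards with operator.itemgetter('name') gives the same ordering as the groupby approach.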