Merge a list of nested dictionaries

Posted: 2020-02-18 08:41:53

Tags: python python-3.x dictionary merge dictionary-comprehension

I have the following 3 lists, extracted from a PDF file:

adID = ['9940542', '9940542', '10315065', '10315065', '11211744', '11211744', '11309685', '11309685', '12103490', '12103490', '12103490', '12103490', '12103490', '12103490', '12160150', '12160150']

description = ['Invalid Adjust(Platform Fee)', 'Invalid Adjust(Media Fee)', 'Invalid Adjust(Platform Fee)', 'Invalid Adjust(Media Fee)', 'Invalid Adjust(Platform Fee)', 'Invalid Adjust(Media Fee)', 'Invalid Adjust(Platform Fee)', 'Invalid Adjust(Media Fee)', 'Media Fee', 'Platform Fee', 'Invalid Adjust(Platform Fee)', 'TrueView Budget Adjust (Platofrm Fee)', 'Invalid Adjust(Media Fee)', 'TrueView Budget Adjust (Media Fee)', 'Media Fee', 'Platform Fee']

spendItem = ['-1.00', '-2.00', '-1.00', '-3.00', '-290.00', '-3403.00', '-57.00', '-670.00', '709472.00', '22703.00', '-30.00', '-301.00', '-348.00', '-9376.00', '549173.00', '17573.00']

and I have converted these lists into a list of dictionaries, as shown below:

total= [{'9940542': {'Invalid Adjust(Platform Fee)': '-1.00'}},
        {'9940542': {'Invalid Adjust(Media Fee)': '-2.00'}},
        {'10315065': {'Invalid Adjust(Platform Fee)': '-1.00'}},
        {'10315065': {'Invalid Adjust(Media Fee)': '-3.00'}},
        {'11211744': {'Invalid Adjust(Platform Fee)': '-290.00'}},
        {'11211744': {'Invalid Adjust(Media Fee)': '-3403.00'}},
        {'11309685': {'Invalid Adjust(Platform Fee)': '-57.00'}},
        {'11309685': {'Invalid Adjust(Media Fee)': '-670.00'}},
        {'12103490': {'Media Fee': '709472.00'}},
        {'12103490': {'Platform Fee': '22703.00'}},
        {'12103490': {'Invalid Adjust(Platform Fee)': '-30.00'}},
        {'12103490': {'TrueView Budget Adjust (Platofrm Fee)': '-301.00'}},
        {'12103490': {'Invalid Adjust(Media Fee)': '-348.00'}},
        {'12103490': {'TrueView Budget Adjust (Media Fee)': '-9376.00'}},
        {'12160150': {'Media Fee': '549173.00'}},
        {'12160150': {'Platform Fee': '17573.00'}}]

Is there a way to iterate over this list and merge the keys and values by adID? For example:

Expected_result = {'12103490': {'Invalid Adjust(Platform Fee)': '-30.00', 'TrueView Budget Adjust (Platofrm Fee)': '-301.00', 'Invalid Adjust(Media Fee)': '-348.00', 'TrueView Budget Adjust (Media Fee)': '-9376.00'}}

Or is there a better way to merge this kind of data?
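One possibly simpler route, sketched here under the assumption that the three lists are always the same length and aligned by position, is to skip the intermediate total list and build the nested dictionary in a single pass with zip and dict.setdefault. The helper name merge_by_id is hypothetical, and the data is a shortened excerpt of the lists above:

```python
# Assumes adID, description and spendItem are parallel lists of equal length.
# Shortened excerpt of the question's data.
adID = ['9940542', '9940542', '12160150', '12160150']
description = ['Invalid Adjust(Platform Fee)', 'Invalid Adjust(Media Fee)',
               'Media Fee', 'Platform Fee']
spendItem = ['-1.00', '-2.00', '549173.00', '17573.00']


def merge_by_id(ids, descriptions, amounts):
    """Group (description, amount) pairs under their ID in one pass."""
    merged = {}
    for ad_id, desc, amount in zip(ids, descriptions, amounts):
        # Create the inner dict the first time an ID is seen, then add to it.
        merged.setdefault(ad_id, {})[desc] = amount
    return merged


result = merge_by_id(adID, description, spendItem)
print(result)
```

Note that zip stops at the shortest list, so a mismatched extraction would silently drop rows; on Python 3.10+ zip(..., strict=True) raises ValueError instead.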

4 answers:

Answer 0 (score: 3)

Use dict.setdefault with a simple loop. You could also use collections.defaultdict.

Example:

total= [{'9940542': {'Invalid Adjust(Platform Fee)': '-1.00'}},
        {'9940542': {'Invalid Adjust(Media Fee)': '-2.00'}},
        {'10315065': {'Invalid Adjust(Platform Fee)': '-1.00'}},
        {'10315065': {'Invalid Adjust(Media Fee)': '-3.00'}},
        {'11211744': {'Invalid Adjust(Platform Fee)': '-290.00'}},
        {'11211744': {'Invalid Adjust(Media Fee)': '-3403.00'}},
        {'11309685': {'Invalid Adjust(Platform Fee)': '-57.00'}},
        {'11309685': {'Invalid Adjust(Media Fee)': '-670.00'}},
        {'12103490': {'Media Fee': '709472.00'}},
        {'12103490': {'Platform Fee': '22703.00'}},
        {'12103490': {'Invalid Adjust(Platform Fee)': '-30.00'}},
        {'12103490': {'TrueView Budget Adjust (Platofrm Fee)': '-301.00'}},
        {'12103490': {'Invalid Adjust(Media Fee)': '-348.00'}},
        {'12103490': {'TrueView Budget Adjust (Media Fee)': '-9376.00'}},
        {'12160150': {'Media Fee': '549173.00'}},
        {'12160150': {'Platform Fee': '17573.00'}}]

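from pprint import pprint

result = {}
for i in total:
    for k, v in i.items():
        # Create an empty inner dict the first time an adID appears, then merge into it.
        result.setdefault(k, {}).update(v)

pprint(result)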

Output:

{'10315065': {'Invalid Adjust(Media Fee)': '-3.00',
              'Invalid Adjust(Platform Fee)': '-1.00'},
 '11211744': {'Invalid Adjust(Media Fee)': '-3403.00',
              'Invalid Adjust(Platform Fee)': '-290.00'},
 '11309685': {'Invalid Adjust(Media Fee)': '-670.00',
              'Invalid Adjust(Platform Fee)': '-57.00'},
 '12103490': {'Invalid Adjust(Media Fee)': '-348.00',
              'Invalid Adjust(Platform Fee)': '-30.00',
              'Media Fee': '709472.00',
              'Platform Fee': '22703.00',
              'TrueView Budget Adjust (Media Fee)': '-9376.00',
              'TrueView Budget Adjust (Platofrm Fee)': '-301.00'},
 '12160150': {'Media Fee': '549173.00', 'Platform Fee': '17573.00'},
 '9940542': {'Invalid Adjust(Media Fee)': '-2.00',
             'Invalid Adjust(Platform Fee)': '-1.00'}}
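The answer mentions collections.defaultdict as an alternative but does not show it. A minimal sketch of that variant, using a shortened excerpt of total, might look like this:

```python
from collections import defaultdict
from pprint import pprint

# Shortened excerpt of the list of single-entry dicts from the question.
total = [
    {'9940542': {'Invalid Adjust(Platform Fee)': '-1.00'}},
    {'9940542': {'Invalid Adjust(Media Fee)': '-2.00'}},
    {'12160150': {'Media Fee': '549173.00'}},
    {'12160150': {'Platform Fee': '17573.00'}},
]

# Missing keys are created automatically as empty dicts.
merged = defaultdict(dict)
for entry in total:
    for ad_id, fees in entry.items():
        merged[ad_id].update(fees)

# Convert back to a plain dict so that looking up an unknown ID later raises
# KeyError instead of silently inserting an empty entry.
result = dict(merged)
pprint(result)
```

Functionally this matches the setdefault loop; the choice between the two is mostly a matter of taste.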

Answer 1 (score: 1)

You can try this:

new = {}
for d in total:
    for k, v in d.items():
        if k not in new:
            new[k] = dict(v)  # copy so the dicts inside total are left untouched
        else:
            new[k].update(v)

Output:

{'9940542': {'Invalid Adjust(Platform Fee)': '-1.00',
  'Invalid Adjust(Media Fee)': '-2.00'},
 '10315065': {'Invalid Adjust(Platform Fee)': '-1.00',
  'Invalid Adjust(Media Fee)': '-3.00'},
 '11211744': {'Invalid Adjust(Platform Fee)': '-290.00',
  'Invalid Adjust(Media Fee)': '-3403.00'},
 '11309685': {'Invalid Adjust(Platform Fee)': '-57.00',
  'Invalid Adjust(Media Fee)': '-670.00'},
 '12103490': {'Media Fee': '709472.00',
  'Platform Fee': '22703.00',
  'Invalid Adjust(Platform Fee)': '-30.00',
  'TrueView Budget Adjust (Platofrm Fee)': '-301.00',
  'Invalid Adjust(Media Fee)': '-348.00',
  'TrueView Budget Adjust (Media Fee)': '-9376.00'},
 '12160150': {'Media Fee': '549173.00', 'Platform Fee': '17573.00'}}
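Whether the inner dictionary is copied when an ID is first seen matters: assigning v directly (new[k] = v) stores the very same object that lives inside total, so the later new[k].update(v) calls modify total as well. A small demonstration on a two-entry excerpt; the helper names make_total and merge and the copy_inner flag are hypothetical, introduced only for this comparison:

```python
def make_total():
    """Return a fresh two-entry excerpt of the question's data."""
    return [
        {'9940542': {'Invalid Adjust(Platform Fee)': '-1.00'}},
        {'9940542': {'Invalid Adjust(Media Fee)': '-2.00'}},
    ]


def merge(total, copy_inner):
    new = {}
    for d in total:
        for k, v in d.items():
            if k not in new:
                # Without a copy, new[k] is the same object as the dict in total.
                new[k] = dict(v) if copy_inner else v
            else:
                new[k].update(v)
    return new


shared = make_total()
merge(shared, copy_inner=False)
print(shared[0])  # the first input dict was modified and now holds both fees

safe = make_total()
merge(safe, copy_inner=True)
print(safe[0])    # unchanged: still only the Platform Fee entry
```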

Answer 2 (score: 1)

You can do this in one line with python-benedict, a dict subclass with many utilities (I am the author).

Installation: pip install python-benedict

from benedict import benedict as bdict

data_input= [
    {'9940542': {'Invalid Adjust(Platform Fee)': '-1.00'}},
    {'9940542': {'Invalid Adjust(Media Fee)': '-2.00'}},
    {'10315065': {'Invalid Adjust(Platform Fee)': '-1.00'}},
    {'10315065': {'Invalid Adjust(Media Fee)': '-3.00'}},
    {'11211744': {'Invalid Adjust(Platform Fee)': '-290.00'}},
    {'11211744': {'Invalid Adjust(Media Fee)': '-3403.00'}},
    {'11309685': {'Invalid Adjust(Platform Fee)': '-57.00'}},
    {'11309685': {'Invalid Adjust(Media Fee)': '-670.00'}},
    {'12103490': {'Media Fee': '709472.00'}},
    {'12103490': {'Platform Fee': '22703.00'}},
    {'12103490': {'Invalid Adjust(Platform Fee)': '-30.00'}},
    {'12103490': {'TrueView Budget Adjust (Platofrm Fee)': '-301.00'}},
    {'12103490': {'Invalid Adjust(Media Fee)': '-348.00'}},
    {'12103490': {'TrueView Budget Adjust (Media Fee)': '-9376.00'}},
    {'12160150': {'Media Fee': '549173.00'}},
    {'12160150': {'Platform Fee': '17573.00'}}
]

data_output = bdict()
data_output.merge(*data_input)
print(data_output.dump())
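If adding a third-party dependency is not an option, a recursive merge in plain Python gives a similar result for this data. The deep_merge helper below is hypothetical (it is not part of the standard library or of python-benedict) and is not a reproduction of benedict's own merge logic; it assumes values are either nested dicts, which are merged key by key, or leaf values, which the later input overwrites:

```python
import json


def deep_merge(target, source):
    """Recursively merge source into target in place and return target.

    Nested dicts are merged key by key; any other value from source
    overwrites the value already in target.
    """
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            deep_merge(target[key], value)
        elif isinstance(value, dict):
            # Copy so that target does not share nested dicts with source.
            target[key] = deep_merge({}, value)
        else:
            target[key] = value
    return target


# Shortened excerpt of data_input.
data_input = [
    {'12160150': {'Media Fee': '549173.00'}},
    {'12160150': {'Platform Fee': '17573.00'}},
]

data_output = {}
for item in data_input:
    deep_merge(data_output, item)

print(json.dumps(data_output, indent=4))
```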

Answer 3 (score: 0)

For the record, here is a one-liner. Do not use it in production code (use setdefault as in Rakesh's answer instead).

>>> adID = ['9940542', '9940542', '10315065', '10315065', '11211744', '11211744', '11309685', '11309685', '12103490', '12103490', '12103490', '12103490', '12103490', '12103490', '12160150', '12160150']
>>> description = ['Invalid Adjust(Platform Fee)', 'Invalid Adjust(Media Fee)', 'Invalid Adjust(Platform Fee)', 'Invalid Adjust(Media Fee)', 'Invalid Adjust(Platform Fee)', 'Invalid Adjust(Media Fee)', 'Invalid Adjust(Platform Fee)', 'Invalid Adjust(Media Fee)', 'Media Fee', 'Platform Fee', 'Invalid Adjust(Platform Fee)', 'TrueView Budget Adjust (Platofrm Fee)', 'Invalid Adjust(Media Fee)', 'TrueView Budget Adjust (Media Fee)', 'Media Fee', 'Platform Fee']
>>> spendItem = ['-1.00', '-2.00', '-1.00', '-3.00', '-290.00', '-3403.00', '-57.00', '-670.00', '709472.00', '22703.00', '-30.00', '-301.00', '-348.00', '-9376.00', '549173.00', '17573.00']

You can easily compute total:

>>> total = [{k: {u: v}} for (k, u, v) in zip(adID, description, spendItem)]
>>> total
[{'9940542': {'Invalid Adjust(Platform Fee)': '-1.00'}}, {'9940542': {'Invalid Adjust(Media Fee)': '-2.00'}}, {'10315065': {'Invalid Adjust(Platform Fee)': '-1.00'}}, {'10315065': {'Invalid Adjust(Media Fee)': '-3.00'}}, {'11211744': {'Invalid Adjust(Platform Fee)': '-290.00'}}, {'11211744': {'Invalid Adjust(Media Fee)': '-3403.00'}}, {'11309685': {'Invalid Adjust(Platform Fee)': '-57.00'}}, {'11309685': {'Invalid Adjust(Media Fee)': '-670.00'}}, {'12103490': {'Media Fee': '709472.00'}}, {'12103490': {'Platform Fee': '22703.00'}}, {'12103490': {'Invalid Adjust(Platform Fee)': '-30.00'}}, {'12103490': {'TrueView Budget Adjust (Platofrm Fee)': '-301.00'}}, {'12103490': {'Invalid Adjust(Media Fee)': '-348.00'}}, {'12103490': {'TrueView Budget Adjust (Media Fee)': '-9376.00'}}, {'12160150': {'Media Fee': '549173.00'}}, {'12160150': {'Platform Fee': '17573.00'}}]

Merging the list is far more cumbersome:

>>> {k1: {u: v for d2 in total for k2, inner in d2.items() for u, v in inner.items() if k2 == k1} for d1 in total for k1 in d1}
{'9940542': {'Invalid Adjust(Platform Fee)': '-1.00', 'Invalid Adjust(Media Fee)': '-2.00'}, '12103490': {'Media Fee': '709472.00', 'Platform Fee': '22703.00', 'Invalid Adjust(Platform Fee)': '-30.00', 'TrueView Budget Adjust (Platofrm Fee)': '-301.00', 'Invalid Adjust(Media Fee)': '-348.00', 'TrueView Budget Adjust (Media Fee)': '-9376.00'}, '12160150': {'Media Fee': '549173.00', 'Platform Fee': '17573.00'}, '10315065': {'Invalid Adjust(Platform Fee)': '-1.00', 'Invalid Adjust(Media Fee)': '-3.00'}, '11309685': {'Invalid Adjust(Platform Fee)': '-57.00', 'Invalid Adjust(Media Fee)': '-670.00'}, '11211744': {'Invalid Adjust(Platform Fee)': '-290.00', 'Invalid Adjust(Media Fee)': '-3403.00'}}

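This is very slow: for every key it rescans the whole list to find the matching items, so the work grows quadratically with the length of total.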
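To see the difference in practice, a rough timeit comparison on synthetic data can be sketched as follows. The data size and the function names quadratic_merge and linear_merge are made up for illustration, and absolute timings depend on the machine, so none are claimed here; the assertion only checks that both approaches produce the same dictionary:

```python
import timeit


def quadratic_merge(total):
    # The comprehension from above: for every key, rescan the whole list.
    return {k1: {u: v for d2 in total for k2, inner in d2.items()
                 for u, v in inner.items() if k2 == k1}
            for d1 in total for k1 in d1}


def linear_merge(total):
    # Single pass with dict.setdefault, as in Answer 0.
    result = {}
    for entry in total:
        for k, v in entry.items():
            result.setdefault(k, {}).update(v)
    return result


# Synthetic input: 300 IDs with two fee entries each (600 dicts in total).
total = [{str(i): {fee: '-1.00'}}
         for i in range(300)
         for fee in ('Media Fee', 'Platform Fee')]

assert quadratic_merge(total) == linear_merge(total)

for func in (quadratic_merge, linear_merge):
    seconds = timeit.timeit(lambda: func(total), number=3)
    print(f'{func.__name__}: {seconds:.3f} s for 3 runs')
```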

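Building the result in one step straight from the three lists is easier to write, but not really faster, since it still rescans every row for each distinct adID: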

>>> {k1: {u: v for (k2, u, v) in zip(adID, description, spendItem) if k2 == k1} for k1 in set(adID)}
{'9940542': {'Invalid Adjust(Platform Fee)': '-1.00', 'Invalid Adjust(Media Fee)': '-2.00'}, '12103490': {'Media Fee': '709472.00', 'Platform Fee': '22703.00', 'Invalid Adjust(Platform Fee)': '-30.00', 'TrueView Budget Adjust (Platofrm Fee)': '-301.00', 'Invalid Adjust(Media Fee)': '-348.00', 'TrueView Budget Adjust (Media Fee)': '-9376.00'}, '12160150': {'Media Fee': '549173.00', 'Platform Fee': '17573.00'}, '10315065': {'Invalid Adjust(Platform Fee)': '-1.00', 'Invalid Adjust(Media Fee)': '-3.00'}, '11309685': {'Invalid Adjust(Platform Fee)': '-57.00', 'Invalid Adjust(Media Fee)': '-670.00'}, '11211744': {'Invalid Adjust(Platform Fee)': '-290.00', 'Invalid Adjust(Media Fee)': '-3403.00'}}
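A different one-liner, not part of the original answer, that makes only a single pass is itertools.groupby. It relies on an assumption that holds for this data but not in general: all rows for the same adID must be adjacent. If an ID reappeared later in the lists, its second group would replace the first in the dict comprehension, so when adjacency cannot be guaranteed, sort the zipped rows by ID first. Shown on a shortened excerpt of the lists:

```python
from itertools import groupby
from operator import itemgetter

# Shortened excerpt of the question's parallel lists (equal IDs are adjacent).
adID = ['9940542', '9940542', '12160150', '12160150']
description = ['Invalid Adjust(Platform Fee)', 'Invalid Adjust(Media Fee)',
               'Media Fee', 'Platform Fee']
spendItem = ['-1.00', '-2.00', '549173.00', '17573.00']

# groupby yields (key, iterator of rows) for each run of equal adID values.
merged = {ad_id: {desc: amount for _, desc, amount in rows}
          for ad_id, rows in groupby(zip(adID, description, spendItem),
                                     key=itemgetter(0))}
print(merged)
```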