Question

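I have a list of dictionaries, and I am trying to merge the duplicate dictionaries in it.
Here is an example of the list with duplicates: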

[
            {
                "userName": "Kevin",
                "status": "Disabled",
                "notificationType": "Sms and Email",
                "escalationLevel": "High",
                "dateCreated": "2019-11-08T12:19:05.373Z"
            },
            {
                "userName": "Kevin",
                "status": "Active",
                "notificationType": "Sms and Email",
                "escalationLevel": "Low",
                "dateCreated": "2019-11-08T12:19:05.554Z"
            },
            {
                "userName": "Kevin",
                "status": "Active",
                "notificationType": "Sms",
                "escalationLevel": "Medium",
                "dateCreated": "2019-11-08T12:19:05.719Z"
            },
            {
                "userName": "Ercy",
                "status": "Active",
                "notificationType": "Sms",
                "escalationLevel": "Low",
                "dateCreated": "2019-11-11T11:43:24.529Z"
            },
            {
                "userName": "Ercy",
                "status": "Active",
                "notificationType": "Email",
                "escalationLevel": "Medium",
                "dateCreated": "2019-11-11T11:43:24.674Z"
            },
            {
                "userName": "Samuel",
                "status": "Active",
                "notificationType": "Sms",
                "escalationLevel": "Low",
                "dateCreated": "2019-12-04T11:10:09.307Z"
            },
            {
                "userName": "Samuel",
                "status": "Active",
                "notificationType": "Sms",
                "escalationLevel": "High",
                "dateCreated": "2019-12-05T09:12:16.778Z"
            }
        ]

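I want to merge the duplicate dictionaries, keeping the values of the repeated keys and collecting them into something like this: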

[
            {
                "userName": "Kevin",
                "status": ["Disabled", "Active", "Active"],
                "notificationType": ["Sms and Email", "Sms and Email", "Sms"],
                "escalationLevel": ["High", "Low", "Medium"],
                "dateCreated": "2019-11-08T12:19:05.373Z"
            },
            {
                "userName": "Ercy",
                "status": "Active",
                "notificationType": "Sms and Email",
                "escalationLevel": "Low",
                "dateCreated": "2019-11-08T12:19:05.554Z"
            },
            {
                "userName": "Samuel",
                "status": ["Active", "Active"],
                "notificationType": ["Sms", "Sms"],
                "escalationLevel": ["Low", "High"],
                "dateCreated": "2019-12-04T11:10:09.307Z"
            },

        ]

If anyone knows a simple way to achieve this, please share your solution.

Answer 1

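This task can be reframed as converting a long-form representation of user (userName) records into a wide-form one. To avoid mixing types, we lift every dictionary to the same type whether or not it has duplicates, namely: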

userName: str,
status: List[str],
notificationType: List[str],
escalationLevel: List[str],
dateCreated: List[str]

Although this differs from your example, I will also accumulate the dateCreated values, for consistency.

from functools import reduce
from itertools import groupby
import operator as op


USERNAME = 'userName'


def lift_long_user_record(record):
    """
    Wrap every value except the user name in a single-element list.

    :param record: a long-form user record
    :type record: Dict[str, str]
    """
    return {
        key: value if key == USERNAME else [value]
        for key, value in record.items()
    }


def merge_short_user_records(rec_a, rec_b):
    """
    Merge two lifted records of the same user by concatenating their lists
    """
    # make sure the keys match
    assert set(rec_a.keys()) == set(rec_b.keys())
    # make sure users match
    assert rec_a[USERNAME] == rec_b[USERNAME]
    return {
        key: rec_a[USERNAME] if key == USERNAME else rec_a[key] + rec_b[key]
        for key in rec_a
    }


# the data from your example
records = [
    {
        "userName": "Kevin",
        "status": "Disabled",
        "notificationType": "Sms and Email",
        "escalationLevel": "High",
        "dateCreated": "2019-11-08T12:19:05.373Z"
    },
    ...
]


# groupby only groups adjacent items, so sort by user name first
groups = groupby(
    sorted(map(lift_long_user_record, records), key=op.itemgetter(USERNAME)),
    op.itemgetter(USERNAME)
)

# fold each user's lifted records into a single merged record
merged = [
    reduce(merge_short_user_records, grp) for _, grp in groups
]

Output

[{'dateCreated': ['2019-11-11T11:43:24.529Z', '2019-11-11T11:43:24.674Z'],
  'escalationLevel': ['Low', 'Medium'],
  'notificationType': ['Sms', 'Email'],
  'status': ['Active', 'Active'],
  'userName': 'Ercy'},
 {'dateCreated': ['2019-11-08T12:19:05.373Z',
   '2019-11-08T12:19:05.554Z',
   '2019-11-08T12:19:05.719Z'],
  'escalationLevel': ['High', 'Low', 'Medium'],
  'notificationType': ['Sms and Email', 'Sms and Email', 'Sms'],
  'status': ['Disabled', 'Active', 'Active'],
  'userName': 'Kevin'},
 {'dateCreated': ['2019-12-04T11:10:09.307Z', '2019-12-05T09:12:16.778Z'],
  'escalationLevel': ['Low', 'High'],
  'notificationType': ['Sms', 'Sms'],
  'status': ['Active', 'Active'],
  'userName': 'Samuel'}]
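
If you would rather stay closer to the shape in the question (only the first dateCreated per user, and users in the order they first appear), a single pass over a plain dictionary may be enough. This is a minimal sketch rather than part of the code above: the names merge_by_user and first_only are made up for illustration, the sample is a trimmed-down subset of the question's data, and it assumes every record has a userName key.

```python
def merge_by_user(records, key="userName", first_only=("dateCreated",)):
    """Group records by `key`, collecting the other fields into lists.

    Fields listed in `first_only` keep the value from the first record seen.
    (`merge_by_user` and `first_only` are hypothetical names for this sketch.)
    """
    merged = {}  # plain dicts preserve insertion order in Python 3.7+
    for record in records:
        target = merged.setdefault(record[key], {key: record[key]})
        for field, value in record.items():
            if field == key:
                continue
            if field in first_only:
                # keep only the first value encountered for this field
                target.setdefault(field, value)
            else:
                target.setdefault(field, []).append(value)
    return list(merged.values())


# a trimmed-down sample with fewer fields than the question's records
sample = [
    {"userName": "Kevin", "status": "Disabled", "dateCreated": "2019-11-08T12:19:05.373Z"},
    {"userName": "Kevin", "status": "Active", "dateCreated": "2019-11-08T12:19:05.554Z"},
    {"userName": "Ercy", "status": "Active", "dateCreated": "2019-11-11T11:43:24.529Z"},
]
result = merge_by_user(sample)
```

Passing first_only=() would accumulate dateCreated as well, which matches the output shown above apart from ordering.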

Answer 2

This is fairly easy with pandas.

import pandas as pd

def update_dict(userName, d):
    d['userName'] = userName
    return d

In []:
df = pd.DataFrame(data)
[update_dict(k, g.to_dict(orient='list')) for k, g in df.groupby(df.userName)]

Out[]:
[{'userName': 'Ercy',
  'dateCreated': ['2019-11-11T11:43:24.529Z', '2019-11-11T11:43:24.674Z'],
  'escalationLevel': ['Low', 'Medium'],
  'notificationType': ['Sms', 'Email'],
  'status': ['Active', 'Active']},
 {'userName': 'Kevin',
  'dateCreated': ['2019-11-08T12:19:05.373Z', '2019-11-08T12:19:05.554Z', '2019-11-08T12:19:05.719Z'],
  'escalationLevel': ['High', 'Low', 'Medium'],
  'notificationType': ['Sms and Email', 'Sms and Email', 'Sms'],
  'status': ['Disabled', 'Active', 'Active']},
 {'userName': 'Samuel',
  'dateCreated': ['2019-12-04T11:10:09.307Z', '2019-12-05T09:12:16.778Z'],
  'escalationLevel': ['Low', 'High'],
  'notificationType': ['Sms', 'Sms'],
  'status': ['Active', 'Active']}]

In Py3.5+, you can do away with the helper function with a bit of extra magic:

[{**g.to_dict(orient='list'), **{'userName': k}} for k, g in df.groupby('userName')]
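
The magic is dictionary unpacking: when several mappings are unpacked into one literal, a later one overrides duplicate keys from an earlier one, so the list-valued userName coming out of to_dict is replaced by the scalar group key. A small pandas-free sketch of that behaviour, using a hand-written stand-in for one group (the variable names and values are illustrative only):

```python
# Stand-in for g.to_dict(orient='list') on a single group (illustrative values).
group_as_lists = {
    "userName": ["Ercy", "Ercy"],
    "status": ["Active", "Active"],
}
group_key = "Ercy"

# The later mapping wins on duplicate keys; each key keeps the position
# where it first appeared.
merged = {**group_as_lists, **{"userName": group_key}}
print(merged)
```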

Merging duplicate dictionaries in a list
