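I have this list of dictionaries, and I'm trying to merge the duplicate dictionaries in it.
Below is an example of the list with duplicate dictionaries: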
[
{
"userName": "Kevin",
"status": "Disabled",
"notificationType": "Sms and Email",
"escalationLevel": "High",
"dateCreated": "2019-11-08T12:19:05.373Z"
},
{
"userName": "Kevin",
"status": "Active",
"notificationType": "Sms and Email",
"escalationLevel": "Low",
"dateCreated": "2019-11-08T12:19:05.554Z"
},
{
"userName": "Kevin",
"status": "Active",
"notificationType": "Sms",
"escalationLevel": "Medium",
"dateCreated": "2019-11-08T12:19:05.719Z"
},
{
"userName": "Ercy",
"status": "Active",
"notificationType": "Sms",
"escalationLevel": "Low",
"dateCreated": "2019-11-11T11:43:24.529Z"
},
{
"userName": "Ercy",
"status": "Active",
"notificationType": "Email",
"escalationLevel": "Medium",
"dateCreated": "2019-11-11T11:43:24.674Z"
},
{
"userName": "Samuel",
"status": "Active",
"notificationType": "Sms",
"escalationLevel": "Low",
"dateCreated": "2019-12-04T11:10:09.307Z"
},
{
"userName": "Samuel",
"status": "Active",
"notificationType": "Sms",
"escalationLevel": "High",
"dateCreated": "2019-12-05T09:12:16.778Z"
}
]
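I want to merge the duplicate dictionaries, keeping all the values of the repeated keys, so that I end up with something like this: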
[
{
"userName": "Kevin",
"status": ["Disabled", "Active", "Active"],
"notificationType": ["Sms and Email", "Sms and Email", "Sms"],
"escalationLevel": ["High", "Low", "Medium"],
"dateCreated": "2019-11-08T12:19:05.373Z"
},
{
"userName": "Ercy",
"status": ["Active", "Active"],
"notificationType": ["Sms", "Email"],
"escalationLevel": ["Low", "Medium"],
"dateCreated": "2019-11-11T11:43:24.529Z"
},
{
"userName": "Samuel",
"status": ["Active", "Active"],
"notificationType": ["Sms", "Sms"],
"escalationLevel": ["Low", "High"],
"dateCreated": "2019-12-04T11:10:09.307Z"
}
]
If anyone knows a simple way to do this, please share your solution.
Answer 0 (score: 0)
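This task can be reframed as converting a long-form representation of per-user (userName) records into a wide form. To avoid type heterogeneity, we lift all dictionaries to the same type, whether or not a user has duplicates, namely: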
userName: str,
status: List[str],
notificationType: List[str],
escalationLevel: List[str],
dateCreated: List[str]
Although this differs from your example, for consistency I also accumulate the dateCreated values.
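The question's desired output keeps only the first dateCreated. If you want that shape instead, one option is to post-process each merged record. The helper below, keep_first, is a hypothetical name (not part of this answer), and it assumes the merged lists preserve the original record order, so index 0 is the first record seen:

```python
# Hypothetical helper (not part of the answer): collapse selected list-valued
# fields of a merged record back to their first element.
def keep_first(record, keys=("dateCreated",)):
    """Return a copy of `record` where each key in `keys` holds only its first value."""
    return {
        key: value[0] if key in keys else value
        for key, value in record.items()
    }

# A merged record as produced by lifting + concatenating Kevin's three rows.
merged_kevin = {
    "userName": "Kevin",
    "status": ["Disabled", "Active", "Active"],
    "notificationType": ["Sms and Email", "Sms and Email", "Sms"],
    "escalationLevel": ["High", "Low", "Medium"],
    "dateCreated": ["2019-11-08T12:19:05.373Z",
                    "2019-11-08T12:19:05.554Z",
                    "2019-11-08T12:19:05.719Z"],
}

print(keep_first(merged_kevin)["dateCreated"])  # 2019-11-08T12:19:05.373Z
```

Because these timestamps all share one ISO 8601 format, they also compare correctly as strings, so using min(value) instead of value[0] would pick the earliest date even if the input were not in chronological order.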
from functools import reduce
from itertools import groupby
import operator as op

USERNAME = 'userName'

def lift_long_user_record(record):
    """
    Lift a long-form record so that every field except userName is a list.

    :param record: a long-form user record
    :type record: Dict[str, str]
    """
    return {
        key: value if key == USERNAME else [value]
        for key, value in record.items()
    }

def merge_short_user_records(rec_a, rec_b):
    """
    Merge two lifted records of the same user by concatenating their lists
    """
    # make sure the keys match
    assert set(rec_a.keys()) == set(rec_b.keys())
    # make sure users match
    assert rec_a[USERNAME] == rec_b[USERNAME]
    return {
        key: rec_a[USERNAME] if key == USERNAME else rec_a[key] + rec_b[key]
        for key in rec_a
    }

# the data from your example
records = [
    {
        "userName": "Kevin",
        "status": "Disabled",
        "notificationType": "Sms and Email",
        "escalationLevel": "High",
        "dateCreated": "2019-11-08T12:19:05.373Z"
    },
    # ...
]

# groupby only groups consecutive items, so sort by userName first
lifted = sorted(map(lift_long_user_record, records), key=op.itemgetter(USERNAME))
groups = groupby(lifted, op.itemgetter(USERNAME))
merged = [
    reduce(merge_short_user_records, grp) for _, grp in groups
]
Output
[{'dateCreated': ['2019-11-11T11:43:24.529Z', '2019-11-11T11:43:24.674Z'],
'escalationLevel': ['Low', 'Medium'],
'notificationType': ['Sms', 'Email'],
'status': ['Active', 'Active'],
'userName': 'Ercy'},
{'dateCreated': ['2019-11-08T12:19:05.373Z',
'2019-11-08T12:19:05.554Z',
'2019-11-08T12:19:05.719Z'],
'escalationLevel': ['High', 'Low', 'Medium'],
'notificationType': ['Sms and Email', 'Sms and Email', 'Sms'],
'status': ['Disabled', 'Active', 'Active'],
'userName': 'Kevin'},
{'dateCreated': ['2019-12-04T11:10:09.307Z', '2019-12-05T09:12:16.778Z'],
'escalationLevel': ['Low', 'High'],
'notificationType': ['Sms', 'Sms'],
'status': ['Active', 'Active'],
'userName': 'Samuel'}]
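itertools.groupby only groups consecutive items, which is why the code above sorts by userName first. As a sketch of an alternative (not the answer's code), a single pass over a plain dict avoids the sort and keeps users in first-seen order. merge_by_user is a hypothetical name, and the sample records are trimmed to two fields for brevity:

```python
from itertools import groupby
from operator import itemgetter

def merge_by_user(records, key="userName"):
    """Group records by `key` in one pass; every other field becomes a list."""
    merged = {}  # key value -> wide record; dicts keep insertion order (3.7+)
    for record in records:
        wide = merged.setdefault(record[key], {key: record[key]})
        for field, value in record.items():
            if field != key:
                wide.setdefault(field, []).append(value)
    return list(merged.values())

# Trimmed sample: Kevin's records are deliberately not adjacent.
sample = [
    {"userName": "Kevin", "status": "Disabled", "escalationLevel": "High"},
    {"userName": "Ercy", "status": "Active", "escalationLevel": "Low"},
    {"userName": "Kevin", "status": "Active", "escalationLevel": "Low"},
]

# Without sorting, groupby splits Kevin into two groups: Kevin, Ercy, Kevin.
print(len(list(groupby(sample, key=itemgetter("userName")))))  # 3

print(merge_by_user(sample))
# [{'userName': 'Kevin', 'status': ['Disabled', 'Active'], 'escalationLevel': ['High', 'Low']},
#  {'userName': 'Ercy', 'status': ['Active'], 'escalationLevel': ['Low']}]
```

Unlike the answer's merge function, this sketch does not assert that every record of a user has the same keys; a missing field simply yields a shorter list.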
Answer 1 (score: 0)
This is fairly easy with pandas.
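import pandas as pd

def update_dict(userName, d):
    # to_dict(orient='list') also turns userName into a list; restore the scalar
    d['userName'] = userName
    return d

In []:
df = pd.DataFrame(data)  # data: the list of dicts from the question
[update_dict(k, g.to_dict(orient='list')) for k, g in df.groupby(df.userName)]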
Out[]:
[{'userName': 'Ercy',
'dateCreated': ['2019-11-11T11:43:24.529Z', '2019-11-11T11:43:24.674Z'],
'escalationLevel': ['Low', 'Medium'],
'notificationType': ['Sms', 'Email'],
'status': ['Active', 'Active']},
{'userName': 'Kevin',
'dateCreated': ['2019-11-08T12:19:05.373Z', '2019-11-08T12:19:05.554Z', '2019-11-08T12:19:05.719Z'],
'escalationLevel': ['High', 'Low', 'Medium'],
'notificationType': ['Sms and Email', 'Sms and Email', 'Sms'],
'status': ['Disabled', 'Active', 'Active']},
{'userName': 'Samuel',
'dateCreated': ['2019-12-04T11:10:09.307Z', '2019-12-05T09:12:16.778Z'],
'escalationLevel': ['Low', 'High'],
'notificationType': ['Sms', 'Sms'],
'status': ['Active', 'Active']}]
In Python 3.5+ you can drop the helper function with a bit more dict-unpacking magic:
[{**g.to_dict(orient='list'), **{'userName': k}} for k, g in df.groupby('userName')]
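The "magic" is dict unpacking inside a dict display (PEP 448): when the same key appears more than once, the later value wins, so the scalar userName overrides the list that to_dict(orient='list') produces for that column. A minimal stdlib illustration, where group_as_lists is a hand-written stand-in for what g.to_dict(orient='list') returns for one group:

```python
# Stand-in for g.to_dict(orient='list'): every column, including userName, is a list.
group_as_lists = {"userName": ["Samuel", "Samuel"], "status": ["Active", "Active"]}

# Later entries win on duplicate keys, so userName becomes a scalar again.
fixed = {**group_as_lists, **{"userName": "Samuel"}}
print(fixed)  # {'userName': 'Samuel', 'status': ['Active', 'Active']}

# The inner dict is not required; a plain key after the unpacking does the same.
same = {**group_as_lists, "userName": "Samuel"}
print(same == fixed)  # True
```

The key keeps its original position (first here) because overriding an existing key in a dict does not move it.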