I am trying to extract a dictionary from a DataFrame, without duplicate values.
Here is the DataFrame:
{'Country': {0: 'Japan', 1: 'China', 2: 'USA', 3: 'Russia', 4: 'Japan',
5: 'Japan', 6: 'China'}, 'Port': {0: 'Yokohama', 1: 'Ningbo', 2:
'Baltimore', 3: 'Moscow', 4: 'Tokyo', 5: 'Tokyo', 6: 'Shanghai'}}
I set Country as the key, which removes the duplicate countries. Now I need to remove the duplicates from the lists of ports as well.
import pandas as pd
a ={'Country': {0: 'Japan', 1: 'China', 2: 'USA', 3: 'Russia', 4: 'Japan',
5: 'Japan', 6: 'China'}, 'Port': {0: 'Yokohama', 1: 'Ningbo', 2:
'Baltimore', 3: 'Moscow', 4: 'Tokyo', 5: 'Tokyo', 6: 'Shanghai'}}
df = pd.DataFrame(a)  # a is a plain dict; groupby needs a DataFrame
a_dict = df.groupby(['Country'])['Port'].apply(list).to_dict()
print(a_dict)
Output:
{'China': ['Ningbo', 'Shanghai'], 'Japan': ['Yokohama', 'Tokyo',
'Tokyo'], 'Russia': ['Moscow'], 'USA': ['Baltimore']}
Expected output:
{'China': ['Ningbo', 'Shanghai'], 'Japan': ['Yokohama', 'Tokyo'],
'Russia': ['Moscow'], 'USA': ['Baltimore']}
Answer 0 (score: 1)
Use drop_duplicates together with your code:
d = df.drop_duplicates().groupby(['Country'])['Port'].apply(list).to_dict()
print(d)
{'China': ['Ningbo', 'Shanghai'], 'Japan': ['Yokohama', 'Tokyo'],
'Russia': ['Moscow'], 'USA': ['Baltimore']}
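Note that drop_duplicates() with no arguments compares whole rows, so it only works as above when the DataFrame holds nothing but Country and Port. A minimal sketch of the same idea restricted to those two columns, assuming a hypothetical extra column named Visits that is not in the question:

```python
import pandas as pd

# Same data as the question, plus a hypothetical 'Visits' column
# (an assumption for illustration) that makes every row unique.
df = pd.DataFrame({
    'Country': ['Japan', 'China', 'USA', 'Russia', 'Japan', 'Japan', 'China'],
    'Port': ['Yokohama', 'Ningbo', 'Baltimore', 'Moscow', 'Tokyo', 'Tokyo', 'Shanghai'],
    'Visits': [1, 2, 3, 4, 5, 6, 7],
})

# subset= limits the duplicate check to Country and Port only.
d = (
    df.drop_duplicates(subset=['Country', 'Port'])
      .groupby('Country')['Port']
      .apply(list)
      .to_dict()
)
print(d)
```

Without subset=, the two Japan/Tokyo rows would both survive here because their Visits values differ.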
Answer 1 (score: 1)
GroupBy.apply with set:
df.groupby('Country')['Port'].apply(set).map(list).to_dict()
If you don't care whether the output is a dictionary of lists or of sets, this can be simplified to:
df.groupby('Country')['Port'].apply(set).to_dict()
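One caveat with set: it does not guarantee element order, so the ports may not come out in the order they appear in the DataFrame. If order matters, one possible variant (a sketch, not part of the original answer) deduplicates each group with dict.fromkeys, which keeps first-seen order:

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'Country': ['Japan', 'China', 'USA', 'Russia', 'Japan', 'Japan', 'China'],
    'Port': ['Yokohama', 'Ningbo', 'Baltimore', 'Moscow', 'Tokyo', 'Tokyo', 'Shanghai'],
})

# dict.fromkeys drops repeated ports while keeping the first occurrence
# of each one in its original position.
d = (
    df.groupby('Country')['Port']
      .apply(lambda ports: list(dict.fromkeys(ports)))
      .to_dict()
)
print(d)
```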
defaultdict
from collections import defaultdict
d = defaultdict(set)
for c, p in zip(df['Country'], df['Port']):
    d[c].add(p)  # the set silently ignores ports already seen for this country
{k: list(v) for k, v in d.items()}