Question

I'm running into a problem in Python: efficiently creating a dictionary of dictionaries from a Pandas DataFrame. Here is my DF:

             User-ID  Book-Rating
ISBN                            
0553297627   230402     1
0553297627   124942     7
0553297627   238120     0
0553297627   227705     2
0553297627   234623     10
0553297627   172742     5
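To reproduce the frame above, here is a minimal sketch. It assumes ISBN is kept as a string index so the leading zero survives. The answers below print 553297627, which suggests the asker's index was presumably read as an integer.

```python
import pandas as pd

# Rebuild the sample frame from the question.
# Assumption: ISBN is stored as a string index (keeps the leading zero).
df = pd.DataFrame(
    {
        "User-ID": [230402, 124942, 238120, 227705, 234623, 172742],
        "Book-Rating": [1, 7, 0, 2, 10, 5],
    },
    index=pd.Index(["0553297627"] * 6, name="ISBN"),
)
print(df)
```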

I want a structure like this:

{
'0553297627': {
                '230402': 1, 
                '124942': 7, 
                '238120': 0, 
                '227705': 2, 
                '234623': 10,
                '172742': 5
             }
... more books here
}

I did this with a loop, which is very time-consuming. My code is:

...
result = {}  # outer dict; naming it `dict` would shadow the built-in
isbn = '0553297627'
df_values = df.values
d = {key: value for (key, value) in df_values}  # <--- I want to avoid this!
result[isbn] = d

Answer 1

A dictionary comprehension based on set_index + groupby + xs:

{name: group.xs(name).to_dict()
 for name, group in df.set_index('User-ID', append=True).groupby(level=0)}

{553297627: {'Book-Rating': {124942: 7,
   172742: 5,
   227705: 2,
   230402: 1,
   234623: 10,
   238120: 0}}}
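The output above keeps an extra {'Book-Rating': ...} level that the question's target structure does not have. A minimal sketch of a variant, assuming the sample frame with a string ISBN index: selecting the 'Book-Rating' column before calling to_dict() gives the flat {ISBN: {User-ID: rating}} shape.

```python
import pandas as pd

# Sample frame from the question (assumption: ISBN kept as a string).
df = pd.DataFrame(
    {
        "User-ID": [230402, 124942, 238120, 227705, 234623, 172742],
        "Book-Rating": [1, 7, 0, 2, 10, 5],
    },
    index=pd.Index(["0553297627"] * 6, name="ISBN"),
)

# Same set_index + groupby + xs idea, but take the 'Book-Rating' Series
# so to_dict() maps User-ID -> rating directly.
result = {
    name: group.xs(name)["Book-Rating"].to_dict()
    for name, group in df.set_index("User-ID", append=True).groupby(level=0)
}
print(result)
```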

Using defaultdict + iterrows:

from collections import defaultdict

d = defaultdict(dict)
for i, row in df.iterrows():
    d[i][row['User-ID']] = row['Book-Rating']

dict(d)
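iterrows builds a Series for every row, which is a known source of slowness. A minimal sketch of the same defaultdict approach that swaps iterrows for a plain zip over the index and the two columns, assuming the sample frame with a string ISBN index:

```python
from collections import defaultdict

import pandas as pd

# Sample frame from the question (assumption: ISBN kept as a string).
df = pd.DataFrame(
    {
        "User-ID": [230402, 124942, 238120, 227705, 234623, 172742],
        "Book-Rating": [1, 7, 0, 2, 10, 5],
    },
    index=pd.Index(["0553297627"] * 6, name="ISBN"),
)

# Walk index labels and column values in lockstep; no per-row Series.
d = defaultdict(dict)
for isbn, user, rating in zip(df.index, df["User-ID"], df["Book-Rating"]):
    d[isbn][user] = rating

result = dict(d)
print(result)
```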

Timing test
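A minimal timing harness for comparing the approaches yourself. It assumes a synthetic frame shaped like the question's; make_frame, via_xs, via_iterrows and via_groupby_zip are hypothetical helper names, the sizes are arbitrary, and via_xs uses the flat ['Book-Rating'] variant rather than Answer 1's literal code. No timings are claimed here; results depend on your data and pandas version.

```python
import timeit
from collections import defaultdict

import pandas as pd


def make_frame(n_books: int = 200, per_book: int = 10) -> pd.DataFrame:
    """Hypothetical synthetic frame: n_books ISBNs, per_book ratings each."""
    isbns = [f"{i:010d}" for i in range(n_books) for _ in range(per_book)]
    users = list(range(n_books * per_book))
    ratings = [u % 11 for u in users]  # ratings in 0..10
    return pd.DataFrame(
        {"User-ID": users, "Book-Rating": ratings},
        index=pd.Index(isbns, name="ISBN"),
    )


def via_xs(df):
    # set_index + groupby + xs, flattened via the 'Book-Rating' Series
    return {
        name: group.xs(name)["Book-Rating"].to_dict()
        for name, group in df.set_index("User-ID", append=True).groupby(level=0)
    }


def via_iterrows(df):
    # defaultdict + iterrows
    d = defaultdict(dict)
    for i, row in df.iterrows():
        d[i][row["User-ID"]] = row["Book-Rating"]
    return dict(d)


def via_groupby_zip(df):
    # groupby + apply(zip) + to_dict
    return (
        df.groupby(level="ISBN")
        .apply(lambda x: dict(zip(x["User-ID"], x["Book-Rating"])))
        .to_dict()
    )


df = make_frame()
timings = {}
for fn in (via_xs, via_iterrows, via_groupby_zip):
    timings[fn.__name__] = timeit.timeit(lambda: fn(df), number=3)
    print(f"{fn.__name__}: {timings[fn.__name__]:.4f} s for 3 runs")
```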

Answer 2

You can use groupby together with zip, and finally convert the result with to_dict:

print(df.groupby(level='ISBN')
        .apply(lambda x: dict(zip(x['User-ID'], x['Book-Rating'])))
        .to_dict())
{
    553297627:
    {230402: 1, 172742: 5, 238120: 0, 227705: 2, 124942: 7, 234623: 10}
}
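The same groupby + zip idea can also be written as a dictionary comprehension over the groupby object, skipping apply and the final to_dict. A minimal sketch, assuming the sample frame with a string ISBN index named 'ISBN':

```python
import pandas as pd

# Sample frame from the question (assumption: ISBN kept as a string).
df = pd.DataFrame(
    {
        "User-ID": [230402, 124942, 238120, 227705, 234623, 172742],
        "Book-Rating": [1, 7, 0, 2, 10, 5],
    },
    index=pd.Index(["0553297627"] * 6, name="ISBN"),
)

# Iterating a groupby yields (key, sub-DataFrame) pairs; zip the two
# columns of each group into a User-ID -> rating dict.
result = {
    isbn: dict(zip(group["User-ID"], group["Book-Rating"]))
    for isbn, group in df.groupby(level="ISBN")
}
print(result)
```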

Creating a dictionary of dictionaries from a Pandas DataFrame without loops
