Question

I have a DataFrame df that looks like this:

        a    b    c    d
0       8    xx   17   1.0  
1       8    xy   19   1.0 
2       8    zz   13   0.0
3       9    tt   8    5.0

I am trying to create a dictionary whose keys map to lists of tuples, like this:

{8:[(17,1.0),(19,1.0),(13,0.0)], 9:[(8,5.0)]}

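Here, the keys come from column a, and each list of tuples is built from columns c and d for the rows that share that key. I am also applying this to other datasets, and I tried: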

df_new = df.groupby(['a'])[['c','d']).apply(lambda x: [tuple(x) for x in x.values])

However, I keep getting this error:

raise TypeError('Series.name must be a hashable type')
TypeError: Series.name must be a hashable type

I tried removing the ['a'] in the groupby and passing just 'a', as follows:

df_new = df.groupby('a')[['c','d']).apply(lambda x: [tuple(x) for x in x.values])

However, I get the same error:

raise TypeError('Series.name must be a hashable type')
TypeError: Series.name must be a hashable type

I don't want to make everything in the original DataFrame df immutable. I want to keep it as it is.

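Is there a way to achieve this with pandas functionality? I would really prefer not to build lists, zip them together by index, and then create a dictionary.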

Answer 1

Using defaultdict

from collections import defaultdict

d = defaultdict(list)
for tup in df.itertuples():
    d[tup.a].append((tup.c, tup.d))

dict(d)

{8: [(17, 1.0), (19, 1.0), (13, 0.0)], 9: [(8, 5.0)]}
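For anyone who wants to run the defaultdict approach end to end, here is a minimal sketch. It assumes the sample frame from the question (rebuilt by hand here) and iterates over the columns with zip instead of itertuples, which is a small substitution of mine rather than the code above. The name `grouped` is purely illustrative.

```python
from collections import defaultdict

import pandas as pd

# Sample frame rebuilt by hand from the question (an assumption about its dtypes:
# a and c are integers, d is float).
df = pd.DataFrame({
    "a": [8, 8, 8, 9],
    "b": ["xx", "xy", "zz", "tt"],
    "c": [17, 19, 13, 8],
    "d": [1.0, 1.0, 0.0, 5.0],
})

grouped = defaultdict(list)
# Walk the three columns in parallel; each column keeps its own dtype.
for key, c_val, d_val in zip(df["a"], df["c"], df["d"]):
    grouped[key].append((c_val, d_val))

result = dict(grouped)  # convert back to a plain dict
print(result)
```

This does not modify df, so the original frame stays as it is.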

Using to_dict and groupby

df.set_index(['c', 'd']).groupby('a').apply(lambda df: df.index.tolist()).to_dict()

{8: [(17, 1.0), (19, 1.0), (13, 0.0)], 9: [(8, 5.0)]}

Answer 2

Just another slight variation

df.set_index('a')[['c', 'd']]\
  .apply(tuple, 1)\
  .groupby(level=0)\
  .apply(list)\
  .to_dict()

{8: [(17, 1), (19, 1), (13, 0)], 9: [(8, 5)]}

Answer 3

I think this is a bug, but it works with apply and zip:

df = pd.DataFrame({'d': [1.0, 1.0, 0.0, 5.0], 
                   'b': ['xx', 'xy', 'zz', 'tt'], 
                   'a': [8, 8, 8, 9], 
                   'c': [17, 19, 13, 8]})
print (df)
   a   b   c    d
0  8  xx  17  1.0
1  8  xy  19  1.0
2  8  zz  13  0.0
3  9  tt   8  5.0

df_new = df.groupby(['a']).apply(lambda x: list(zip(x.c, x.d))).to_dict()
print (df_new)
{8: [(17, 1.0), (19, 1.0), (13, 0.0)], 9: [(8, 5.0)]}

For me, your version works as well (after fixing a small typo: ) changed to ]):

df_new = df.groupby('a')[['c','d']].apply(lambda x: [tuple(x) for x in x.values]).to_dict()
print (df_new)
{8: [(17.0, 1.0), (19.0, 1.0), (13.0, 0.0)], 9: [(8.0, 5.0)]}
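Note that the output above shows 17.0 rather than 17. As far as I can tell, this is because .values on a mix of integer and float columns produces a single float array, so the integers in c get upcast. If that matters, one possible workaround (a sketch, not the only option) is to build the tuples with itertuples(index=False, name=None), which reads each column separately. The sample frame is rebuilt by hand from the question.

```python
import pandas as pd

# Sample frame rebuilt by hand from the question (assumed dtypes: c is int, d is float).
df = pd.DataFrame({
    "a": [8, 8, 8, 9],
    "b": ["xx", "xy", "zz", "tt"],
    "c": [17, 19, 13, 8],
    "d": [1.0, 1.0, 0.0, 5.0],
})

# Mixed int/float columns collapse into one float64 array under .values.
print(df[["c", "d"]].values.dtype)

# itertuples(index=False, name=None) yields plain tuples, one per row,
# without pushing the columns through a common dtype first.
result = {
    key: list(group[["c", "d"]].itertuples(index=False, name=None))
    for key, group in df.groupby("a")
}
print(result)
```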

Answer 4

You can use a dictionary comprehension:

{k: list(map(tuple, g[['c','d']].values)) for k, g in df.groupby('a')}
# {8: [(17.0, 1.0), (19.0, 1.0), (13.0, 0.0)], 9: [(8.0, 5.0)]}

Or another way:

dict((k, list(map(tuple, g[['c','d']].values))) for k, g in df.groupby('a'))
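Since the question mentions applying this to other datasets, the comprehension could be wrapped in a small function. The sketch below is only one way to do it: `group_to_tuples` and its parameter names are hypothetical, and it zips the columns instead of using .values, which is my own substitution.

```python
import pandas as pd


def group_to_tuples(frame, key, value_cols):
    """Hypothetical helper: map each value of `key` to a list of row tuples.

    `key` is assumed to be a single column name, `value_cols` a list of names.
    """
    return {
        k: list(zip(*(g[col] for col in value_cols)))
        for k, g in frame.groupby(key)
    }


# Sample frame rebuilt by hand from the question.
df = pd.DataFrame({
    "a": [8, 8, 8, 9],
    "b": ["xx", "xy", "zz", "tt"],
    "c": [17, 19, 13, 8],
    "d": [1.0, 1.0, 0.0, 5.0],
})

print(group_to_tuples(df, "a", ["c", "d"]))
```

The same call with a different column list, for example group_to_tuples(df, "a", ["b", "c"]), gives tuples of (b, c) per key.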

DataFrame to a dictionary of lists of tuples grouped by key