Nested dictionary of bin sizes (counts) from a groupby on multiple columns

Date: 2018-04-19 15:29:11

Tags: python pandas dictionary dataframe pandas-groupby

import pandas as pd

df = pd.DataFrame({'a': [1,1,1,1,2,2,2,2,3,3,3,3], 'b': [5,5,1,1,3,3,3,1,2,1,1,1]})
>>> df
    a  b
0   1  5
1   1  5
2   1  1
3   1  1
4   2  3
5   2  3
6   2  3
7   2  1
8   3  2
9   3  1
10  3  1
11  3  1
>>> df.groupby(['a','b']).size().to_dict()
{(1, 5): 2, (3, 2): 1, (2, 3): 3, (3, 1): 3, (1, 1): 2, (2, 1): 1}

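This gives me the count of each (a, b) combination, keyed by a tuple, but what I want is a nested dictionary keyed first by a and then by b: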

{1: {5: 2, 1: 2}, 2: {3: 3, 1: 1}, 3: {2: 1, 1: 3} }
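The tuple-keyed dict from `to_dict()` already holds every count, so one way to reach the nested shape is to regroup it in plain Python. This is a minimal sketch that reuses the question's data; the names `flat` and `nested` are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'b': [5, 5, 1, 1, 3, 3, 3, 1, 2, 1, 1, 1]})

# Flat counts keyed by (a, b) tuples, as produced in the question
flat = df.groupby(['a', 'b']).size().to_dict()

# Regroup: the first tuple element becomes the outer key, the second the inner key
nested = {}
for (a, b), count in flat.items():
    nested.setdefault(a, {})[b] = count

print(nested)
```

`dict.setdefault` creates each inner dict the first time its outer key appears, so no pre-initialisation is needed.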

3 answers:

Answer 0 (score: 2)

You need an additional groupby inside a dictionary comprehension:

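# Count each (a, b) pair, then move 'b' out of the index into a column
i = df.groupby(['a','b']).size().reset_index(level=1)
# For each value of 'a', turn the [b, count] rows into a {b: count} dict
j = {k: dict(g.values) for k, g in i.groupby(level=0)}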

print(j)
{
    1: {1: 2, 5: 2}, 
    2: {1: 1, 3: 3}, 
    3: {1: 3, 2: 1}
}
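In the answer above, `reset_index(level=1)` keeps `a` as the index and turns `b` plus the unnamed count into columns (the count column gets the label `0`), and `dict(g.values)` treats each `[b, count]` row as a key/value pair. A variant of the same approach that names the count column and pairs the columns explicitly may read more clearly; the column name `n` is an arbitrary choice, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'b': [5, 5, 1, 1, 3, 3, 3, 1, 2, 1, 1, 1]})

# Name the count Series so reset_index produces a labelled column instead of 0
counts = df.groupby(['a', 'b']).size().rename('n')
i = counts.reset_index(level=1)   # index: 'a'; columns: 'b', 'n'

# Within each group of 'a', pair every 'b' value with its count
j = {k: dict(zip(g['b'], g['n'])) for k, g in i.groupby(level=0)}
print(j)
```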

Answer 1 (score: 2)

You can use collections.defaultdict for an O(n) solution:

from collections import defaultdict
import pandas as pd

df = pd.DataFrame({'a': [1,1,1,1,2,2,2,2,3,3,3,3], 'b': [5,5,1,1,3,3,3,1,2,1,1,1]})

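# Outer dict keyed by 'a'; each inner dict defaults a missing 'b' count to 0
d = defaultdict(lambda: defaultdict(int))

for i, j in map(tuple, df.values):  # each row as an (a, b) pair
    d[i][j] += 1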

# defaultdict(<function __main__.<lambda>>,
#             {1: defaultdict(int, {1: 2, 5: 2}),
#              2: defaultdict(int, {1: 1, 3: 3}),
#              3: defaultdict(int, {1: 3, 2: 1})})
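The result is a nested `defaultdict` rather than the plain dict shown in the question. A short sketch of one way to convert it, assuming plain dicts are wanted so that a lookup of a missing key raises `KeyError` instead of silently inserting a 0 (a lambda-based `defaultdict` also cannot be pickled with the standard `pickle` module); the name `result` is illustrative:

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'b': [5, 5, 1, 1, 3, 3, 3, 1, 2, 1, 1, 1]})

d = defaultdict(lambda: defaultdict(int))
for i, j in map(tuple, df.values):
    d[i][j] += 1

# Convert both levels to plain dicts
result = {a: dict(inner) for a, inner in d.items()}
print(result)
```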

Answer 2 (score: 2)

from collections import Counter
import pandas as pd

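# Count (a, b) pairs; the tuple keys become a two-level MultiIndex
s = pd.Series(Counter(zip(df.a, df.b)))
# For each 'a', xs(n) drops the outer level, leaving a Series indexed by 'b'
{
    n: d.xs(n).to_dict()
    for n, d in s.groupby(level=0)
}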

{1: {1: 2, 5: 2}, 2: {1: 1, 3: 3}, 3: {1: 3, 2: 1}}
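This answer relies on pandas turning the Counter's tuple keys into a two-level MultiIndex. The sketch below makes that step visible and uses `droplevel(0)` as an alternative to `xs(n)` for stripping the `a` level inside each group; the swap to `droplevel` is an illustration, not part of the original answer:

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'b': [5, 5, 1, 1, 3, 3, 3, 1, 2, 1, 1, 1]})

s = pd.Series(Counter(zip(df.a, df.b)))
# Tuple keys in the source dict produce a MultiIndex with levels (a, b)
print(s.index.nlevels)

# droplevel(0) removes the 'a' level, leaving a {b: count} Series per group
nested = {n: g.droplevel(0).to_dict() for n, g in s.groupby(level=0)}
print(nested)
```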