Groupby with empty bins in Python

Time: 2017-10-20 02:30:33

Tags: python pandas dataframe

Suppose I have the following bin edges and data frame:

bin = [0,5,10,15,20]

sex age
1    2
1   10 
1   11 
2   16
2   18

I tried

df.groupby([df.sex,pd.cut(df.age,bin)]).size()

and this is the result I got:

sex        age
1   (0,5]  1
1 (10,15]  2
2 (15,20]  2

But what I actually want is the following result:

sex      age
1   (0,5] 1
1  (5,10] 0
1 (10,15] 2
1 (15,20] 0
2   (0,5] 0
2  (5,10] 0
2 (10,15] 0
2 (15,20] 2

That is, the empty bins should be included with a count of 0. How can I do this?

2 answers:

Answer 0: (score: 2)

stack accepts dropna=False, which keeps the empty bins as NaN so that fillna(0) can fill them:

df.groupby([df.sex,pd.cut(df.age,bin)]).size().unstack().stack(dropna=False).fillna(0)
Out[27]: 
sex  age     
1    (0, 5]      1.0
     (5, 10]     1.0
     (10, 15]    1.0
     (15, 20]    0.0
2    (0, 5]      0.0
     (5, 10]     0.0
     (10, 15]    0.0
     (15, 20]    2.0
dtype: float64
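
The float64 dtype above comes from the NaN values that unstack() introduces for the missing combinations. The following is a minimal sketch, not the answerer's code, that rebuilds the question's data and casts the counts back to integers. The names bins, counts and wide are my own, and observed=True is passed only to reproduce the missing rows from the question on recent pandas versions (this assumes the observed parameter is available, i.e. pandas 0.23 or later):

```python
import pandas as pd

# Data from the question
df = pd.DataFrame({"sex": [1, 1, 1, 2, 2], "age": [2, 10, 11, 16, 18]})
bins = [0, 5, 10, 15, 20]  # called `bins` to avoid shadowing the built-in bin()

# observed=True keeps only the (sex, bin) pairs that occur in the data
counts = df.groupby(["sex", pd.cut(df["age"], bins)], observed=True).size()

# unstack() leaves NaN for the missing pairs, which forces a float dtype
wide = counts.unstack()

# Fill the gaps, cast back to integers, then return to the long format
result = wide.fillna(0).astype(int).stack()
print(result)
```

Because the gaps are filled before stack() is called, this does not depend on the dropna argument at all.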

Answer 1: (score: 1)

Use unstack and stack

In [4515]: df.groupby([df.sex, pd.cut(df.age, bin)]).size().unstack(fill_value=0).stack()
Out[4515]:
sex  age
1    (0, 5]      1
     (5, 10]     1
     (10, 15]    1
     (15, 20]    0
2    (0, 5]      0
     (5, 10]     0
     (10, 15]    0
     (15, 20]    2
dtype: int64

Or, use pivot_table

In [4533]: df.pivot_table(index=df.sex, columns=pd.cut(df.age, bin), aggfunc=len,
                          fill_value=0)['age'].stack()
Out[4533]:
sex  age
1    (0, 5]      1
     (5, 10]     1
     (10, 15]    1
     (15, 20]    0
2    (0, 5]      0
     (5, 10]     0
     (10, 15]    0
     (15, 20]    2
dtype: int64

Or, use reindex

In [4546]: idx = pd.MultiIndex.from_product([df.sex.unique(), 
                                             pd.cut(df.age, bin).unique()])

In [4547]: df.groupby([df.sex, pd.cut(df.age, bin)]).size().reindex(idx, fill_value=0)
Out[4547]:
1  (0, 5]      1
   (5, 10]     1
   (10, 15]    1
   (15, 20]    0
2  (0, 5]      0
   (5, 10]     0
   (10, 15]    0
   (15, 20]    2
dtype: int64
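
On newer pandas versions a shorter route may be available. groupby takes an observed parameter (added, as far as I know, in pandas 0.23, which postdates this question), and observed=False asks it to keep every category of a categorical grouper such as the output of pd.cut. A minimal sketch under that assumption, with bins and age_bins as my own names:

```python
import pandas as pd

# Data from the question
df = pd.DataFrame({"sex": [1, 1, 1, 2, 2], "age": [2, 10, 11, 16, 18]})
bins = [0, 5, 10, 15, 20]

# pd.cut returns a categorical Series whose categories are all four intervals
age_bins = pd.cut(df["age"], bins)

# observed=False keeps every category, so empty bins get a count of 0
result = df.groupby(["sex", age_bins], observed=False).size()
print(result)
```

Unlike the unstack/stack approach or reindexing against pd.cut(...).unique(), this relies on the categories rather than on the observed values, so a bin that is empty for every group is kept as well.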