Suppose I have the following DataFrame and bin edges:
bin = [0,5,10,15,20]
sex age
1 2
1 10
1 11
2 16
2 18
I tried
df.groupby([df.sex,pd.cut(df.age,bin)]).size()
and this is the result I got:
sex age
1 (0,5] 1
1 (5,10] 1
1 (10,15] 1
2 (15,20] 2
But what I actually want is the following result, with the empty bins included:
sex age
1 (0,5] 1
1 (5,10] 1
1 (10,15] 1
1 (15,20] 0
2 (0,5] 0
2 (5,10] 0
2 (10,15] 0
2 (15,20] 2
Where are the empty bins? How can I fix this?
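For anyone who wants to reproduce this, here is a minimal sketch of the setup. The data and bin edges are copied from the question; the names `bins`, `age_bin` and `bin_codes` are illustrative choices (`bins` is used only to avoid shadowing Python's builtin `bin`). It also shows which bin each age lands in, which matters because `pd.cut` intervals are closed on the right by default, so age 10 belongs to (5, 10] rather than (10, 15].

```python
import pandas as pd

# Data and bin edges copied from the question.
df = pd.DataFrame({"sex": [1, 1, 1, 2, 2], "age": [2, 10, 11, 16, 18]})
bins = [0, 5, 10, 15, 20]

# pd.cut returns a categorical Series; intervals are right-closed by default.
age_bin = pd.cut(df["age"], bins)

# Position of each row's bin among the ordered categories (0 = first bin).
bin_codes = age_bin.cat.codes.tolist()

# Show each row next to the bin it was assigned to.
print(pd.DataFrame({"sex": df["sex"], "age": df["age"], "bin": age_bin}))
```

All four bins exist as categories of `age_bin` even though no single `sex` group uses all of them, which is why the empty ones can be brought back after grouping.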
Answer 0 (score: 2)
Use dropna=False in stack
df.groupby([df.sex,pd.cut(df.age,bin)]).size().unstack().stack(dropna=False).fillna(0)
Out[27]:
sex age
1 (0, 5] 1.0
(5, 10] 1.0
(10, 15] 1.0
(15, 20] 0.0
2 (0, 5] 0.0
(5, 10] 0.0
(10, 15] 0.0
(15, 20] 2.0
dtype: float64
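A self-contained sketch of the same idea, using the question's data. Two things here are assumptions rather than part of the answer: `observed=True` is passed explicitly so that the missing combinations really are absent regardless of the pandas default, and the NaNs are filled before stacking instead of relying on `stack(dropna=False)`, because the handling of `dropna` in `stack` differs between pandas versions. Filling first also lets the counts stay integers.

```python
import pandas as pd

df = pd.DataFrame({"sex": [1, 1, 1, 2, 2], "age": [2, 10, 11, 16, 18]})
bins = [0, 5, 10, 15, 20]  # named `bins` to avoid shadowing the builtin `bin`

# Count only the (sex, bin) pairs that occur in the data.
counts = df.groupby([df["sex"], pd.cut(df["age"], bins)], observed=True).size()

# unstack() pivots the bins into columns; pairs that never occurred become NaN.
wide = counts.unstack()

# Replace the NaNs with 0, restore the integer dtype, and stack back to long form.
result = wide.fillna(0).astype(int).stack()
print(result)
```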
Answer 1 (score: 1)
Use unstack and stack
In [4515]: df.groupby([df.sex, pd.cut(df.age, bin)]).size().unstack(fill_value=0).stack()
Out[4515]:
sex age
1 (0, 5] 1
(5, 10] 1
(10, 15] 1
(15, 20] 0
2 (0, 5] 0
(5, 10] 0
(10, 15] 0
(15, 20] 2
dtype: int64
Alternatively, use pivot_table
In [4533]: df.pivot_table(index=df.sex, columns=pd.cut(df.age, bin), aggfunc=len,
fill_value=0)['age'].stack()
Out[4533]:
sex age
1 (0, 5] 1
(5, 10] 1
(10, 15] 1
(15, 20] 0
2 (0, 5] 0
(5, 10] 0
(10, 15] 0
(15, 20] 2
dtype: int64
Alternatively, use reindex
In [4546]: idx = pd.MultiIndex.from_product([df.sex.unique(),
pd.cut(df.age, bin).unique()])
In [4547]: df.groupby([df.sex, pd.cut(df.age, bin)]).size().reindex(idx, fill_value=0)
Out[4547]:
1 (0, 5] 1
(5, 10] 1
(10, 15] 1
(15, 20] 0
2 (0, 5] 0
(5, 10] 0
(10, 15] 0
(15, 20] 2
dtype: int64
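One caveat with the reindex version: `pd.cut(df.age, bin).unique()` only returns bins that occur somewhere in the data, so a bin that is empty for every sex would still be missing. Below is a sketch of a variation that builds the index from all categories instead, assuming the question's data (the names `bins`, `age_bin`, `all_bins`, `full_index` and `direct` are illustrative). The last statement is a shortcut that assumes a pandas version whose `groupby` accepts the `observed` argument.

```python
import pandas as pd

df = pd.DataFrame({"sex": [1, 1, 1, 2, 2], "age": [2, 10, 11, 16, 18]})
bins = [0, 5, 10, 15, 20]  # named `bins` to avoid shadowing the builtin `bin`

age_bin = pd.cut(df["age"], bins)

# Counts for the (sex, bin) pairs that actually occur.
counts = df.groupby([df["sex"], age_bin], observed=True).size()

# Every bin defined by the edges, observed or not, with the same dtype as age_bin.
all_bins = pd.Categorical(age_bin.cat.categories, dtype=age_bin.dtype)

# Full cartesian product of sexes and bins.
full_index = pd.MultiIndex.from_product(
    [sorted(df["sex"].unique().tolist()), all_bins], names=["sex", "age"]
)

# Pairs missing from `counts` are added with a count of 0.
result = counts.reindex(full_index, fill_value=0)
print(result)

# Shortcut: ask groupby itself to keep the unobserved categories.
direct = df.groupby([df["sex"], age_bin], observed=False).size()
```

Both `result` and `direct` should contain one row per (sex, bin) pair, with zeros for the pairs that never occur.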