I have a DataFrame that looks like this:
age gender count
0 10 Female 1
1 10 Male 1
2 12 Female 2
3 13 Female 3
4 13 Male 2
5 14 Female 1
6 14 Male 10
7 15 Female 9
8 15 Male 12
9 16 Female 8
10 16 Male 24
11 17 Female 7
12 17 Male 16
13 18 Female 6
14 18 Male 3
15 19 Female 2
16 19 Male 1
17 20 Male 1
18 21 Female 1
19 22 Male 2
20 23 Male 1
I want to group some of the ages together, like this:
age gender count
0 10 Female 1
1 10 Male 1
2 12 Female 2
3 13 Female 3
4 13 Male 2
5 14 Female 1
6 14 Male 10
7 15 Female 9
8 15 Male 12
9 16 Female 8
10 16 Male 24
11 17-19 Female 15
12 17-19 Male 20
17 20-23 Male 4
18 20-23 Female 1
So far I've made bins, binned the ages with pd.cut (is that the right term?), and then grouped them like this:
bins = np.array([8,9,10,11,12,13,14,15,16,17,20,25,30...]) #these bins don't reflect the example I provided
groups = df.groupby(pd.cut(df.age, bins))
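For reference, what pd.cut does is usually called "binning" or "discretizing". Below is a minimal sketch, assuming hypothetical bin edges and labels picked to reproduce the grouping in the desired output above. pd.cut uses right-closed intervals (a, b] by default, so the edge pair 16, 19 covers ages 17 to 19. The labels argument needs one label per interval.

```python
import pandas as pd

# Hypothetical bin edges chosen to match the example grouping.
# Default right=True gives intervals (9, 10], (10, 11], ..., (16, 19], (19, 23].
bins = [9, 10, 11, 12, 13, 14, 15, 16, 19, 23]
# One illustrative label per interval, i.e. len(bins) - 1 labels.
labels = ['10', '11', '12', '13', '14', '15', '16', '17-19', '20-23']

ages = pd.Series([10, 12, 17, 19, 20, 23], name='age')
binned = pd.cut(ages, bins=bins, labels=labels)
print(binned.tolist())  # ['10', '12', '17-19', '17-19', '20-23', '20-23']
```

With explicit labels, the bins show up as readable strings like "17-19" instead of interval notation such as (16, 19].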
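However, I can't get the right DataFrame out of these groups. I feel like I'm close, but I don't know how to proceed. When I call groups.first() and groups.last(), I can see that the information I want is in there, just jumbled. Any suggestions?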
Answer (score: 2)
You want to groupby gender as well as the age bin. Aggregate with sum, then drop the empty bin/gender combinations (dropna) to get what you want.
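# Pass the keys as a list; recent pandas treats a tuple as a single key
groups = df.groupby([pd.cut(df.age, bins), 'gender'], observed=True)
# observed=True already skips empty bin/gender combinations; dropna() is harmless here
output = groups['count'].sum().dropna().reset_index()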
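An end-to-end sketch of the approach, assuming the DataFrame from the question and hypothetical bin edges and labels chosen to match the desired output. It groups by the binned age and gender together and sums only the count column. observed=True keeps only the bin/gender combinations that actually occur, so age 11 (no rows) does not appear.

```python
import pandas as pd

# The DataFrame from the question
df = pd.DataFrame({
    'age': [10, 10, 12, 13, 13, 14, 14, 15, 15, 16, 16,
            17, 17, 18, 18, 19, 19, 20, 21, 22, 23],
    'gender': ['Female', 'Male', 'Female', 'Female', 'Male', 'Female', 'Male',
               'Female', 'Male', 'Female', 'Male', 'Female', 'Male', 'Female',
               'Male', 'Female', 'Male', 'Male', 'Female', 'Male', 'Male'],
    'count': [1, 1, 2, 3, 2, 1, 10, 9, 12, 8, 24, 7, 16, 6, 3, 2, 1, 1, 1, 2, 1],
})

# Hypothetical edges/labels: single ages up to 16, then 17-19 and 20-23
bins = [9, 10, 11, 12, 13, 14, 15, 16, 19, 23]
labels = ['10', '11', '12', '13', '14', '15', '16', '17-19', '20-23']

# Group by (age bin, gender); the binned Series keeps the name 'age'
groups = df.groupby([pd.cut(df['age'], bins=bins, labels=labels), 'gender'],
                    observed=True)
# Sum the counts per group and turn the MultiIndex back into columns
result = groups['count'].sum().reset_index()
print(result)
```

The result has 15 rows with columns age, gender and count; for example 17-19/Female sums to 15 and 17-19/Male to 20, matching the desired output.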