Python Pandas: Binning my index into age ranges

Asked: 2015-10-09 17:38:34

Tags: python pandas

I have a DataFrame that looks like this:

    age  gender  count
0    10  Female      1
1    10    Male      1
2    12  Female      2
3    13  Female      3
4    13    Male      2
5    14  Female      1
6    14    Male     10
7    15  Female      9
8    15    Male     12
9    16  Female      8
10   16    Male     24
11   17  Female      7
12   17    Male     16
13   18  Female      6
14   18    Male      3
15   19  Female      2
16   19    Male      1
17   20    Male      1
18   21  Female      1
19   22    Male      2
20   23    Male      1

I would like to bin some of the ages together and sum their counts, like this:

    age     gender  count
0    10     Female      1
1    10       Male      1
2    12     Female      2
3    13     Female      3
4    13       Male      2
5    14     Female      1
6    14       Male     10
7    15     Female      9
8    15       Male     12
9    16     Female      8
10   16       Male     24
11   17-19  Female     15
12   17-19    Male     20
17   20-23    Male      4
18   20-23  Female      1

So far I have defined the bins, applied pd.cut to the age column (is "binned" the right term here?), and then grouped by the result, like so:

bins = np.array([8,9,10,11,12,13,14,15,16,17,20,25,30...]) #these bins don't reflect the example I provided
groups = df.groupby(pd.cut(df.age, bins))

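However, I can't extract the right DataFrame from these groups. I feel like I'm close, but I don't know how to proceed. When I call groups.first() and groups.last(), I can see that the information I want is in there, it is just obscured. Any suggestions?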

1 Answer:

Answer 0: (score: 2)

You want to group by gender as well as by the age bins. Aggregate with sum and drop the empty rows (dropna) to get what you need.

# Pass the grouping keys as a list; newer pandas versions treat a tuple as a single key
groups = df.groupby([pd.cut(df.age, bins), 'gender'])
output = groups.sum().dropna()
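
For reference, here is a minimal self-contained sketch of the same idea on the question's data. The bin edges and labels are my own assumptions, chosen to reproduce the desired output (they are not the asker's real bins). It also assumes a reasonably recent pandas: on such versions, summing an empty group may give 0 rather than NaN, so instead of dropna this sketch passes observed=True to groupby to skip bin/gender combinations that never occur.

```python
import pandas as pd

# Data copied from the question
df = pd.DataFrame({
    "age": [10, 10, 12, 13, 13, 14, 14, 15, 15, 16, 16,
            17, 17, 18, 18, 19, 19, 20, 21, 22, 23],
    "gender": ["Female", "Male", "Female", "Female", "Male", "Female", "Male",
               "Female", "Male", "Female", "Male", "Female", "Male", "Female",
               "Male", "Female", "Male", "Male", "Female", "Male", "Male"],
    "count": [1, 1, 2, 3, 2, 1, 10, 9, 12, 8, 24,
              7, 16, 6, 3, 2, 1, 1, 1, 2, 1],
})

# Assumed bin edges (right-closed intervals): (9, 10], (10, 11], ..., (16, 19], (19, 23]
edges = [9, 10, 11, 12, 13, 14, 15, 16, 19, 23]
# Assumed labels, one per interval, matching the desired output
labels = ["10", "11", "12", "13", "14", "15", "16", "17-19", "20-23"]

# Map each age to its labelled bin; the result is a categorical Series named "age"
age_group = pd.cut(df["age"], bins=edges, labels=labels)

# Group by bin and gender, sum only the "count" column,
# and keep only the combinations that actually appear in the data
output = (
    df.groupby([age_group, "gender"], observed=True)["count"]
      .sum()
      .reset_index()
)

print(output)
```

Selecting the "count" column before summing keeps the raw age values from being summed along with it, and reset_index turns the (age, gender) index back into ordinary columns, which is close to the layout asked for in the question.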