Pandas groupby with bin counts

Date: 2015-12-16 16:26:14

Tags: python pandas dataframe pandas-groupby

I have a DataFrame that looks like this:


+----------+---------+-------+
| username | post_id | views |
+----------+---------+-------+
| john     |       1 |     3 |
| john     |       2 |    23 |
| john     |       3 |    44 |
| john     |       4 |    82 |
| jane     |       7 |     5 |
| jane     |       8 |    25 |
| jane     |       9 |    46 |
| jane     |      10 |    56 |
+----------+---------+-------+

I want to transform it to count, for each user, how many views fall into certain bins, like this:

+------+------+-------+-------+--------+
|      | 1-10 | 11-25 | 25-50 | 51-100 |
+------+------+-------+-------+--------+
| john |    1 |     1 |     1 |      1 |
| jane |    1 |     1 |     1 |      1 |
+------+------+-------+-------+--------+

I tried:

import pandas as pd

bins = [1, 10, 25, 50, 100]
groups = df.groupby(pd.cut(df.views, bins))
groups.username.count()

But it only gives aggregate counts, not counts per user: with my real data I get a single total for each bin. How can I get the bin counts per user?
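
A minimal reproducible sketch of the attempt above, assuming the sample table is the entire DataFrame (the values are copied from the table; the extra `bin` column is a hypothetical name added only for illustration). Grouping on the bins alone pools both users together, so on this data every bin gets a count of 2 instead of a per-user breakdown:

```python
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    "username": ["john"] * 4 + ["jane"] * 4,
    "post_id": [1, 2, 3, 4, 7, 8, 9, 10],
    "views": [3, 23, 44, 82, 5, 25, 46, 56],
})

bins = [1, 10, 25, 50, 100]

# pd.cut maps each view count to a right-closed interval such as (10, 25]
df["bin"] = pd.cut(df["views"], bins)

# Grouping on the bins alone yields one total per bin, mixing john and jane
agg_counts = df.groupby("bin", observed=False)["username"].count()
print(agg_counts)
```

Because `pd.cut` intervals are right-closed by default, a view count of exactly 1 would fall outside `(1, 10]` and become NaN; passing `include_lowest=True` makes the first interval include its left edge.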

1 answer:

Answer 0 (score: 25)

You can group by both the username and the bins, compute the size of each group, and then call unstack():

>>> groups = df.groupby(['username', pd.cut(df.views, bins)])
>>> groups.size().unstack()
views     (1, 10]  (10, 25]  (25, 50]  (50, 100]
username
jane            1         1         1          1
john            1         1         1          1
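
A self-contained sketch of the answer's approach on the sample data, assuming a recent pandas version. Passing `observed=False` explicitly keeps the current behaviour for categorical groupers and avoids the default-change warning that recent pandas 2.x releases emit; `unstack(fill_value=0)` turns any missing user/bin combination into 0. The `pd.crosstab` call is an alternative added here for comparison and is not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    "username": ["john"] * 4 + ["jane"] * 4,
    "post_id": [1, 2, 3, 4, 7, 8, 9, 10],
    "views": [3, 23, 44, 82, 5, 25, 46, 56],
})
bins = [1, 10, 25, 50, 100]
view_bins = pd.cut(df["views"], bins)  # categorical Series named "views"

# Group by user AND bin, count rows per pair, then pivot the bins into columns
per_user = (
    df.groupby(["username", view_bins], observed=False)
      .size()
      .unstack(fill_value=0)
)
print(per_user)

# Alternative: cross-tabulate user against bin
xtab = pd.crosstab(df["username"], view_bins)
print(xtab)
```

On this data both give a 2×4 table with a 1 in every cell. On real data with empty bins the two can differ: the groupby version keeps every bin as a column of zeros, while `pd.crosstab` drops unobserved categories unless `dropna=False` is passed.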