
Question

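Suppose I have the following data in a Pandas DataFrame:

I want to find the following descriptive statistics (mean, median, standard deviation):

Unique users per cohort
Comments per user per cohort
Comments per cohort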

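So, for the output, I would like to see:

Unique users per cohort -> [{a: 3}, {b: 2}, ...], then find the descriptive statistics of that series
Comments per user per cohort -> [{(a, alex): 2}, {(b, alex): 0}, {(a, beth): 1}, {(b, beth): 3}, ...]
Comments per cohort -> [{a: 5}, {b: 6}, ...]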

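I am using pandas and I am completely stuck on how to do something this simple. I was thinking of using .groupby(), but that has not led to a clear solution. I could do this without pandas, but I would think this is exactly the kind of problem Pandas DataFrames are meant for!?

Thanks!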

Answer 1

Solution

You can use

df.groupby(['Cohort', 'User']).describe()

or

df.groupby(['Cohort']).describe()

Depending on the output you want:

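df.groupby(['Cohort'])['User'].apply(lambda x: x.describe().loc['unique'])

and

df.groupby(['Cohort', 'User'])['Comment'].apply(lambda x: x.describe().loc['unique'])

and

df.groupby(['Cohort'])['Comment'].apply(lambda x: x.describe().loc['unique'])

(The .ix indexer used in older versions of this answer was removed in pandas 1.0; .loc is the label-based replacement.)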
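
To make the 'unique' lookup above concrete, here is a minimal sketch. The question's table is not shown, so the DataFrame below is hypothetical sample data chosen to match the counts printed in Answer 2; only the column names Cohort, User and Comment come from the thread.

```python
import pandas as pd

# Hypothetical sample data (the original table is missing from the question).
df = pd.DataFrame({
    "Cohort": ["a"] * 5 + ["b"] * 6,
    "User": ["alex", "alex", "beth", "craig", "craig",
             "beth", "beth", "beth", "craig", "craig", "craig"],
    "Comment": [f"comment {i}" for i in range(11)],
})

# describe() on a string column reports count, unique, top and freq.
summary = df["User"].describe()
print(summary.index.tolist())  # ['count', 'unique', 'top', 'freq']

# Selecting the 'unique' row per group gives the number of distinct users.
unique_users = df.groupby("Cohort")["User"].apply(
    lambda x: x.describe().loc["unique"]
)
print(unique_users)
```

Note that 'unique' counts distinct values, so applied to the Comment column it counts distinct comment texts rather than rows; the two only agree when no comment text repeats within a group.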

Answer 2

>>> df.groupby('Cohort').User.apply(lambda group: group.unique())
Cohort
a    [alex, beth, craig]
b          [beth, craig]
Name: User, dtype: object

>>> df.groupby('Cohort').User.apply(lambda group: group.nunique())
Cohort
a    3
b    2
Name: User, dtype: int64

>>> df.groupby(['Cohort', 'User']).Comment.count()
Cohort  User 
a       alex     2
        beth     1
        craig    2
b       beth     3
        craig    3
Name: Comment, dtype: int64
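
The question's desired output lists (b, alex): 0, but count() only returns cohort/user pairs that actually occur. One possible way to include the zero rows, sketched here on hypothetical data matching the counts above, is to reindex against every cohort/user combination. The names full_index and counts_with_zeros are illustrative and not from the thread.

```python
import pandas as pd

# Hypothetical sample data consistent with the counts printed above.
df = pd.DataFrame({
    "Cohort": ["a"] * 5 + ["b"] * 6,
    "User": ["alex", "alex", "beth", "craig", "craig",
             "beth", "beth", "beth", "craig", "craig", "craig"],
    "Comment": [f"comment {i}" for i in range(11)],
})

counts = df.groupby(["Cohort", "User"])["Comment"].count()

# Build every (cohort, user) pair, then fill the missing ones with 0.
full_index = pd.MultiIndex.from_product(
    [df["Cohort"].unique(), df["User"].unique()],
    names=["Cohort", "User"],
)
counts_with_zeros = counts.reindex(full_index, fill_value=0)
print(counts_with_zeros)
```

Whether the zero rows belong in the statistics is a modelling choice: including them lowers the mean number of comments per user per cohort.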

>>> df.groupby(['Cohort']).Comment.count()
Cohort
a    5
b    6
Name: Comment, dtype: int64
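
The question also asks for the mean, median and standard deviation of each of these series, a step neither answer spells out. A minimal sketch, again on hypothetical data that reproduces the counts above:

```python
import pandas as pd

# Hypothetical sample data consistent with the counts printed above.
df = pd.DataFrame({
    "Cohort": ["a"] * 5 + ["b"] * 6,
    "User": ["alex", "alex", "beth", "craig", "craig",
             "beth", "beth", "beth", "craig", "craig", "craig"],
    "Comment": [f"comment {i}" for i in range(11)],
})

# The three series the question asks about.
users_per_cohort = df.groupby("Cohort")["User"].nunique()        # a: 3, b: 2
comments_per_user = df.groupby(["Cohort", "User"])["Comment"].count()
comments_per_cohort = df.groupby("Cohort")["Comment"].count()    # a: 5, b: 6

# Summarise each series; pandas' std() is the sample standard deviation (ddof=1).
for name, series in [
    ("unique users per cohort", users_per_cohort),
    ("comments per user per cohort", comments_per_user),
    ("comments per cohort", comments_per_cohort),
]:
    stats = series.agg(["mean", "median", "std"])
    print(name, stats.to_dict())
```

Calling series.describe() instead would return the same mean and std along with count, min, max and quartiles, where the 50% row is the median.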

Pandas: finding averages across multiple hierarchy levels
