Question

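This code probably looks pretty silly, but it's a basic representation of the problem I've been dealing with all day. I have 3 columns: Type, Day, and Month. I'd like to count the number of dogs/cats per day, and then average those daily counts over each month.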

import numpy as np
import pandas as pd

data = {'Type':['Dog', 'Cat', 'Cat', 'Cat', 'Dog', 'Dog', 'Dog', 'Cat'], 'Day':[1, 1, 2, 2, 3, 3, 4, 4], 'Month': [1, 1, 1, 1, 2, 2, 2, 2]}
newDF = pd.DataFrame(data)

This creates a dataframe that looks like this:

Type|Day|Month
---------
Dog|1|1
Cat|1|1
Cat|2|1
Cat|2|1
Dog|3|2
Dog|3|2
Dog|4|2
Cat|4|2

What I want to do here is create a table that looks like the one below:

Type | Month1 | Month2
------------------------
Dog  |   1    |   1.5
Cat  |   1.5  |   1

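So basically, I just want to use some combination of pivot table or groupby to build a table that contains the count of cats/dogs per day, averaged over each month. For some reason, I just can't figure it out. Is anyone smart enough with pandas to help? Thanks!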

Answer 1

Two groupbys + unstack

(newDF.groupby(['Type', 'Day', 'Month']).size()
      .groupby(level=[0,2]).mean()
      .unstack()
      .add_prefix('Month').rename_axis(None, axis=1))

Output:

      Month1  Month2
Type                
Cat      1.5     1.0
Dog      1.0     1.5
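
To make the chain easier to follow, here is a sketch of the same idea split into named steps. The intermediate names (`daily_counts`, `monthly_mean`, `result`) are my own and not part of the original answer, and I refer to index levels by name instead of position, which should be equivalent here.

```python
import pandas as pd

data = {
    'Type': ['Dog', 'Cat', 'Cat', 'Cat', 'Dog', 'Dog', 'Dog', 'Cat'],
    'Day': [1, 1, 2, 2, 3, 3, 4, 4],
    'Month': [1, 1, 1, 1, 2, 2, 2, 2],
}
newDF = pd.DataFrame(data)

# Step 1: count rows for each (Type, Day, Month) combination.
# The result is a Series with a three-level MultiIndex.
daily_counts = newDF.groupby(['Type', 'Day', 'Month']).size()

# Step 2: average the daily counts within each (Type, Month) pair.
monthly_mean = daily_counts.groupby(level=['Type', 'Month']).mean()

# Step 3: move Month from the index into the columns and label them.
result = monthly_mean.unstack('Month').add_prefix('Month')
result.columns.name = None  # drop the leftover 'Month' axis label

print(result)
```

Note that only days on which a given type actually appears contribute to the mean. For example, Dog shows up only on day 1 in month 1, so its average is 1 rather than 0.5, which matches the table asked for in the question.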

Answer 2

Just combining groupby with unstack and mean:

newDF.groupby(newDF.columns.tolist()) \
  .size() \
  .unstack(level='Day') \
  .mean(axis=1) \
  .unstack(level='Month')

Output:

Month    1    2
Type           
Cat    1.5  1.0
Dog    1.0  1.5
