Question

I have the following time series:

        Date        Value
0       2006-01-03  18
1       2006-01-04  12
2       2006-01-05  11
3       2006-01-06  10
4       2006-01-09  22
...     ...     ...
3510    2019-12-23  47
3511    2019-12-24  46
3512    2019-12-26  35
3513    2019-12-27  35
3514    2019-12-30  28

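I want to compute the mean for each month. In pseudocode, for each month:

Sum the values of all days in that month
Divide by the number of days in that month that have data.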
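The two steps above amount to grouping by calendar month and averaging. Here is a minimal pure-Python sketch. It assumes the dates are ISO strings of the form 'YYYY-MM-DD' and uses only the ten rows visible in the table. The `monthly_means` helper is hypothetical:

```python
from collections import defaultdict

# Sample rows: the ten rows visible in the question's table (assumed ISO date strings).
rows = [
    ("2006-01-03", 18), ("2006-01-04", 12), ("2006-01-05", 11),
    ("2006-01-06", 10), ("2006-01-09", 22),
    ("2019-12-23", 47), ("2019-12-24", 46), ("2019-12-26", 35),
    ("2019-12-27", 35), ("2019-12-30", 28),
]

def monthly_means(rows):
    """Hypothetical helper: return {'YYYY-MM': mean} following the two-step pseudocode."""
    totals = defaultdict(float)  # step 1: running sum of values per month
    counts = defaultdict(int)    # number of days with data per month
    for date_str, value in rows:
        month = date_str[:7]     # 'YYYY-MM-DD' -> 'YYYY-MM'
        totals[month] += value
        counts[month] += 1
    # step 2: divide each month's sum by the number of days that had data
    return {m: totals[m] / counts[m] for m in sorted(totals)}

print(monthly_means(rows))  # {'2006-01': 14.6, '2019-12': 38.2}
```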

The desired output looks something like this:

        Date        Value
0       2006-01     17.45
1       2006-02     18.23
2       2006-04     16.79
3       2006-05     17.98
...     ...     ...
166     2019-11     37.89
167     2019-12     36.34

I tried this, without success:

data = data.set_index('Date')
data.resample('M')

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-28-435afe449f1f> in <module>
     47 data = pd.DataFrame(dataList, columns=('Date', 'Value'))
     48 data = data.set_index('Date')
---> 49 data.resample('M')
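The traceback is cut off before the error message, so the cause here is an assumption. A likely one is that `Date` holds strings rather than datetimes: `resample` only works on a DatetimeIndex, TimedeltaIndex or PeriodIndex and raises TypeError otherwise. The sketch below uses hypothetical sample data to reproduce the error and then fixes it with `pd.to_datetime`. It uses the `'MS'` (month start) alias, so each bin is labelled with the first day of its month:

```python
import pandas as pd

# Hypothetical reproduction: dates stored as plain strings, not datetime64.
data = pd.DataFrame({
    "Date": ["2006-01-03", "2006-01-04", "2006-02-01"],
    "Value": [18, 12, 11],
})

# A string index is not datetime-like, so resample raises TypeError right away.
try:
    data.set_index("Date").resample("MS")
    raised = False
except TypeError:
    raised = True
print(raised)  # True

# Converting the column to datetime64 first makes resample work.
data["Date"] = pd.to_datetime(data["Date"])
monthly = data.set_index("Date").resample("MS")["Value"].mean()
print(monthly)
```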

Answer 1

You can try something like this, which doesn't even require changing the index:

data_month = data.resample('M', on='Date').mean()

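Note that resample on its own doesn't solve the problem: it only returns a Resampler object describing the monthly bins. The .mean() call is what actually computes the monthly averages.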
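Here is a self-contained version of this answer with hypothetical sample data. It assumes `Date` has already been converted with `pd.to_datetime`, since `on=` also needs a datetime column. It uses `'MS'` (month start) instead of `'M'` because pandas 2.2 renamed the month-end alias to `'ME'` and deprecated `'M'`, while `'MS'` is accepted by older and newer versions alike:

```python
import pandas as pd

# Hypothetical sample in the question's shape; Date is already datetime64.
data = pd.DataFrame({
    "Date": pd.to_datetime(["2006-01-03", "2006-01-04", "2006-02-01", "2006-02-02"]),
    "Value": [18, 12, 11, 10],
})

# on="Date" tells resample which column to bin by, so no set_index is needed.
# resample(...) only defines the monthly bins; .mean() aggregates each bin.
data_month = data.resample("MS", on="Date")["Value"].mean()
print(data_month)
```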

More on this in the documentation :)

Answer 2

We can convert your datetime column to a PeriodIndex with monthly frequency and then take the average with GroupBy.mean:

df.groupby(pd.PeriodIndex(df['Date'], freq="M"))['Value'].mean()
    
Date
2006-01    14.6
2019-12    38.2
Freq: M, Name: Value, dtype: float64

df.groupby(pd.PeriodIndex(df['Date'], freq="M"))['Value'].mean().reset_index()

      Date  Value
0  2006-01   14.6
1  2019-12   38.2

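One caveat of this approach is that months with no data are not shown at all. If that matters, use set_index followed by resample(...).mean() in the same way; resample produces a row for every month in the range.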
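The sketch below shows this caveat on hypothetical data with no rows in February 2006. The PeriodIndex groupby returns only the months that occur. The resample version returns a bin for every month in the range, with NaN for the empty one. It uses the `'MS'` alias because the month-end `'M'` alias is deprecated for resample in pandas 2.2+:

```python
import pandas as pd

# Hypothetical data: January and March 2006 only, nothing in February.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2006-01-03", "2006-01-04", "2006-03-01"]),
    "Value": [18, 12, 11],
})

# Groupby on monthly periods: only months present in the data appear.
by_period = df.groupby(pd.PeriodIndex(df["Date"], freq="M"))["Value"].mean()

# set_index + resample: every month in the range gets a bin; empty ones are NaN.
by_resample = df.set_index("Date").resample("MS")["Value"].mean()

print(by_period)
print(by_resample)
```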

Get monthly average in pandas