Question

Consider an hourly time series, for example:

import numpy
import pandas


data = numpy.random.random(365 * 24)
index = pandas.date_range('2018-01-01', '2019-01-01', freq='H', closed='left')
series = pandas.Series(data, index=index)

It looks like this:

2018-01-01 00:00:00    0.823988
2018-01-01 01:00:00    0.169911
2018-01-01 02:00:00    0.359008
2018-01-01 03:00:00    0.873489
                         ...   
2018-12-31 20:00:00    0.898772
2018-12-31 21:00:00    0.635318
2018-12-31 22:00:00    0.061060
2018-12-31 23:00:00    0.972468
Freq: H, Length: 8760, dtype: float64

I want to resample this series to a monthly frequency, summing the values in each month:

series.resample('M').sum()

but with the label/timestamp set to the left edge of each bin. So instead of:

2018-01-31    371.188835
2018-02-28    336.244967
2018-03-31    370.686715
2018-04-30    363.955540
2018-05-31    387.631062
2018-06-30    372.343839
2018-07-31    365.484547
2018-08-31    352.756428
2018-09-30    378.930171
2018-10-31    388.491260
2018-11-30    362.552504
2018-12-31    387.159189
Freq: M, dtype: float64

I would like to get:

2018-01-01    371.188835
2018-02-01    336.244967
2018-03-01    370.686715
2018-04-01    363.955540
2018-05-01    387.631062
2018-06-01    372.343839
2018-07-01    365.484547
2018-08-01    352.756428
2018-09-01    378.930171
2018-10-01    388.491260
2018-11-01    362.552504
2018-12-01    387.159189
Freq: M, dtype: float64

I have tried:

series.resample('M', closed='left').sum()
series.resample('M', closed='right').sum()
series.resample('M', label='left').sum()
series.resample('M', label='right').sum()
series.resample('M', closed='left', label='left').sum()
series.resample('M', closed='left', label='right').sum()
series.resample('M', closed='right', label='left').sum()
series.resample('M', closed='right', label='right').sum()

None of these worked.

I know I could do:

series = series.resample('M', label='left').sum()
series.index += pandas.DateOffset(1, 'D')

But I feel there should be a better way.

Answer 1

There is indeed a better way. You can resample with the 'MS' (month start) rule:

>>> series.resample('MS').sum()
2018-01-01    371.188835
2018-02-01    336.244967
2018-03-01    370.686715
2018-04-01    363.955540
2018-05-01    387.631062
2018-06-01    372.343839
2018-07-01    365.484547
2018-08-01    352.756428
2018-09-01    378.930171
2018-10-01    388.491260
2018-11-01    362.552504
2018-12-01    387.159189
Freq: MS, dtype: float64

See the list of DateOffset objects and their associated frequency strings.
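
To see why the `label` argument alone does not produce month-start labels, here is a minimal sketch comparing month-end bins labelled on the left with month-start bins. It assumes an illustrative hourly series of ones (the name `ones` is hypothetical, not from the question), so each monthly sum is just the number of hours in that month. It also passes `pandas.offsets.MonthEnd()` in place of the 'M' string, on the assumption that this is the offset the 'M' rule stands for, so the sketch does not depend on which alias spelling a given pandas version accepts.

```python
import numpy
import pandas

# Hourly series of ones for 2018: each monthly sum equals the hours in that month.
index = pandas.date_range('2018-01-01', periods=365 * 24,
                          freq=pandas.Timedelta(hours=1))
ones = pandas.Series(numpy.ones(len(index)), index=index)

# Month-end bins labelled on the left: each label is the previous month end.
month_end_left = ones.resample(pandas.offsets.MonthEnd(), label='left').sum()

# Month-start bins: each label is the first day of the month being summed.
month_start = ones.resample('MS').sum()

print(month_end_left.index[:3])
print(month_start.index[:3])
```

With month-end bins, the bin edges themselves are month ends, so the left label of the January bin is the last day of the previous December. With 'MS' the bin edges are month starts, which is why no index shifting is needed afterwards.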

Answer 2

If the day itself is not that important, using a PeriodIndex may be useful. (Different seed, so the numbers look different.)

res = series.groupby(pandas.PeriodIndex(series.index, freq='M')).sum()
print(res)

2018-01    376.144859
2018-02    353.536371
2018-03    365.711851
2018-04    364.050189
2018-05    371.040633
2018-06    360.810081
2018-07    378.734175
2018-08    360.652323
2018-09    360.645801
2018-10    360.035224
2018-11    356.731138
2018-12    369.220704
Freq: M, dtype: float64

These periods can be converted with .to_timestamp, which by default gives the first day of each month:

res.index = res.index.to_timestamp()
print(res)

2018-01-01    376.144859
2018-02-01    353.536371
2018-03-01    365.711851
2018-04-01    364.050189
2018-05-01    371.040633
2018-06-01    360.810081
2018-07-01    378.734175
2018-08-01    360.652323
2018-09-01    360.645801
2018-10-01    360.035224
2018-11-01    356.731138
2018-12-01    369.220704
Freq: MS, dtype: float64
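
As a rough check that the two answers agree, the sketch below groups an illustrative hourly series of ones by monthly period, converts the period labels to timestamps, and compares the result with `resample('MS')`. The names `ones`, `by_period`, and `by_resample` are hypothetical, and `DatetimeIndex.to_period('M')` is used here as an assumed equivalent of constructing the `PeriodIndex` directly.

```python
import numpy
import pandas

# Hourly series of ones for 2018 (illustrative data, not the random series above).
index = pandas.date_range('2018-01-01', periods=365 * 24,
                          freq=pandas.Timedelta(hours=1))
ones = pandas.Series(numpy.ones(len(index)), index=index)

# Group by calendar month via monthly periods, then label each group
# with the start of its period (the default for to_timestamp).
by_period = ones.groupby(ones.index.to_period('M')).sum()
by_period.index = by_period.index.to_timestamp()

# The month-start resample from the first answer, for comparison.
by_resample = ones.resample('MS').sum()

print(list(by_period.index) == list(by_resample.index))
print(numpy.allclose(by_period.values, by_resample.values))
```

The period-based route is mainly worth considering when the monthly label is meant to name the whole month rather than a specific instant; otherwise the 'MS' rule reaches the same result in one call.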

Resample at a monthly frequency with left labels (month start)