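I am trying to resample time series data from 15-minute to weekly frequency, but it does not work. I have read the documentation and many related questions, but I still do not understand what is going wrong.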
My code is as follows:
{{1}}
My raw data looks like this:
               Date  Actual  Forecast  Demand
0  01/01/2017 00:00    1049    1011.0    2922
1  01/01/2017 00:15     961    1029.0    2892
2  01/01/2017 00:30     924    1048.0    2858
3  01/01/2017 00:45     852    1066.0    2745
After resampling, the data becomes this:
Date
2017-01-01    01/01/2017 00:0001/01/2017 00:1501/01/2017 00:...
2017-01-08    01/02/2017 00:0001/02/2017 00:1501/02/2017 00:...
2017-01-15    01/09/2017 00:0001/09/2017 00:1501/09/2017 00:...
2017-01-22    16/01/2017 00:0016/01/2017 00:1516/01/2017 00:...
I just want to sum "Actual", "Forecast" and "Demand" separately for each week. Do you know what I am doing wrong?
Answer 0 (score: 2)
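You are calling resample on a pd.Series that contains only the Date variable, stored as strings, so pandas "sums" those strings by concatenating them within each row of the result. Change this: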
Wind_Weekly = Wind['Date'].resample('W').sum()
to this:
Wind_Weekly = Wind.resample('W').sum()
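# Selecting the numeric columns explicitly also works and keeps Date out of the resulting sum
# (a list inside the brackets is required; a bare tuple of names is rejected by recent pandas versions)
Wind_Weekly = Wind.resample('W')[['Actual', 'Forecast', 'Demand']].sum()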
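Calling Wind['Date'] returns a pd.Series that contains only the dates, as they were before being converted to datetime. So none of the Actual, Forecast or Demand variables are actually passed to the resample call.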
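As a minimal sketch of the whole fix, the example below builds a hypothetical stand-in for the Wind frame (the values and the two-week range are made up for illustration), parses the Date strings, sets them as the index, and sums the three numeric columns per week. It assumes the dates are day-first, which the 16/01/2017 entries in the question suggest; the question's output also hints that ambiguous dates such as 01/02/2017 may have been parsed month-first, so an explicit format is passed here rather than relying on inference.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's data: two full weeks of 15-minute rows,
# starting on Monday 2017-01-02. The numbers are placeholders, not real data.
stamps = pd.date_range("2017-01-02", periods=2 * 7 * 96, freq="15min")
Wind = pd.DataFrame({
    "Date": stamps.strftime("%d/%m/%Y %H:%M"),  # day-first strings, as in the question
    "Actual": np.ones(len(stamps), dtype=int),
    "Forecast": np.full(len(stamps), 2.0),
    "Demand": np.full(len(stamps), 3, dtype=int),
})

# Parse the strings with an explicit day-first format (an assumption about the data)
Wind["Date"] = pd.to_datetime(Wind["Date"], format="%d/%m/%Y %H:%M")
Wind = Wind.set_index("Date")

# Resample the whole frame, then pick the numeric columns to sum
Wind_Weekly = Wind.resample("W")[["Actual", "Forecast", "Demand"]].sum()
print(Wind_Weekly)
```

Each weekly bin here holds 7 * 96 = 672 quarter-hour rows, so the sums are easy to check by hand.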
You can check this:
>>> type(Wind['Date'])
<class 'pandas.core.series.Series'>
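The type check only shows that a Series comes back. To see why summing it concatenates text, here is a small sketch with two made-up date strings in an object-dtype Series (the variable names are illustrative, not from the question):

```python
import pandas as pd

# Two date strings stored as plain Python objects, like an unparsed Date column
dates = pd.Series(["01/01/2017 00:00", "01/01/2017 00:15"], dtype=object)
print(dates.dtype)

# Summing strings uses "+", which for str means concatenation
joined = dates.sum()
print(joined)
```

The same thing happens inside each weekly bin when resample(...).sum() is applied to a column of strings.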
To test this, I reproduced your problem with the following code:
import pandas as pd
import numpy as np
rng = pd.date_range('1/1/2012', periods=100, freq='D')
df = pd.DataFrame( # Construct df with a datetime index and some numbers
{'ones': np.ones(100), 'twos': np.full(100, 2), 'zeros': np.zeros(100)},
index=rng
)
df['Date'] = rng.astype(str) # re-add index as a str
In the interpreter:
>>> df.resample('W').sum() # works out of the box
            ones  twos  zeros
2012-01-01   1.0     2    0.0
2012-01-08   7.0    14    0.0
2012-01-15   7.0    14    0.0
...
>>> df['Date'].resample('W').sum() # same result as yours: only the 'Date' column is resampled
2012-01-01 2012-01-01
2012-01-08 2012-01-022012-01-032012-01-042012-01-052012-0...
2012-01-15 2012-01-092012-01-102012-01-112012-01-122012-0...
2012-01-22 2012-01-162012-01-172012-01-182012-01-192012-0...
...
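Whether df.resample('W').sum() leaves the string Date column out, as in the output above, may depend on the pandas version; newer releases may concatenate it instead. One hedged way to make the result independent of that is to keep only the numeric columns before resampling. The sketch below rebuilds the same test frame and uses select_dtypes for this; it is one option, not the only one.

```python
import numpy as np
import pandas as pd

# Same test frame as above: 100 daily rows plus the index repeated as strings
rng = pd.date_range("1/1/2012", periods=100, freq="D")
df = pd.DataFrame(
    {"ones": np.ones(100), "twos": np.full(100, 2), "zeros": np.zeros(100)},
    index=rng,
)
df["Date"] = rng.astype(str)

# Drop non-numeric columns first, then resample and sum per week
weekly = df.select_dtypes(include="number").resample("W").sum()
print(weekly.head())
```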