Initial data:
df.head()
df.tail()
Output:
value
ts
2017-09-20 21:00:45.514847+00:00 -60.0
2017-09-20 21:01:29.169977+00:00 -60.0
2017-09-20 21:02:13.694557+00:00 -60.0
2017-09-20 21:02:57.954950+00:00 -60.0
2017-09-20 21:03:40.615305+00:00 -60.0
...
value
ts
2017-09-21 20:56:27.126042+00:00 -60.0
2017-09-21 20:57:11.993958+00:00 -60.0
2017-09-21 20:57:55.010927+00:00 -60.0
2017-09-21 20:58:40.413179+00:00 -60.0
2017-09-21 20:59:25.451698+00:00 -60.0
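As you can see, the data is indexed by timestamp (+03:00 local time; the index is displayed in UTC) and the values cover 1 day = 24H.
I want to resample the data at different periods: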
resample_params = [u'1H', u'2H', u'4H', u'6H', u'8H', u'12H',]
Let's do it:
for resample_rule in resample_params:
    r = df1.resample(resample_rule, closed='right', label='left', base=1)
    # aggregate each bin: mean, median, count, std, sem, mad
    result = r.agg(['mean', 'median', 'count', 'std', 'sem', 'mad',])
    result.fillna(0, inplace=True)
    print result.value['count'], '\nlen =', len(result.value['count']), 'sum =', sum(result.value['count'])
Output:
ts
2017-09-20 21:00:00+00:00 82
2017-09-20 22:00:00+00:00 82
2017-09-20 23:00:00+00:00 83
2017-09-21 00:00:00+00:00 83
2017-09-21 01:00:00+00:00 83
Freq: H, Name: count, dtype: int64
len = 24 sum = 1977
ts
2017-09-20 21:00:00+00:00 164
2017-09-20 23:00:00+00:00 166
2017-09-21 01:00:00+00:00 166
2017-09-21 03:00:00+00:00 166
2017-09-21 05:00:00+00:00 165
Freq: 2H, Name: count, dtype: int64
len = 12 sum = 1977
ts
2017-09-20 21:00:00+00:00 330
2017-09-21 01:00:00+00:00 332
2017-09-21 05:00:00+00:00 328
2017-09-21 09:00:00+00:00 330
2017-09-21 13:00:00+00:00 329
Freq: 4H, Name: count, dtype: int64
len = 6 sum = 1977
ts
2017-09-20 19:00:00+00:00 330
2017-09-21 01:00:00+00:00 497
2017-09-21 07:00:00+00:00 493
2017-09-21 13:00:00+00:00 493
2017-09-21 19:00:00+00:00 164
Freq: 6H, Name: count, dtype: int64
len = 5 sum = 1977
ts
2017-09-20 17:00:00+00:00 330
2017-09-21 01:00:00+00:00 660
2017-09-21 09:00:00+00:00 659
2017-09-21 17:00:00+00:00 328
Freq: 8H, Name: count, dtype: int64
len = 4 sum = 1977
ts
2017-09-20 13:00:00+00:00 330
2017-09-21 01:00:00+00:00 990
2017-09-21 13:00:00+00:00 657
Freq: 12H, Name: count, dtype: int64
len = 3 sum = 1977
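The periods u'1H', u'2H', u'4H' are fine, but u'6H', u'8H', u'12H' give len + 1 and shift the timestamps (look at the first row of each result).
I tried different base, closed, and label parameters as well as different resample rules.
How do I get correct resampling for periods longer than 4H?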
Answer 0 (score: 1)
After studying the resample method further, I found that base is an int and can be negative! base = -3 works perfectly for me!
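A rough way to see why base = -3 helps, as a sketch rather than a description of pandas internals: assume bins for an n-hour rule are anchored at midnight of the first day and shifted by base hours modulo n, which is consistent with the first-row timestamps shown in this thread. The data starts at 21:00 UTC, so a bin edge falls exactly on the start only when (21 - base) is a multiple of n. The names `START_HOUR`, `RULE_HOURS` and `aligned` are made up for this illustration.

```python
# Hour of day (UTC) at which the data in this thread begins.
START_HOUR = 21
# Rule widths in hours: 1H, 2H, 4H, 6H, 8H, 12H, 24H.
RULE_HOURS = [1, 2, 4, 6, 8, 12, 24]


def aligned(base, n, start_hour=START_HOUR):
    """Return True if an n-hour bin edge lands exactly on start_hour.

    Assumption: edges sit at midnight + base + k * n hours.
    """
    return (start_hour - base) % n == 0


for base in (1, -3):
    ok = [n for n in RULE_HOURS if aligned(base, n)]
    print("base=%d lines up for rules (hours): %s" % (base, ok))
```

Under that assumption, base = 1 lines up only for the 1H, 2H and 4H rules, while base = -3 lines up for every rule in the list, because 21 - (-3) = 24 is divisible by each of them.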
Output with base = -3:
ts
2017-09-20 21:00:00+00:00 82
2017-09-20 22:00:00+00:00 82
2017-09-20 23:00:00+00:00 83
2017-09-21 00:00:00+00:00 83
2017-09-21 01:00:00+00:00 83
Freq: H, Name: count, dtype: int64
len = 24 sum = 1977
ts
2017-09-20 21:00:00+00:00 164
2017-09-20 23:00:00+00:00 166
2017-09-21 01:00:00+00:00 166
2017-09-21 03:00:00+00:00 166
2017-09-21 05:00:00+00:00 165
Freq: 2H, Name: count, dtype: int64
len = 12 sum = 1977
ts
2017-09-20 21:00:00+00:00 330
2017-09-21 01:00:00+00:00 332
2017-09-21 05:00:00+00:00 328
2017-09-21 09:00:00+00:00 330
2017-09-21 13:00:00+00:00 329
Freq: 4H, Name: count, dtype: int64
len = 6 sum = 1977
ts
2017-09-20 21:00:00+00:00 496
2017-09-21 03:00:00+00:00 494
2017-09-21 09:00:00+00:00 495
2017-09-21 15:00:00+00:00 492
Freq: 6H, Name: count, dtype: int64
len = 4 sum = 1977
ts
2017-09-20 21:00:00+00:00 662
2017-09-21 05:00:00+00:00 658
2017-09-21 13:00:00+00:00 657
Freq: 8H, Name: count, dtype: int64
len = 3 sum = 1977
ts
2017-09-20 21:00:00+00:00 990
2017-09-21 09:00:00+00:00 987
Freq: 12H, Name: count, dtype: int64
len = 2 sum = 1977
ts
2017-09-20 21:00:00+00:00 1977
Freq: 24H, Name: count, dtype: int64
len = 1 sum = 1977
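For readers on newer pandas: the `base` argument was deprecated in pandas 1.1 and removed in 2.0, and `offset` is its replacement. The sketch below is not the original poster's code. It uses synthetic data (one reading every 44 seconds, an assumption standing in for the real `df`) and the hypothetical helper `count_bins` to compare an offset of 1 hour, which plays the role of base=1, with an offset of 21 hours, which is -3 hours modulo 24.

```python
import pandas as pd

# Synthetic stand-in for the thread's data (an assumption): one reading
# every 44 s, starting just after 21:00 UTC and ending before 21:00 next day.
index = pd.date_range("2017-09-20 21:00:30", periods=1960,
                      freq=pd.Timedelta(seconds=44), tz="UTC")
series = pd.Series(-60.0, index=index, name="value")


def count_bins(hours, offset_hours):
    """Count samples per bin for an `hours`-wide rule shifted by `offset_hours`."""
    return series.resample(
        pd.Timedelta(hours=hours),
        closed="right",
        label="left",
        offset=pd.Timedelta(hours=offset_hours),
    ).count()


for hours in (6, 8, 12):
    shifted = count_bins(hours, 1)    # plays the role of base=1
    aligned = count_bins(hours, 21)   # 21 h is -3 h modulo 24, like base=-3
    print(hours, len(shifted), shifted.index[0], len(aligned), aligned.index[0])
```

With this synthetic series, the 1-hour offset should reproduce the extra bin and the earlier first label seen in the question (5, 4 and 3 bins for 6H, 8H and 12H), while the 21-hour offset should give 4, 3 and 2 bins, all starting at 21:00.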