I have an OHLC DataFrame at 30-minute intervals.
2017-04-30 11:00:00-04:00 239.06 239.39 239.04 239.33 28
2017-04-30 11:30:00-04:00 239.01 239.22 238.91 239.03 28
2017-04-30 12:00:00-04:00 239.02 239.28 238.99 239.03 29
2017-04-30 12:30:00-04:00 238.94 239.08 238.84 239.03 28
2017-04-30 13:00:00-04:00 239.01 239.11 238.93 238.94 27
2017-04-30 13:30:00-04:00 238.94 239.08 238.86 239.03 12
I want to resample the data into hourly bars, but is there a way to define each hourly bar so that it ends on the half hour, e.g. 9:30-10:30 vs 9:00-10:00?
Answer 0 (score: 2)
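To resample with an offset relative to the sampling period, use the base parameter of resample. Note that base was deprecated in pandas 1.1 and removed in pandas 2.0, where the offset parameter takes its place. The documentation for base reads:
base : int, default 0
For frequencies that evenly subdivide 1 day, the "origin" of the aggregated intervals. For example, for a '5min' frequency, base could range from 0 through 4. Defaults to 0.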
Code:
df = df.resample('1H', base=0.5).last()
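For readers on a pandas version where base is no longer available, the following is a minimal sketch of the same idea using the offset argument of resample. The sample frame, the single column name C, and the variable name hourly are assumptions made for illustration. The rule is written as "60min" rather than "1H" only because that spelling is accepted by both older and newer pandas releases.

```python
import pandas as pd

# Hypothetical sample: six 30-minute bars, closing prices only.
idx = pd.date_range("2017-04-30 11:00", periods=6, freq="30min")
df = pd.DataFrame({"C": [239.33, 239.03, 239.03, 239.03, 238.94, 239.03]}, index=idx)

# Shift the hourly bin edges by 30 minutes so bins run 10:30-11:30, 11:30-12:30, ...
hourly = df.resample("60min", offset="30min").last()
print(hourly)
```

Each bin is labelled by its left edge by default, so the first row of hourly is stamped 10:30 and holds the 11:00 bar.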
Test code:
from io import StringIO
import pandas as pd
df = pd.read_fwf(StringIO(u"""
Date O H L C
2017-04-30T11:00:00-0400 239.06 239.39 239.04 239.33
2017-04-30T11:30:00-0400 239.01 239.22 238.91 239.03
2017-04-30T12:00:00-0400 239.02 239.28 238.99 239.03
2017-04-30T12:30:00-0400 238.94 239.08 238.84 239.03
2017-04-30T13:00:00-0400 239.01 239.11 238.93 238.94
2017-04-30T13:30:00-0400 238.94 239.08 238.86 239.03"""
), header=1)
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date')
df = df.resample('1H', base=0.5).last()
print(df)
Result:
O H L C
Date
2017-04-30 14:30:00 239.06 239.39 239.04 239.33
2017-04-30 15:30:00 239.02 239.28 238.99 239.03
2017-04-30 16:30:00 239.01 239.11 238.93 238.94
2017-04-30 17:30:00 238.94 239.08 238.86 239.03
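Note that .last() simply takes the last row in each bin, so the hourly open, high and low are those of the final 30-minute bar rather than of the whole hour. If true hourly OHLC bars are wanted, one possible sketch is to aggregate each column with its own rule. It assumes the timestamps mark the start of each 30-minute bar, that the fifth column (named V here) is a volume to be summed, and a pandas version that supports offset (1.1 or later). The names ohlc_rules and hourly are illustrative.

```python
import pandas as pd

# Sample values from the question, indexed by naive timestamps for simplicity.
idx = pd.date_range("2017-04-30 11:00", periods=6, freq="30min")
df = pd.DataFrame(
    {
        "O": [239.06, 239.01, 239.02, 238.94, 239.01, 238.94],
        "H": [239.39, 239.22, 239.28, 239.08, 239.11, 239.08],
        "L": [239.04, 238.91, 238.99, 238.84, 238.93, 238.86],
        "C": [239.33, 239.03, 239.03, 239.03, 238.94, 239.03],
        "V": [28, 28, 29, 28, 27, 12],
    },
    index=idx,
)

# One aggregation per column: first open, highest high, lowest low, last close, summed volume.
ohlc_rules = {"O": "first", "H": "max", "L": "min", "C": "last", "V": "sum"}

# Hourly bins whose edges fall on the half hour.
hourly = df.resample("60min", offset="30min").agg(ohlc_rules)
print(hourly)
```

With this approach the 11:30-12:30 bar opens at the 11:30 open and closes at the 12:00 close, and its high and low span both half-hour bars.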
Answer 1 (score: 0)
import pandas as pd
df = {'A': {'2017-04-30 15:00:00': 239.06,
'2017-04-30 15:30:00': 239.00999999999999,
'2017-04-30 16:00:00': 239.02000000000001,
'2017-04-30 16:30:00': 238.94,
'2017-04-30 17:00:00': 239.00999999999999,
'2017-04-30 17:30:00': 238.94},
'B': {'2017-04-30 15:00:00': 239.38999999999999,
'2017-04-30 15:30:00': 239.22,
'2017-04-30 16:00:00': 239.28,
'2017-04-30 16:30:00': 239.08000000000001,
'2017-04-30 17:00:00': 239.11000000000001,
'2017-04-30 17:30:00': 239.08000000000001},
'C': {'2017-04-30 15:00:00': 239.03999999999999,
'2017-04-30 15:30:00': 238.91,
'2017-04-30 16:00:00': 238.99000000000001,
'2017-04-30 16:30:00': 238.84,
'2017-04-30 17:00:00': 238.93000000000001,
'2017-04-30 17:30:00': 238.86000000000001},
'D': {'2017-04-30 15:00:00': 239.33000000000001,
'2017-04-30 15:30:00': 239.03,
'2017-04-30 16:00:00': 239.03,
'2017-04-30 16:30:00': 239.03,
'2017-04-30 17:00:00': 238.94,
'2017-04-30 17:30:00': 239.03},
'E': {'2017-04-30 15:00:00': 28,
'2017-04-30 15:30:00': 28,
'2017-04-30 16:00:00': 29,
'2017-04-30 16:30:00': 28,
'2017-04-30 17:00:00': 27,
'2017-04-30 17:30:00': 12}}
df = pd.DataFrame(df)  # build a DataFrame from the nested dict above
df.index = pd.to_datetime(df.index)
A B C D E
2017-04-30 15:00:00 239.06 239.39 239.04 239.33 28
2017-04-30 15:30:00 239.01 239.22 238.91 239.03 28
2017-04-30 16:00:00 239.02 239.28 238.99 239.03 29
2017-04-30 16:30:00 238.94 239.08 238.84 239.03 28
2017-04-30 17:00:00 239.01 239.11 238.93 238.94 27
2017-04-30 17:30:00 238.94 239.08 238.86 239.03 12
# If your data is strictly half-hourly, you can get the hourly data ending every 30 minutes as below:
df.resample('1H').last()
A B C D E
2017-04-30 15:00:00 239.01 239.22 238.91 239.03 28
2017-04-30 16:00:00 238.94 239.08 238.84 239.03 28
2017-04-30 17:00:00 238.94 239.08 238.86 239.03 12
Answer 2 (score: 0)
import pandas as pd
df = pd.DataFrame([['2017-04-30 11:00:00-04:00', '239.06', '239.39', '239.04', '239.33', '28'],
['2017-04-30 11:30:00-04:00', '239.01', '239.22', '238.91', '239.03', '28'],
['2017-04-30 12:00:00-04:00', '239.02', '239.28', '238.99', '239.03', '29'],
['2017-04-30 12:30:00-04:00', '238.94', '239.08', '238.84', '239.03', '28'],
['2017-04-30 13:00:00-04:00', '239.01', '239.11', '238.93', '238.94', '27'],
['2017-04-30 13:30:00-04:00', '238.94', '239.08', '238.86', '239.03', '12']],
columns=['Time', 'O', 'H', 'L', 'C', 'V'])
df.Time = pd.to_datetime(df.Time)
df.loc[df.Time.dt.minute==30] # choose minutes equal to 30
df.loc[df.Time.dt.minute==0] # choose minutes equal to 0
Or
df.set_index('Time', inplace=True)
df.resample('1H', base=0).last() # base=0 means start from 0H
df.resample('1H', base=0.5).last() # base=0.5 means start from 0.5H (30 mins)
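Since the question asks about bars that end on the half hour, it may also help to stamp each hourly bar with its end time instead of its start time. The sketch below is one way to do that with the label argument of resample. It assumes pandas 1.1 or later (for offset), and the series close and the names by_start and by_end are made up for illustration.

```python
import pandas as pd

# Hypothetical closing prices for six 30-minute bars.
idx = pd.date_range("2017-04-30 11:00", periods=6, freq="30min")
close = pd.Series([239.33, 239.03, 239.03, 239.03, 238.94, 239.03], index=idx, name="C")

# Same half-hour-shifted bins, labelled by their left edge (the default) ...
by_start = close.resample("60min", offset="30min").last()

# ... and labelled by their right edge, i.e. the time at which each hourly bar ends.
by_end = close.resample("60min", offset="30min", label="right").last()

print(by_start)
print(by_end)
```

Only the index labels differ between the two results; the values in each bin are the same.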