Resampling in pandas with a user-defined time interval

Time: 2017-04-30 18:04:15

Tags: python pandas dataframe resampling

I have an OHLC DataFrame at 30-minute intervals.

2017-04-30 11:00:00-04:00  239.06  239.39  239.04  239.33      28
2017-04-30 11:30:00-04:00  239.01  239.22  238.91  239.03      28
2017-04-30 12:00:00-04:00  239.02  239.28  238.99  239.03      29
2017-04-30 12:30:00-04:00  238.94  239.08  238.84  239.03      28
2017-04-30 13:00:00-04:00  239.01  239.11  238.93  238.94      27
2017-04-30 13:30:00-04:00  238.94  239.08  238.86  239.03      12

I want to resample the data into hourly bars, but is there a way to define each hourly bar as ending on the half hour, e.g. 9:30-10:30 instead of 9:00-10:00?

3 Answers:

Answer 0: (score: 2)

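To resample at an offset from the sampling period, use the base parameter of resample(). From the documentation:

base: int, default 0

For frequencies that evenly subdivide 1 day, the "origin" of the aggregated intervals. For example, for "5min" frequency, base could range from 0 through 4. Defaults to 0.

Code: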

df = df.resample('1H', base=0.5).last()

Test code:

from io import StringIO

import pandas as pd

df = pd.read_fwf(StringIO(u"""
    Date                      O       H       L       C
    2017-04-30T11:00:00-0400  239.06  239.39  239.04  239.33
    2017-04-30T11:30:00-0400  239.01  239.22  238.91  239.03
    2017-04-30T12:00:00-0400  239.02  239.28  238.99  239.03
    2017-04-30T12:30:00-0400  238.94  239.08  238.84  239.03
    2017-04-30T13:00:00-0400  239.01  239.11  238.93  238.94
    2017-04-30T13:30:00-0400  238.94  239.08  238.86  239.03"""
), header=1)
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date')

df = df.resample('1H', base=0.5).last()
print(df)

Results:

                          O       H       L       C
Date                                               
2017-04-30 14:30:00  239.06  239.39  239.04  239.33
2017-04-30 15:30:00  239.02  239.28  238.99  239.03
2017-04-30 16:30:00  239.01  239.11  238.93  238.94
2017-04-30 17:30:00  238.94  239.08  238.86  239.03
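
A note for newer pandas versions: base was deprecated in pandas 1.1 and removed in 2.0, where the offset argument takes its place. The following is a minimal sketch of the same half-hour shift, assuming pandas 1.1 or later. The sample data is rebuilt by hand with the UTC timestamps shown in the result above, and the name hourly is only illustrative.

```python
import pandas as pd

# Sample rows from the question, indexed by the UTC times shown in the result above.
idx = pd.date_range("2017-04-30 15:00", periods=6, freq="30min")
df = pd.DataFrame(
    {
        "O": [239.06, 239.01, 239.02, 238.94, 239.01, 238.94],
        "H": [239.39, 239.22, 239.28, 239.08, 239.11, 239.08],
        "L": [239.04, 238.91, 238.99, 238.84, 238.93, 238.86],
        "C": [239.33, 239.03, 239.03, 239.03, 238.94, 239.03],
    },
    index=idx,
)

# offset="30min" moves the hourly bin edges from :00 to :30
# (assumption: pandas >= 1.1, where the offset argument exists).
hourly = df.resample("60min", offset="30min").last()
print(hourly)
```

The resulting bins start at 14:30, 15:30, 16:30 and 17:30, which matches the bin labels produced by base=0.5 in the answer.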

Answer 1: (score: 0)

import pandas as pd
df = {'A': {'2017-04-30 15:00:00': 239.06,
  '2017-04-30 15:30:00': 239.00999999999999,
  '2017-04-30 16:00:00': 239.02000000000001,
  '2017-04-30 16:30:00': 238.94,
  '2017-04-30 17:00:00': 239.00999999999999,
  '2017-04-30 17:30:00': 238.94},
 'B': {'2017-04-30 15:00:00': 239.38999999999999,
  '2017-04-30 15:30:00': 239.22,
  '2017-04-30 16:00:00': 239.28,
  '2017-04-30 16:30:00': 239.08000000000001,
  '2017-04-30 17:00:00': 239.11000000000001,
  '2017-04-30 17:30:00': 239.08000000000001},
 'C': {'2017-04-30 15:00:00': 239.03999999999999,
  '2017-04-30 15:30:00': 238.91,
  '2017-04-30 16:00:00': 238.99000000000001,
  '2017-04-30 16:30:00': 238.84,
  '2017-04-30 17:00:00': 238.93000000000001,
  '2017-04-30 17:30:00': 238.86000000000001},
 'D': {'2017-04-30 15:00:00': 239.33000000000001,
  '2017-04-30 15:30:00': 239.03,
  '2017-04-30 16:00:00': 239.03,
  '2017-04-30 16:30:00': 239.03,
  '2017-04-30 17:00:00': 238.94,
  '2017-04-30 17:30:00': 239.03},
 'E': {'2017-04-30 15:00:00': 28,
  '2017-04-30 15:30:00': 28,
  '2017-04-30 16:00:00': 29,
  '2017-04-30 16:30:00': 28,
  '2017-04-30 17:00:00': 27,
  '2017-04-30 17:30:00': 12}}

df = pd.DataFrame(df)  # the literal above is a plain dict, so wrap it in a DataFrame first
df.index = pd.to_datetime(df.index)
                          A       B       C       D   E
2017-04-30 15:00:00  239.06  239.39  239.04  239.33  28
2017-04-30 15:30:00  239.01  239.22  238.91  239.03  28
2017-04-30 16:00:00  239.02  239.28  238.99  239.03  29
2017-04-30 16:30:00  238.94  239.08  238.84  239.03  28
2017-04-30 17:00:00  239.01  239.11  238.93  238.94  27
2017-04-30 17:30:00  238.94  239.08  238.86  239.03  12

# If your data is strictly half-hourly, you can get the hourly data ending every 30 mins as below:
df.resample('1H').last()
                          A       B       C       D   E
2017-04-30 15:00:00  239.01  239.22  238.91  239.03  28
2017-04-30 16:00:00  238.94  239.08  238.84  239.03  28
2017-04-30 17:00:00  238.94  239.08  238.86  239.03  12

Answer 2: (score: 0)

import pandas as pd

df = pd.DataFrame([['2017-04-30 11:00:00-04:00', '239.06', '239.39', '239.04', '239.33', '28'],
                   ['2017-04-30 11:30:00-04:00', '239.01', '239.22', '238.91', '239.03', '28'],
                   ['2017-04-30 12:00:00-04:00', '239.02', '239.28', '238.99', '239.03', '29'],
                   ['2017-04-30 12:30:00-04:00', '238.94', '239.08', '238.84', '239.03', '28'],
                   ['2017-04-30 13:00:00-04:00', '239.01', '239.11', '238.93', '238.94', '27'],
                   ['2017-04-30 13:30:00-04:00', '238.94', '239.08', '238.86', '239.03', '12']], 
                  columns=['Time', 'O', 'H', 'L', 'C', 'V'])

df.Time = pd.to_datetime(df.Time)

df.loc[df.Time.dt.minute==30] # choose minutes equal to 30
df.loc[df.Time.dt.minute==0] # choose minutes equal to 0

df.set_index('Time', inplace=True)

df.resample('1H', base=0).last() # base=0 means start from 0H
df.resample('1H', base=0.5).last() # base=0.5 means start from 0.5H (30 mins)
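
Because the question is about OHLC bars, note that last() keeps only the final row of each bin. Below is a hedged sketch of a per-column aggregation that builds proper hourly bars on half-hour boundaries. It assumes pandas 1.1 or later (for the offset argument), the column names O, H, L, C, V used in this answer, numeric rather than string values, and timestamps with the timezone dropped for brevity. The names agg and bars are illustrative only.

```python
import pandas as pd

# Rows from the question, with local wall-clock times (timezone dropped for brevity).
idx = pd.date_range("2017-04-30 11:00", periods=6, freq="30min")
df = pd.DataFrame(
    {
        "O": [239.06, 239.01, 239.02, 238.94, 239.01, 238.94],
        "H": [239.39, 239.22, 239.28, 239.08, 239.11, 239.08],
        "L": [239.04, 238.91, 238.99, 238.84, 238.93, 238.86],
        "C": [239.33, 239.03, 239.03, 239.03, 238.94, 239.03],
        "V": [28, 28, 29, 28, 27, 12],
    },
    index=idx,
)

# One aggregation rule per column: open = first, high = max,
# low = min, close = last, volume = sum.
agg = {"O": "first", "H": "max", "L": "min", "C": "last", "V": "sum"}

# Hourly bins whose edges fall on the half hour (10:30, 11:30, ...).
bars = df.resample("60min", offset="30min").agg(agg)
print(bars)
```

With this data the 11:30 bar combines the 11:30 and 12:00 rows, so its high is 239.28, its low is 238.91 and its volume is 57.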