OHLC resampling creates candles with wrong timestamps

Asked: 2019-03-09 10:49:08

Tags: python pandas

The data to be resampled comes from SQLite. It is available here: https://pastebin.com/LU7YApkX

Code:

import sqlite3
import pandas as pd

conn = sqlite3.connect('sqlite_database.db')
query = "SELECT * FROM XXXX WHERE timestamp BETWEEN  '2019-01-24 09:15:00' AND '2019-01-24 09:59:59'"

df = pd.read_sql_query(query, conn, index_col=[
    'timestamp'], parse_dates=['timestamp'])

candles = df['ltp'].resample('5min').ohlc().bfill()
print(candles)

Good output (resample period = 3min)

$ python3 why_ohlc_failing.py
                       open    high     low   close
timestamp
2019-01-24 09:15:00  286.55  286.70  285.85  286.20
2019-01-24 09:18:00  286.10  286.30  285.50  285.90
2019-01-24 09:21:00  285.90  286.25  285.65  285.85
2019-01-24 09:24:00  285.80  286.90  285.75  286.65
2019-01-24 09:27:00  286.65  286.85  286.35  286.60
2019-01-24 09:30:00  286.70  286.70  286.20  286.25
2019-01-24 09:33:00  286.25  286.95  286.20  286.95
2019-01-24 09:36:00  287.00  287.50  286.95  287.40
2019-01-24 09:39:00  287.45  287.50  287.00  287.45
2019-01-24 09:42:00  287.35  287.50  287.00  287.50
2019-01-24 09:45:00  287.40  288.15  287.40  288.05
2019-01-24 09:48:00  288.40  288.45  288.30  288.35
2019-01-24 09:51:00  288.40  288.45  288.30  288.35
2019-01-24 09:54:00  288.40  288.45  288.30  288.35
2019-01-24 09:57:00  288.40  288.45  288.30  288.35

Good output (resample period = 5min)

$ python3 why_ohlc_failing.py
                       open    high    low   close
timestamp
2019-01-24 09:15:00  286.55  286.70  285.5  285.65
2019-01-24 09:20:00  285.65  286.25  285.6  285.95
2019-01-24 09:25:00  285.95  286.90  285.9  286.60
2019-01-24 09:30:00  286.70  286.70  286.2  286.60
2019-01-24 09:35:00  286.70  287.50  286.6  287.15
2019-01-24 09:40:00  287.15  287.50  287.0  287.50
2019-01-24 09:45:00  287.40  288.15  287.4  288.05
2019-01-24 09:50:00  288.40  288.45  288.3  288.35
2019-01-24 09:55:00  288.40  288.45  288.3  288.35

Bad output (resample period = 10min)

$ python3 why_ohlc_failing.py
                       open    high    low   close
timestamp
2019-01-24 09:10:00  286.55  286.70  285.5  285.65
2019-01-24 09:20:00  285.65  286.90  285.6  286.60
2019-01-24 09:30:00  286.70  287.50  286.2  287.15
2019-01-24 09:40:00  287.15  288.15  287.0  288.05
2019-01-24 09:50:00  288.40  288.45  288.3  288.35

Good output (resample period = 15min)

$ python3 why_ohlc_failing.py
                       open    high    low   close
timestamp
2019-01-24 09:15:00  286.55  286.90  285.5  286.60
2019-01-24 09:30:00  286.70  287.50  286.2  287.50
2019-01-24 09:45:00  287.40  288.45  287.4  288.35

Bad output (resample period = 20min)

$ python3 why_ohlc_failing.py
                       open    high    low   close
timestamp
2019-01-24 09:00:00  286.55  286.70  285.5  285.65
2019-01-24 09:20:00  285.65  287.50  285.6  287.15
2019-01-24 09:40:00  287.15  288.45  287.0  288.35

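Question:

If you look at the BAD outputs above for the 10min and 20min sampling periods, the candles start at 2019-01-24 09:10:00 and 2019-01-24 09:00:00 respectively. That is wrong, because I have no data at all before 2019-01-24 09:15:01. Yet the same code works fine for the 3min, 5min and 15min sampling periods.

Can you help me figure out what is going wrong here? My understanding is that, regardless of the sampling period, the resampled data should always start at 2019-01-24 09:15:00. Anything else makes no sense, because no stock quotes are available before that.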

2 answers:

Answer 0 (score: 1)

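When you resample to, say, 10min, pandas creates 10-minute intervals, and the label 2019-01-24 09:10:00 stands for the interval 2019-01-24 09:10:00 - 2019-01-24 09:19:59. By default the interval edges are aligned to multiples of the frequency counted from midnight, not to your first tick, so the first tick at 09:15:01 lands in the interval that began at 09:10:00.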

df['ltp'].resample('10min').ohlc().bfill()

Output:

                       open    high    low   close
t                                                 
2019-01-24 09:10:00  286.55  286.70  285.5  285.65
2019-01-24 09:20:00  285.65  286.90  285.6  286.60
2019-01-24 09:30:00  286.70  287.50  286.2  287.15
2019-01-24 09:40:00  287.15  288.15  287.0  288.05
2019-01-24 09:50:00  288.40  288.45  288.3  288.35

Compare with:

print(
    df.loc['2019-01-24 09:10:00':'2019-01-24 09:19:59', 'ltp'].iloc[0],
    df.loc['2019-01-24 09:10:00':'2019-01-24 09:19:59', 'ltp'].max(),
    df.loc['2019-01-24 09:10:00':'2019-01-24 09:19:59', 'ltp'].min(),
    df.loc['2019-01-24 09:10:00':'2019-01-24 09:19:59', 'ltp'].iloc[-1])

Output:

286.55 286.7 285.5 285.65
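
The alignment described in this answer can be checked with a little arithmetic: 09:15 is 555 minutes after midnight, and 555 is divisible by 3, 5 and 15 but leaves a remainder of 5 for 10 and 15 for 20. The sketch below illustrates this on synthetic data. The one-tick-per-second series is an assumption standing in for the real `ltp` column, and the names `ticks` and `first_labels` are made up for the example.

```python
import pandas as pd

# Synthetic one-per-second ticks standing in for the real 'ltp' column (an assumption).
idx = pd.date_range("2019-01-24 09:15:01", "2019-01-24 09:59:59",
                    freq=pd.Timedelta(seconds=1))
ticks = pd.Series(286.0, index=idx)

# Label of the first candle for each sampling period used in the question.
first_labels = {
    freq: ticks.resample(freq).ohlc().index[0]
    for freq in ["3min", "5min", "10min", "15min", "20min"]
}
for freq, label in first_labels.items():
    print(freq, label)
```

With the default alignment, the 3min, 5min and 15min candles should start at 09:15:00, while the 10min and 20min candles start at 09:10:00 and 09:00:00, which matches the outputs in the question.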

Note: if you want the resampled data to start at the first value:

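tmin = df.index[0]                        # timestamp of the first tick
df.index = df.index - tmin                # shift so the first tick sits at 0 (TimedeltaIndex)
df = df.resample('10min').ohlc().bfill()  # bins now start at the first tick
df.index = df.index + tmin                # shift the bin labels back to wall-clock time
df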

Output:

                        ltp                       
                       open    high    low   close
t                                                 
2019-01-24 09:15:01  286.55  286.70  285.5  285.95
2019-01-24 09:25:01  285.95  286.90  285.9  286.70
2019-01-24 09:35:01  286.65  287.50  286.6  287.50
2019-01-24 09:45:01  287.40  288.15  287.4  288.05
2019-01-24 09:55:01  288.40  288.45  288.3  288.35
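
On pandas 1.1 or later, the same "start at the first value" behaviour should be available without shifting the index by hand, through the `origin="start"` argument of `resample`. The sketch below is not part of the original answer; the synthetic `ticks` series is an assumption standing in for the real data.

```python
import pandas as pd

# Synthetic one-per-second ticks standing in for the real 'ltp' column (an assumption).
idx = pd.date_range("2019-01-24 09:15:01", "2019-01-24 09:59:59",
                    freq=pd.Timedelta(seconds=1))
ticks = pd.Series(286.0, index=idx)

# origin="start" anchors the bins at the first timestamp of the series
# instead of at midnight.
candles = ticks.resample("10min", origin="start").ohlc()
print(candles.index[0])
```

The candle labels then follow the first tick (09:15:01, 09:25:01, ...), as in the output above.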

Answer 1 (score: 0)

The following works fine at all intervals:

data = df['ltp'].resample('5min', base=15).ohlc().bfill()

I had to add base=15, though I am still trying to understand what is going on here.

I further found that, to get the desired result across the various sampling periods, I need to pass different base values, as follows:

resample('1min', base=15)
resample('2min', base=15)
resample('3min', base=15)
resample('4min', base=15)
resample('5min', base=15)
resample('6min', base=15)
resample('7min', base=16)
resample('8min', base=19)
resample('9min', base=15)
resample('10min', base=15)
resample('11min', base=16)
resample('12min', base=15)
resample('13min', base=22)
resample('14min', base=23)
resample('15min', base=15)
resample('16min', base=27)
resample('17min', base=28)
resample('18min', base=33)
resample('19min', base=42)
resample('20min', base=15)

For 1min, 3min, 5min and 15min, no base is needed at all.

Still trying to understand the significance of base.
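
For readers on newer pandas: `base` was deprecated in pandas 1.1 and removed in 2.0, with `offset` and `origin` as its replacements. For minute frequencies, `offset="15min"` plays the role of `base=15`: it shifts the bin edges 15 minutes away from midnight. The per-frequency values in the list above all put a bin edge exactly on 09:15:00, which suggests a simpler route, namely passing that timestamp as `origin`. The sketch below assumes 09:15:00 is the session open and uses synthetic ticks; `session_open`, `with_offset` and `first_labels` are hypothetical names for the example.

```python
import pandas as pd

# Synthetic one-per-second ticks standing in for the real 'ltp' column (an assumption).
idx = pd.date_range("2019-01-24 09:15:01", "2019-01-24 09:59:59",
                    freq=pd.Timedelta(seconds=1))
ticks = pd.Series(286.0, index=idx)

session_open = pd.Timestamp("2019-01-24 09:15:00")  # assumed start of trading

# offset="15min" shifts the midnight-aligned bin edges by 15 minutes,
# which is what base=15 did for minute frequencies.
with_offset = ticks.resample("10min", offset="15min").ohlc()

# Anchoring the bins at the session open works for every period from
# 1min to 20min without choosing a separate value per frequency.
first_labels = {
    n: ticks.resample(f"{n}min", origin=session_open).ohlc().index[0]
    for n in range(1, 21)
}
print(with_offset.index[0])
print(set(first_labels.values()))
```

With a fixed `origin`, every sampling period should produce a first candle labelled 09:15:00, which is the behaviour the question asks for.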