How to convert time series data with pandas and display it as a chart with Highstock

Time: 2013-09-03 13:20:08

Tags: python pandas highstock

I have some time series data (financial stock trading data):

TIMESTAMP    PRICE     VOLUME
1294311545    24990  1500000000
1294317813    25499  5000000000
1294318449    25499   100000000
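A minimal sketch for loading this whitespace-separated sample into a DataFrame, assuming the table is available as a string (the variable name raw is only for illustration):

```python
import io

import pandas as pd

# The sample from the question, as whitespace-separated text
raw = """TIMESTAMP    PRICE     VOLUME
1294311545    24990  1500000000
1294317813    25499  5000000000
1294318449    25499   100000000
"""

# sep=r'\s+' splits on any run of whitespace, so the ragged alignment is fine
df = pd.read_csv(io.StringIO(raw), sep=r'\s+')
print(df.dtypes)
```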

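I need to convert them into OHLC values (open, high, low, close) based on the PRICE column, as a JSON list, and display them as an OHLC chart with the Highstock JS framework. The output should look like this: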

[{'time':'2013-09-01','open':24999,'high':25499,'low':24999,'close':25000,'volume':15000000},
 {'time':'2013-09-02','open':24900,'high':25600,'low':24800,'close':25010,'volume':16000000},
 {...}]

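For example, if my sample contains 10 data points for the day 2013-09-01, the output will contain one object for that day, where high is the highest price of all 10 data points, low is the lowest price, open is the first price of the day, close is the last price of the day, and volume is the total volume of all 10 data points.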
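The aggregation rule described above can be sketched in plain Python, without pandas, to make the definition concrete. This assumes the rows are sorted by timestamp and that days are calendar days in UTC; daily_ohlcv is a hypothetical helper name:

```python
from datetime import datetime, timezone

# Sample ticks from the question: (unix_seconds, price, volume)
ticks = [
    (1294311545, 24990, 1500000000),
    (1294317813, 25499, 5000000000),
    (1294318449, 25499, 100000000),
]


def daily_ohlcv(rows):
    """Hypothetical helper: one OHLCV dict per UTC day, rows assumed sorted by time."""
    bars = {}  # insertion-ordered (Python 3.7+), keyed by 'YYYY-MM-DD'
    for ts, price, volume in rows:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime('%Y-%m-%d')
        bar = bars.get(day)
        if bar is None:
            # First tick of the day sets open and seeds high/low/close
            bars[day] = {'time': day, 'open': price, 'high': price,
                         'low': price, 'close': price, 'volume': volume}
        else:
            bar['high'] = max(bar['high'], price)
            bar['low'] = min(bar['low'], price)
            bar['close'] = price  # later ticks overwrite close
            bar['volume'] += volume
    return list(bars.values())


print(daily_ohlcv(ticks))
```

All three sample rows fall on 2011-01-06 (UTC), so this yields a single bar with open 24990, high 25499, low 24990, close 25499 and volume 6600000000.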

I know there is a Python library, pandas, that might be able to do this, but I still haven't managed to get it working.

Update: as suggested, I used resample() like this:

df['VOLUME'].resample('H', how='sum')
df['PRICE'].resample('H', how='ohlc')

But how do I merge the results?

1 answer:

Answer 0 (score: 0)

At the moment you can only apply ohlc to a column/Series* (fixed in 0.13).

First, convert the TIMESTAMP column to pandas Timestamps:

In [11]: df.TIMESTAMP = pd.to_datetime(df.TIMESTAMP, unit='s')

In [12]: df.set_index('TIMESTAMP', inplace=True)

In [13]: df
Out[13]:
                     PRICE      VOLUME
TIMESTAMP
2011-01-06 10:59:05  24990  1500000000
2011-01-06 12:43:33  25499  5000000000
2011-01-06 12:54:09  25499   100000000

Resample with ohlc (here I resample by hour):

In [14]: df['VOLUME'].resample('H', how='ohlc')
Out[14]:
                           open        high         low       close
TIMESTAMP
2011-01-06 10:00:00  1500000000  1500000000  1500000000  1500000000
2011-01-06 11:00:00         NaN         NaN         NaN         NaN
2011-01-06 12:00:00  5000000000  5000000000   100000000   100000000

In [15]: df['PRICE'].resample('H', how='ohlc')
Out[15]:
                      open   high    low  close
TIMESTAMP
2011-01-06 10:00:00  24990  24990  24990  24990
2011-01-06 11:00:00    NaN    NaN    NaN    NaN
2011-01-06 12:00:00  25499  25499  25499  25499
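In newer pandas releases the how= keyword of resample() is no longer accepted; the aggregation is chosen by calling a method on the resampler instead. A sketch of the same two calls under that assumption (it also uses the lowercase 'h' alias, which newer pandas prefers over 'H'):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'TIMESTAMP': [1294311545, 1294317813, 1294318449],
    'PRICE': [24990, 25499, 25499],
    'VOLUME': [1500000000, 5000000000, 100000000],
})
df['TIMESTAMP'] = pd.to_datetime(df['TIMESTAMP'], unit='s')
df = df.set_index('TIMESTAMP')

# Method-style resampling instead of resample('H', how=...)
price_ohlc = df['PRICE'].resample('h').ohlc()
volume_sum = df['VOLUME'].resample('h').sum()
print(price_ohlc)
print(volume_sum)
```

Note that an hour with no trades sums to 0 for the volume, whereas its OHLC row is NaN.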

You can apply to_json to any DataFrame:

In [16]: df['PRICE'].resample('H', how='ohlc').to_json()
Out[16]: '{"open":{"1294308000000000000":24990.0,"1294311600000000000":null,"1294315200000000000":25499.0},"high":{"1294308000000000000":24990.0,"1294311600000000000":null,"1294315200000000000":25499.0},"low":{"1294308000000000000":24990.0,"1294311600000000000":null,"1294315200000000000":25499.0},"close":{"1294308000000000000":24990.0,"1294311600000000000":null,"1294315200000000000":25499.0}}'

*This is possibly a straightforward enhancement; at the moment it is NotImplemented for DataFrame.
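In later pandas versions ohlc() does work on a resampled DataFrame: each column gets its own open/high/low/close, collected under two-level column labels. A minimal sketch with the sample data, assuming a current pandas release:

```python
import pandas as pd

# Sample data, indexed by the converted timestamps
df = pd.DataFrame(
    {'PRICE': [24990, 25499, 25499],
     'VOLUME': [1500000000, 5000000000, 100000000]},
    index=pd.to_datetime([1294311545, 1294317813, 1294318449], unit='s'),
)

# One resample call for the whole frame; columns become (name, field) pairs
bars = df.resample('D').ohlc()
print(bars.columns.tolist())
```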

Update: your desired output (or at least something very close to it) can be achieved as follows:

In [21]: price = df['PRICE'].resample('D', how='ohlc').reset_index()

In [22]: price
Out[22]: 
            TIMESTAMP   open   high    low  close
0 2011-01-06 00:00:00  24990  25499  24990  25499

Use the records orient and the iso date_format:

In [23]: price.to_json(date_format='iso', orient='records')
Out[23]: '[{"TIMESTAMP":"2011-01-06T00:00:00.000Z","open":24990,"high":25499,"low":24990,"close":25499}]'
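To get the exact shape asked for in the question (a 'time' string such as '2013-09-01' plus a 'volume' key), one possible approach is to format the timestamp with strftime and select the columns before serialising. A sketch assuming newer pandas; payload is a hypothetical name:

```python
import pandas as pd

# Sample data from the question, indexed by datetime
df = pd.DataFrame({
    'TIMESTAMP': pd.to_datetime([1294311545, 1294317813, 1294318449], unit='s'),
    'PRICE': [24990, 25499, 25499],
    'VOLUME': [1500000000, 5000000000, 100000000],
}).set_index('TIMESTAMP')

bars = df['PRICE'].resample('D').ohlc()
bars['volume'] = df['VOLUME'].resample('D').sum()
bars = bars.reset_index()

# Format the day as 'YYYY-MM-DD' under the key used in the question
bars['time'] = bars['TIMESTAMP'].dt.strftime('%Y-%m-%d')
payload = bars[['time', 'open', 'high', 'low', 'close', 'volume']].to_json(orient='records')
print(payload)
```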

In [24]: price.to_json('foo.json', date_format='iso', orient='records')  # save as json file
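For the Highstock side, the ohlc series type accepts points as [x, open, high, low, close] arrays, with x in milliseconds since the Unix epoch, and a column series for volume accepts [x, y] pairs. A sketch that builds those arrays with pandas and serialises them with json; the 'ohlc' and 'volume' keys of the output object are hypothetical names, to be read by your own JavaScript:

```python
import json

import pandas as pd

# Sample data from the question, indexed by datetime
df = pd.DataFrame({
    'TIMESTAMP': pd.to_datetime([1294311545, 1294317813, 1294318449], unit='s'),
    'PRICE': [24990, 25499, 25499],
    'VOLUME': [1500000000, 5000000000, 100000000],
}).set_index('TIMESTAMP')

# Daily bars; drop days without trades so the chart gets no null points
bars = df['PRICE'].resample('D').ohlc().dropna()
volume = df['VOLUME'].resample('D').sum().loc[bars.index]

# Milliseconds since the Unix epoch, independent of the index resolution
ms = (bars.index - pd.Timestamp('1970-01-01')) // pd.Timedelta(milliseconds=1)

ohlc_data = [
    [int(t), float(o), float(h), float(l), float(c)]
    for t, o, h, l, c in zip(ms, bars['open'], bars['high'], bars['low'], bars['close'])
]
volume_data = [[int(t), int(v)] for t, v in zip(ms, volume)]

# Hypothetical payload layout for the page's JavaScript
payload = json.dumps({'ohlc': ohlc_data, 'volume': volume_data})
print(payload)
```

On the JavaScript side, these two arrays can then be passed as the data of an ohlc series and a column series in the Highstock chart.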