I have some time series data (financial stock trading data):
TIMESTAMP PRICE VOLUME
1294311545 24990 1500000000
1294317813 25499 5000000000
1294318449 25499 100000000
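I need to convert it into OHLC values (open, high, low, close) based on the PRICE column, as a JSON list, and display it as an OHLC chart with the Highstock JS framework. The output should look like this: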
[{'time':'2013-09-01','open':24999,'high':25499,'low':24999,'close':25000,'volume':15000000},
{'time':'2013-09-02','open':24900,'high':25600,'low':24800,'close':25010,'volume':16000000},
{...}]
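For example, if my sample contains 10 records for the day 2013-09-01, the output should contain a single object for that day, where high is the highest price among the 10 records, low is the lowest price, open is the first price of the day, close is the last price of the day, and volume is the sum of the volumes of all 10 records.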
I know the Python library pandas can probably do this, but I haven't been able to try it out yet.
Update: as suggested, I used resample() as follows:
df['VOLUME'].resample('H', how='sum')
df['PRICE'].resample('H', how='ohlc')
But how do I merge the results?
Answer 0 (score: 0)
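At the moment you can only perform ohlc on a single column/Series, not on a whole DataFrame (this is fixed in 0.13).
First, coerce the TIMESTAMP column to pandas Timestamps and set it as the index: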
In [11]: df.TIMESTAMP = pd.to_datetime(df.TIMESTAMP, unit='s')
In [12]: df.set_index('TIMESTAMP', inplace=True)
In [13]: df
Out[13]:
PRICE VOLUME
TIMESTAMP
2011-01-06 10:59:05 24990 1500000000
2011-01-06 12:43:33 25499 5000000000
2011-01-06 12:54:09 25499 100000000
Then resample with ohlc (here I've resampled by hour):
In [14]: df['VOLUME'].resample('H', how='ohlc')
Out[14]:
open high low close
TIMESTAMP
2011-01-06 10:00:00 1500000000 1500000000 1500000000 1500000000
2011-01-06 11:00:00 NaN NaN NaN NaN
2011-01-06 12:00:00 5000000000 5000000000 100000000 100000000
In [15]: df['PRICE'].resample('H', how='ohlc')
Out[15]:
open high low close
TIMESTAMP
2011-01-06 10:00:00 24990 24990 24990 24990
2011-01-06 11:00:00 NaN NaN NaN NaN
2011-01-06 12:00:00 25499 25499 25499 25499
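On the question of merging the two results: below is a minimal sketch, assuming a more recent pandas in which `resample` is called without `how=` and the aggregation is chained as a method instead. It rebuilds the three sample rows, takes the hourly OHLC of PRICE and the hourly sum of VOLUME (a sum, as the question asks, rather than an OHLC of the volume), and joins them by assigning a new column, which works because both results share the same hourly index. The variable name `hourly` and the `"60min"` frequency string are choices made for this illustration.

```python
import pandas as pd

# The three sample rows from the question.
df = pd.DataFrame({
    "TIMESTAMP": [1294311545, 1294317813, 1294318449],
    "PRICE": [24990, 25499, 25499],
    "VOLUME": [1500000000, 5000000000, 100000000],
})
df["TIMESTAMP"] = pd.to_datetime(df["TIMESTAMP"], unit="s")
df = df.set_index("TIMESTAMP")

# OHLC of the price per hour; hours without trades come out as NaN.
hourly = df["PRICE"].resample("60min").ohlc()

# Total volume per hour; assignment aligns on the shared hourly index.
hourly["volume"] = df["VOLUME"].resample("60min").sum()

print(hourly)
```

Hours with no trades keep NaN in the OHLC columns and get a volume of 0; `hourly.dropna(subset=["open"])` would drop them if they are not wanted in the chart.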
You can call to_json on any DataFrame:
In [16]: df['PRICE'].resample('H', how='ohlc').to_json()
Out[16]: '{"open":{"1294308000000000000":24990.0,"1294311600000000000":null,"1294315200000000000":25499.0},"high":{"1294308000000000000":24990.0,"1294311600000000000":null,"1294315200000000000":25499.0},"low":{"1294308000000000000":24990.0,"1294311600000000000":null,"1294315200000000000":25499.0},"close":{"1294308000000000000":24990.0,"1294311600000000000":null,"1294315200000000000":25499.0}}'
*Supporting ohlc directly on a DataFrame may be a straightforward enhancement; at the moment it is NotImplemented.
Update: your desired output (or at least something very close to it) can be produced as follows:
In [21]: price = df['PRICE'].resample('D', how='ohlc').reset_index()
In [22]: price
Out[22]:
TIMESTAMP open high low close
0 2011-01-06 00:00:00 24990 25499 24990 25499
Then use the records orient together with the iso date_format:
In [23]: price.to_json(date_format='iso', orient='records')
Out[23]: '[{"TIMESTAMP":"2011-01-06T00:00:00.000Z","open":24990,"high":25499,"low":24990,"close":25499}]'
In [24]: price.to_json('foo.json', date_format='iso', orient='records') # save as json file
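To get closer to the exact shape requested in the question (a `time` string plus `open`, `high`, `low`, `close` and `volume` per day), one possible sketch is shown here. It assumes a recent pandas with named aggregation on `groupby`, and it computes all five fields in a single pass over daily groups instead of resampling each column separately. The names `daily`, `cols`, `payload` and `records` are illustrative and do not appear in the original answer.

```python
import json

import pandas as pd

# The three sample rows from the question.
df = pd.DataFrame({
    "TIMESTAMP": [1294311545, 1294317813, 1294318449],
    "PRICE": [24990, 25499, 25499],
    "VOLUME": [1500000000, 5000000000, 100000000],
})
df["TIMESTAMP"] = pd.to_datetime(df["TIMESTAMP"], unit="s")
df = df.set_index("TIMESTAMP")

# One row per calendar day: first/max/min/last price and summed volume.
daily = df.groupby(pd.Grouper(freq="D")).agg(
    open=("PRICE", "first"),
    high=("PRICE", "max"),
    low=("PRICE", "min"),
    close=("PRICE", "last"),
    volume=("VOLUME", "sum"),
)

# Drop days without any trades, then format the date as YYYY-MM-DD.
daily = daily.dropna(subset=["open"]).reset_index()
daily["time"] = daily["TIMESTAMP"].dt.strftime("%Y-%m-%d")

cols = ["time", "open", "high", "low", "close", "volume"]
payload = daily[cols].to_json(orient="records")  # JSON string
records = json.loads(payload)                    # list of dicts, one per day

print(payload)
```

Because the three sample rows all fall on 2011-01-06, this produces a single object for that day; with more data, each trading day becomes one object in the list.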