I have a websocket feed of all current live trades (tick data) from a cryptocurrency exchange. I want to store this data (only the last 5-10 minutes are needed) and process it in real time (every 5 seconds), then pass the result on to other functions.
Each tick looks like this:
[{'homeNotional': 0.00546742, 'foreignNotional': 49, 'trdMatchID': '61c7a96f-28f3-0df1-4223-516d695333ee', 'tickDirection': 'ZeroMinusTick', 'price': 8962, 'timestamp': '2018-03-20T22:15:27.437Z', 'side': 'Sell', 'grossValue': 546742, 'size': 49, 'symbol': 'XBTUSD'}]
which can then produce something like the following:
   foreignNotional  grossValue  homeNotional   price  side  size  symbol  tickDirection                 timestamp                            trdMatchID
0                2       22316      0.000223  8962.5   Buy     2  XBTUSD       PlusTick  2018-03-20T22:15:01.614Z  a06f6302-34b8-d307-e8e6-e4e617cbedf7
0               20      223160      0.002232  8962.0  Sell    20  XBTUSD      MinusTick  2018-03-20T22:15:01.753Z  bb68d1f9-60ff-2990-c32a-bdbd14f5d773
0               22      245476      0.002455  8962.5   Buy    22  XBTUSD       PlusTick  2018-03-20T22:15:01.858Z  a8e69797-5940-4aa1-2ab1-dd1ccceaa181
1               65      725270      0.007253  8962.5   Buy    65  XBTUSD   ZeroPlusTick  2018-03-20T22:15:01.858Z  685851cb-fd50-583d-5895-c91bbed10c98
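For context, a minimal sketch of how a batch of tick dicts in the format above can be loaded into a DataFrame like that table (pandas is assumed here; the printed column order may differ):

import pandas as pd

# A batch of ticks as received from the websocket (same shape as the sample above)
ticks = [
    {'homeNotional': 0.00546742, 'foreignNotional': 49,
     'trdMatchID': '61c7a96f-28f3-0df1-4223-516d695333ee',
     'tickDirection': 'ZeroMinusTick', 'price': 8962,
     'timestamp': '2018-03-20T22:15:27.437Z', 'side': 'Sell',
     'grossValue': 546742, 'size': 49, 'symbol': 'XBTUSD'},
]

df = pd.DataFrame(ticks)
# Parse the ISO-8601 strings so the column can be sliced and resampled on later
df['timestamp'] = pd.to_datetime(df['timestamp'])
print(df)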
My current thinking is:
1. Append all tick data to df1 in real time
2. Take the last 5 minutes of data from df1 and save it to df2
3. Run pre-processing on df2, including converting to OHLCV values, and save the result in df3
4. Clear df2
5. Clear old data out of df1 every 10 minutes, as I can't store it all in memory
6. Run steps 2, 3 & 4 every 5 seconds (see the sketch after this list).
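If the data volume allows it, steps 2-5 boil down to time-based slicing plus a resample. A rough sketch, assuming df1 carries a parsed UTC-aware timestamp column as above, and that 5-second OHLCV bars are what the pre-processing should produce (the process name and bar interval are just illustrative):

import pandas as pd

def process(df1):
    """One 5-second cycle: trim df1, slice the last 5 minutes, resample to OHLCV.

    Assumes df1 has a 'timestamp' column already parsed to UTC-aware datetimes,
    plus 'price' and 'size' columns as in the tick data above.
    """
    now = pd.Timestamp.now(tz='UTC')

    # Step 5: keep df1 bounded by dropping anything older than 10 minutes
    df1 = df1[df1['timestamp'] >= now - pd.Timedelta(minutes=10)]

    # Step 2: take only the last 5 minutes into df2
    df2 = df1[df1['timestamp'] >= now - pd.Timedelta(minutes=5)]

    # Step 3: build 5-second OHLCV bars from the raw trades
    indexed = df2.set_index('timestamp')
    ohlc = indexed['price'].resample('5s').ohlc()
    volume = indexed['size'].resample('5s').sum().rename('volume')
    df3 = ohlc.join(volume)

    return df1, df3

Calling df1, df3 = process(df1) once per cycle after the new ticks have been appended also makes step 4 unnecessary, since df2 only exists inside the function.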
My questions are:
Is this approach feasible, and is pandas well suited to this task?
I'm currently using a Jupyter notebook. If I run step 1 and keep appending data to df1, I can't run concurrent tasks such as slicing it and saving to df2. What is the best way to approach this?
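For the notebook problem, one common pattern is to run the websocket consumer on a background thread and share the tick buffer under a lock, while the 5-second slicing/processing runs in the foreground. A rough sketch under that assumption; get_next_tick() is a hypothetical stand-in for however your feed actually delivers messages:

import threading
import time
import pandas as pd

ticks = []                  # shared buffer of raw tick dicts (df1's backing store)
lock = threading.Lock()     # guards every access to the buffer

def consume_feed():
    """Background thread: append each websocket message to the shared buffer."""
    while True:
        tick = get_next_tick()      # hypothetical blocking read from the websocket
        with lock:
            ticks.append(tick)

def run_every_5_seconds():
    """Foreground loop: snapshot the buffer, build df1, process, repeat."""
    while True:
        time.sleep(5)
        with lock:
            snapshot = list(ticks)  # copy so processing never blocks the feed
        if not snapshot:
            continue
        df1 = pd.DataFrame(snapshot)
        df1['timestamp'] = pd.to_datetime(df1['timestamp'])
        # ... trim to 10 minutes, slice the last 5, resample to OHLCV as above ...

threading.Thread(target=consume_feed, daemon=True).start()
run_every_5_seconds()

An asyncio-based client or a small queue would work just as well; the key point is that appending ticks and slicing/processing cannot share one single-threaded notebook cell unless you interleave them yourself.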