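I have some financial data that looks like this: timestamp, OHLC. I now want to aggregate my pandas DataFrame into 1-minute bars. Is there an elegant way to do this in pandas?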
Answer 0: (score: 2)
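As @JohnE said, resample is the tool you need. Older pandas versions accepted how='ohlc' as an argument to resample; current versions have removed that argument, so call .ohlc() on the resampler instead to get the desired output.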
import pandas as pd
import numpy as np
# generate some artificial data
# ===========================================
np.random.seed(0)
dt_rng = pd.date_range(start='2015-09-02 09:30:00', end='2015-09-02 15:59:59', freq='s')
df = pd.DataFrame(100+np.random.randn(len(dt_rng)).cumsum(), columns=['px'], index=dt_rng)
print(df)
px
2015-09-02 09:30:00 101.7641
2015-09-02 09:30:01 102.1642
2015-09-02 09:30:02 103.1429
2015-09-02 09:30:03 105.3838
2015-09-02 09:30:04 107.2514
2015-09-02 09:30:05 106.2741
2015-09-02 09:30:06 107.2242
2015-09-02 09:30:07 107.0729
... ...
2015-09-02 15:59:52 79.0222
2015-09-02 15:59:53 81.2040
2015-09-02 15:59:54 81.6277
2015-09-02 15:59:55 82.3117
2015-09-02 15:59:56 83.0102
2015-09-02 15:59:57 82.7588
2015-09-02 15:59:58 81.0294
2015-09-02 15:59:59 81.3962
[23400 rows x 1 columns]
# processing
# =======================
# older pandas accepted df.resample('1min', how='ohlc'); the how argument has since been removed
df.resample('1min').ohlc()
px
open high low close
2015-09-02 09:30:00 101.7641 113.8188 101.7641 104.6000
2015-09-02 09:31:00 103.9276 115.9134 96.2217 115.9134
2015-09-02 09:32:00 116.2898 120.5850 115.1904 116.7901
2015-09-02 09:33:00 116.4361 116.5853 108.7353 111.4434
2015-09-02 09:34:00 110.8060 110.8060 99.6007 108.2589
2015-09-02 09:35:00 106.9523 108.6105 92.8644 93.4848
2015-09-02 09:36:00 94.1833 95.6041 84.2610 91.4362
2015-09-02 09:37:00 92.3657 92.9479 80.2402 85.0347
... ... ... ... ...
2015-09-02 15:52:00 64.6560 69.4697 56.4659 69.1167
2015-09-02 15:53:00 69.3775 73.6731 64.6894 73.6731
2015-09-02 15:54:00 74.6119 81.2891 67.9659 78.4973
2015-09-02 15:55:00 78.9224 81.8589 72.9847 77.1010
2015-09-02 15:56:00 77.7440 91.1469 77.7440 88.8073
2015-09-02 15:57:00 88.9114 90.8509 83.8462 87.7416
2015-09-02 15:58:00 88.2430 89.0107 80.5122 87.0581
2015-09-02 15:59:00 87.1443 87.1443 77.6822 81.3962
[390 rows x 4 columns]
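The example above builds bars from a single price column. If the input already has open, high, low and close columns, as the question suggests, each column needs its own aggregation rule. Below is a minimal sketch under that assumption; the `ticks` frame, its column names and its values are made up for illustration and are not from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical input: per-second rows that already carry OHLC columns.
idx = pd.date_range("2015-09-02 09:30:00", periods=180, freq="s")
rng = np.random.default_rng(0)
px = 100 + rng.standard_normal(len(idx)).cumsum()
ticks = pd.DataFrame(
    {"open": px, "high": px + 0.5, "low": px - 0.5, "close": px},
    index=idx,
)

# One rule per column: first open, highest high, lowest low, last close.
bars = ticks.resample("1min").agg(
    {"open": "first", "high": "max", "low": "min", "close": "last"}
)
print(bars)
```

Calling .ohlc() directly on such a frame would instead compute a separate open/high/low/close set for every input column, which is usually not what you want here.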
Answer 1: (score: 1)
You may need to do some extra processing, but pd.cut() can do this.
>>> seconds = [10.5,12.5,22.5,33.5,15.02, 19.26, 35.26]
>>> bins = [10,11,12,13,14,15,20,25,30,40]
>>> cats = pd.cut(seconds, bins)
>>> cats
[(10, 11], (12, 13], (20, 25], (30, 40], (15, 20], (15, 20], (30, 40]]
Once you have this, you can aggregate by this column in whatever way suits your analysis.
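As a rough sketch of that aggregation step, the bins from pd.cut() can be used as a groupby key. The `prices` list below is hypothetical and only there to have something to aggregate; the `seconds` and `bins` values are the ones from the answer.

```python
import pandas as pd

seconds = [10.5, 12.5, 22.5, 33.5, 15.02, 19.26, 35.26]
bins = [10, 11, 12, 13, 14, 15, 20, 25, 30, 40]
# Hypothetical prices, one per timestamp, for illustration only.
prices = [100.0, 101.0, 99.5, 102.0, 100.5, 101.5, 103.0]

df = pd.DataFrame({"seconds": seconds, "px": prices})
df["bin"] = pd.cut(df["seconds"], bins)

# Group by the bin column; observed=True skips bins that contain no rows.
bars = df.groupby("bin", observed=True)["px"].agg(
    open="first", high="max", low="min", close="last"
)
print(bars)
```

Note that "first" and "last" follow row order within each bin, so the frame should be sorted by time before grouping if the open and close values are meant to be chronological.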