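Aggregating a pandas DataFrame by timestamp into x-minute bins

Time: 2015-08-30 11:22:17

Tags: python pandas aggregate

I have some financial data that looks like this: timestamp, OHLC. Now I want to aggregate my pandas DataFrame into 1-minute bars. Is there an elegant way to do this in pandas?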

2 answers:

Answer 0: (score: 2)

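As @JohnE said, resample is the tool you need. You can pass how='ohlc' to resample to get the desired output. (The how argument was deprecated in pandas 0.18 and has since been removed; on newer versions, call .ohlc() on the result of resample instead.)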

import pandas as pd
import numpy as np

# generate some artificial data
# ===========================================
np.random.seed(0)
dt_rng = pd.date_range(start='2015-09-02 09:30:00', end='2015-09-02 15:59:59', freq='s')
df = pd.DataFrame(100+np.random.randn(len(dt_rng)).cumsum(), columns=['px'], index=dt_rng)
print(df)

                           px
2015-09-02 09:30:00  101.7641
2015-09-02 09:30:01  102.1642
2015-09-02 09:30:02  103.1429
2015-09-02 09:30:03  105.3838
2015-09-02 09:30:04  107.2514
2015-09-02 09:30:05  106.2741
2015-09-02 09:30:06  107.2242
2015-09-02 09:30:07  107.0729
...                       ...
2015-09-02 15:59:52   79.0222
2015-09-02 15:59:53   81.2040
2015-09-02 15:59:54   81.6277
2015-09-02 15:59:55   82.3117
2015-09-02 15:59:56   83.0102
2015-09-02 15:59:57   82.7588
2015-09-02 15:59:58   81.0294
2015-09-02 15:59:59   81.3962

[23400 rows x 1 columns]

# processing
# =======================
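# older pandas: df.resample('1min', how='ohlc')
# newer pandas (how= removed): call .ohlc() on the resampler
df.resample('1min').ohlc()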

                           px                              
                         open      high       low     close
2015-09-02 09:30:00  101.7641  113.8188  101.7641  104.6000
2015-09-02 09:31:00  103.9276  115.9134   96.2217  115.9134
2015-09-02 09:32:00  116.2898  120.5850  115.1904  116.7901
2015-09-02 09:33:00  116.4361  116.5853  108.7353  111.4434
2015-09-02 09:34:00  110.8060  110.8060   99.6007  108.2589
2015-09-02 09:35:00  106.9523  108.6105   92.8644   93.4848
2015-09-02 09:36:00   94.1833   95.6041   84.2610   91.4362
2015-09-02 09:37:00   92.3657   92.9479   80.2402   85.0347
...                       ...       ...       ...       ...
2015-09-02 15:52:00   64.6560   69.4697   56.4659   69.1167
2015-09-02 15:53:00   69.3775   73.6731   64.6894   73.6731
2015-09-02 15:54:00   74.6119   81.2891   67.9659   78.4973
2015-09-02 15:55:00   78.9224   81.8589   72.9847   77.1010
2015-09-02 15:56:00   77.7440   91.1469   77.7440   88.8073
2015-09-02 15:57:00   88.9114   90.8509   83.8462   87.7416
2015-09-02 15:58:00   88.2430   89.0107   80.5122   87.0581
2015-09-02 15:59:00   87.1443   87.1443   77.6822   81.3962

[390 rows x 4 columns]
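
If the input already has separate open/high/low/close columns, as the question describes, one option is to aggregate each column with its own rule. The following is only a minimal sketch: the column names (open, high, low, close), the synthetic data, and the helper to_minute_bars are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical input: one OHLC row per second, indexed by timestamp.
idx = pd.date_range("2015-09-02 09:30:00", periods=600, freq="s")
rng = np.random.default_rng(0)
px = 100 + rng.standard_normal(len(idx)).cumsum()
bars_1s = pd.DataFrame(
    {"open": px, "high": px + 0.5, "low": px - 0.5, "close": px},
    index=idx,
)

def to_minute_bars(df, minutes):
    """Aggregate OHLC rows into bars that are `minutes` wide (illustrative helper)."""
    rule = f"{minutes}min"
    # First open, highest high, lowest low, last close within each bin.
    return df.resample(rule).agg(
        {"open": "first", "high": "max", "low": "min", "close": "last"}
    )

bars_5min = to_minute_bars(bars_1s, 5)
print(bars_5min)
```

Changing the minutes argument gives bars of any width, which covers the "x-minute" case in the title.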

Answer 1: (score: 1)

You may need to do some extra processing, but pd.cut() can do this.

>>> import pandas as pd
>>> seconds = [10.5,12.5,22.5,33.5,15.02, 19.26, 35.26]
>>> bins = [10,11,12,13,14,15,20,25,30,40]
>>> cats = pd.cut(seconds, bins)
>>> cats
[(10, 11], (12, 13], (20, 25], (30, 40], (15, 20], (15, 20], (30, 40]]

Once you have this, you can aggregate by this column in whatever way suits your analysis.
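
The aggregation step could look roughly like the sketch below. It reuses the sample values and bin edges from this answer; the value column and the choice of count and mean are made up for illustration.

```python
import pandas as pd

# Same sample values and bin edges as above; the "value" column is hypothetical.
df = pd.DataFrame({
    "seconds": [10.5, 12.5, 22.5, 33.5, 15.02, 19.26, 35.26],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
})
bins = [10, 11, 12, 13, 14, 15, 20, 25, 30, 40]

# Label each row with the interval its "seconds" value falls into.
df["bin"] = pd.cut(df["seconds"], bins)

# observed=True keeps only the bins that actually contain data.
summary = df.groupby("bin", observed=True)["value"].agg(["count", "mean"])
print(summary)
```

For data indexed by real timestamps, resample is usually the simpler route; pd.cut is more useful when the bin edges are irregular, as they are here.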