Windowing a pandas DataFrame with a date index

Posted: 2018-11-22 14:36:50

Tags: python pandas

I have this data in pandas:

data.tail(15)
                       open    high     low   close        vwap
date                                                           
2018-11-20 18:45:00  176.73  176.95  176.54  176.89  176.582983
2018-11-20 18:46:00  176.89  177.02  176.81  176.81  176.603020
2018-11-20 18:47:00  176.80  176.80  176.43  176.43  176.612706
2018-11-20 18:48:00  176.45  176.46  176.21  176.21  176.599967
2018-11-20 18:49:00  176.22  176.32  176.14  176.26  176.586624
2018-11-20 18:50:00  176.26  176.38  176.23  176.28  176.577114
2018-11-20 18:51:00  176.31  176.43  176.20  176.20  176.562641
2018-11-20 18:52:00  176.22  176.25  176.15  176.18  176.544664
2018-11-20 18:53:00  176.19  176.19  175.97  176.00  176.506937
2018-11-20 18:54:00  176.00  176.30  175.97  176.30  176.493768
2018-11-20 18:55:00  176.29  176.92  176.11  176.91  176.518353
2018-11-20 18:56:00  176.92  177.03  176.67  176.76  176.554964
2018-11-20 18:57:00  176.78  176.89  176.74  176.76  176.566201
2018-11-20 18:58:00  176.77  176.87  176.56  176.65  176.571326
2018-11-20 18:59:00  176.65  177.17  176.59  176.94  176.681413

I need to get overlapping sub-DataFrames of 5 rows each, for example:

1: 
2018-11-20 18:45:00  176.73  176.95  176.54  176.89  176.582983
2018-11-20 18:46:00  176.89  177.02  176.81  176.81  176.603020
2018-11-20 18:47:00  176.80  176.80  176.43  176.43  176.612706
2018-11-20 18:48:00  176.45  176.46  176.21  176.21  176.599967
2018-11-20 18:49:00  176.22  176.32  176.14  176.26  176.586624

2: 
2018-11-20 18:46:00  176.89  177.02  176.81  176.81  176.603020
2018-11-20 18:47:00  176.80  176.80  176.43  176.43  176.612706
2018-11-20 18:48:00  176.45  176.46  176.21  176.21  176.599967
2018-11-20 18:49:00  176.22  176.32  176.14  176.26  176.586624
2018-11-20 18:50:00  176.26  176.38  176.23  176.28  176.577114

Each window shifts forward by 1 minute, up to the last one:

n: 
2018-11-20 18:55:00  176.29  176.92  176.11  176.91  176.518353
2018-11-20 18:56:00  176.92  177.03  176.67  176.76  176.554964
2018-11-20 18:57:00  176.78  176.89  176.74  176.76  176.566201
2018-11-20 18:58:00  176.77  176.87  176.56  176.65  176.571326
2018-11-20 18:59:00  176.65  177.17  176.59  176.94  176.681413

How can I do this? I tried rolling and groupby without success.

pandas 0.23.4
Python 3.6.3

Thanks
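For readers on newer pandas (this was not available in the 0.23.4 used in the question): since pandas 1.1, a Rolling object can be iterated directly, yielding one sub-DataFrame per row. The first window - 1 windows it yields are partial, so the sketch below keeps only full-length ones. It assumes regular 1-minute bars; the small synthetic frame is a placeholder stand-in for data, not the real prices.

```python
import pandas as pd

# Synthetic stand-in for `data`: 15 one-minute bars with placeholder values
idx = pd.date_range("2018-11-20 18:45:00", periods=15, freq="1min")
data = pd.DataFrame({"close": range(15)}, index=idx, dtype=float)

window = 5
# Iterating a Rolling object (pandas >= 1.1) yields one sub-DataFrame per row;
# the first `window - 1` of them are shorter than `window`, so drop those.
windows = [w for w in data.rolling(window) if len(w) == window]

print(len(windows))          # 11 full windows
print(windows[0].index[0])   # 2018-11-20 18:45:00
print(windows[-1].index[0])  # 2018-11-20 18:55:00
```

Filtering on len(w) rather than relying on min_periods keeps the result correct regardless of how partial windows are handled.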

2 answers:

Answer 0 (score: 0)

What about simply iterating over the data with the desired sequence length?

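# take 5 rows for each sub-DataFrame, shifting by one row each time
seq_len = 5
# stop early so that the last window still has seq_len rows
for i in range(len(data) - seq_len + 1):
    # .ix is deprecated (removed in pandas 1.0); use positional slicing with .iloc
    subdata = data.iloc[i:i + seq_len]
    print(subdata)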
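The same positional idea can be wrapped in a small reusable generator with a configurable step. This is a sketch assuming the rows are already sorted and evenly spaced; the function name sliding_windows and its step parameter are hypothetical, not a pandas API, and the frame below is a synthetic placeholder for data.

```python
import pandas as pd

def sliding_windows(df, size, step=1):
    """Yield consecutive sub-DataFrames of `size` rows, advancing `step` rows each time.

    Hypothetical helper: only full-length windows are produced.
    """
    for start in range(0, len(df) - size + 1, step):
        yield df.iloc[start:start + size]

# Synthetic stand-in for `data` (placeholder values, 1-minute bars)
idx = pd.date_range("2018-11-20 18:45:00", periods=15, freq="1min")
data = pd.DataFrame({"close": range(15)}, index=idx, dtype=float)

windows = list(sliding_windows(data, size=5))
print(len(windows))  # 11
```

Because it is a generator, windows are produced lazily, which matters if the frame is long and only a few windows are needed at a time.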

Answer 1 (score: 0)

The following produces the requested output (pandas 0.22.0, Python 3.6.7):

Two parameters can be specified: dt, the width of the time window, and step, the amount by which the "sliding window" moves forward.

The advantage of this approach is that it uses only index operations and avoids unnecessary duplicate copies of the data (though I'd bet Python/pandas does a good job of avoiding such copies wherever possible, in case anyone finds an alternative way to get the job done).

I tested with the following DataFrame:

import pandas as pd

df = pd.DataFrame([
    ["2018-11-20 18:45:00", 176.73, 176.95, 176.54, 176.89, 176.582983],
    ["2018-11-20 18:46:00", 176.89, 177.02, 176.81, 176.81, 176.603020],
    ["2018-11-20 18:47:00", 176.80, 176.80, 176.43, 176.43, 176.612706],
    ["2018-11-20 18:48:00", 176.45, 176.46, 176.21, 176.21, 176.599967],
    ["2018-11-20 18:49:00", 176.22, 176.32, 176.14, 176.26, 176.586624],
    ["2018-11-20 18:50:00", 176.26, 176.38, 176.23, 176.28, 176.577114],
    ["2018-11-20 18:51:00", 176.31, 176.43, 176.20, 176.20, 176.562641],
    ["2018-11-20 18:52:00", 176.22, 176.25, 176.15, 176.18, 176.544664],
    ["2018-11-20 18:53:00", 176.19, 176.19, 175.97, 176.00, 176.506937],
    ["2018-11-20 18:54:00", 176.00, 176.30, 175.97, 176.30, 176.493768],
    ["2018-11-20 18:55:00", 176.29, 176.92, 176.11, 176.91, 176.518353],
    ["2018-11-20 18:56:00", 176.92, 177.03, 176.67, 176.76, 176.554964],
    ["2018-11-20 18:57:00", 176.78, 176.89, 176.74, 176.76, 176.566201],
    ["2018-11-20 18:58:00", 176.77, 176.87, 176.56, 176.65, 176.571326],
    ["2018-11-20 18:59:00", 176.65, 177.17, 176.59, 176.94, 176.681413],
], columns=["date", "open", "high", "low", "close", "vwap"])
df = df.set_index("date")
df.index = pd.to_datetime(df.index)
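The idea described in this answer (a window width dt plus a step, using only index operations) can be sketched as follows. This is an illustrative reconstruction, not the answerer's own code. It assumes a sorted DatetimeIndex of evenly spaced bars and uses half-open windows [start, start + dt), so dt = 5 minutes yields 5 one-minute rows. The helper name time_windows is hypothetical, and the frame is a synthetic placeholder for the df built above.

```python
import pandas as pd

def time_windows(df, dt, step):
    """Yield sub-DataFrames for half-open windows [start, start + dt), advancing by `step`.

    Hypothetical helper. Assumes df.index is a sorted DatetimeIndex of evenly
    spaced bars, so the data is taken to span [first, last + bar).
    """
    idx = df.index
    bar = idx[1] - idx[0]        # assumed constant bar length
    span_end = idx[-1] + bar     # end of the time span covered by the data
    start = idx[0]
    while start + dt <= span_end:
        # searchsorted gives integer positions, so the slice itself is positional
        lo = idx.searchsorted(start, side="left")
        hi = idx.searchsorted(start + dt, side="left")
        yield df.iloc[lo:hi]
        start += step

# Synthetic stand-in for df: 15 one-minute bars with placeholder values
idx = pd.date_range("2018-11-20 18:45:00", periods=15, freq="1min")
df = pd.DataFrame({"close": range(15)}, index=idx, dtype=float)

windows = list(time_windows(df, dt=pd.Timedelta(minutes=5), step=pd.Timedelta(minutes=1)))
print(len(windows))  # 11
```

Working in time rather than row counts means a gap in the data produces a shorter window instead of silently stretching across the gap; whether that is desirable depends on the use case.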