I have this data in pandas:
data.tail(15)
open high low close vwap
date
2018-11-20 18:45:00 176.73 176.95 176.54 176.89 176.582983
2018-11-20 18:46:00 176.89 177.02 176.81 176.81 176.603020
2018-11-20 18:47:00 176.80 176.80 176.43 176.43 176.612706
2018-11-20 18:48:00 176.45 176.46 176.21 176.21 176.599967
2018-11-20 18:49:00 176.22 176.32 176.14 176.26 176.586624
2018-11-20 18:50:00 176.26 176.38 176.23 176.28 176.577114
2018-11-20 18:51:00 176.31 176.43 176.20 176.20 176.562641
2018-11-20 18:52:00 176.22 176.25 176.15 176.18 176.544664
2018-11-20 18:53:00 176.19 176.19 175.97 176.00 176.506937
2018-11-20 18:54:00 176.00 176.30 175.97 176.30 176.493768
2018-11-20 18:55:00 176.29 176.92 176.11 176.91 176.518353
2018-11-20 18:56:00 176.92 177.03 176.67 176.76 176.554964
2018-11-20 18:57:00 176.78 176.89 176.74 176.76 176.566201
2018-11-20 18:58:00 176.77 176.87 176.56 176.65 176.571326
2018-11-20 18:59:00 176.65 177.17 176.59 176.94 176.681413
I need to split it into sub-DataFrames of 5 rows each, for example:
1:
2018-11-20 18:45:00 176.73 176.95 176.54 176.89 176.582983
2018-11-20 18:46:00 176.89 177.02 176.81 176.81 176.603020
2018-11-20 18:47:00 176.80 176.80 176.43 176.43 176.612706
2018-11-20 18:48:00 176.45 176.46 176.21 176.21 176.599967
2018-11-20 18:49:00 176.22 176.32 176.14 176.26 176.586624
2:
2018-11-20 18:46:00 176.89 177.02 176.81 176.81 176.603020
2018-11-20 18:47:00 176.80 176.80 176.43 176.43 176.612706
2018-11-20 18:48:00 176.45 176.46 176.21 176.21 176.599967
2018-11-20 18:49:00 176.22 176.32 176.14 176.26 176.586624
2018-11-20 18:50:00 176.26 176.38 176.23 176.28 176.577114
The shift between consecutive groups is 1 minute.
n:
2018-11-20 18:55:00 176.29 176.92 176.11 176.91 176.518353
2018-11-20 18:56:00 176.92 177.03 176.67 176.76 176.554964
2018-11-20 18:57:00 176.78 176.89 176.74 176.76 176.566201
2018-11-20 18:58:00 176.77 176.87 176.56 176.65 176.571326
2018-11-20 18:59:00 176.65 177.17 176.59 176.94 176.681413
How can I do this? I tried rolling and groupby without success.
pandas 0.23.4
Python 3.6.3
Thanks
Answer 0: (score: 0)
How about simply iterating over the rows, based on the sequence length you need?
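# take seq_len consecutive rows for each sub-DataFrame
seq_len = 5
# stop at the last start position that still yields a full window
for i in range(len(data) - seq_len + 1):
    # .iloc slices by position; .ix is deprecated since pandas 0.20
    subdata = data.iloc[i:i + seq_len]
    print(subdata)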
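If you want to keep the windows rather than print them, the same loop can be wrapped in a generator. This is only a minimal sketch: sliding_windows, its size and step parameters, and the sample frame with made-up values are names I am introducing for illustration, not something from the question.

```python
import pandas as pd


def sliding_windows(frame, size, step=1):
    """Yield overlapping sub-DataFrames of `size` consecutive rows."""
    # Stop early enough that every window holds exactly `size` rows.
    for start in range(0, len(frame) - size + 1, step):
        yield frame.iloc[start:start + size]


# Placeholder data: 15 one-minute rows with made-up values.
index = pd.date_range("2018-11-20 18:45:00", periods=15, freq="min", name="date")
sample = pd.DataFrame({"close": range(15)}, index=index)

windows = list(sliding_windows(sample, size=5))
print(len(windows))
print(windows[0])
```

Each window is a positional slice of the original frame, so the first one starts at 18:45, the second at 18:46, and so on until the last full window.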
Answer 1: (score: 0)
The following produces the requested output (pandas 0.22.0, Python 3.6.7):
Two parameters can be specified: dt, the width of the time window, and step, the amount by which the "sliding window" moves forward.
The advantage of this approach is that it uses only index operations and avoids unnecessary copies of the data (although I would bet that Python/pandas already does a good job of avoiding those wherever possible; I mention it in case someone finds an alternative way to get the job done).
I tested with the following data frame:
import pandas as pd

df = pd.DataFrame([["2018-11-20 18:45:00", 176.73, 176.95, 176.54, 176.89, 176.582983],
                   ["2018-11-20 18:46:00", 176.89, 177.02, 176.81, 176.81, 176.603020],
                   ["2018-11-20 18:47:00", 176.80, 176.80, 176.43, 176.43, 176.612706],
                   ["2018-11-20 18:48:00", 176.45, 176.46, 176.21, 176.21, 176.599967],
                   ["2018-11-20 18:49:00", 176.22, 176.32, 176.14, 176.26, 176.586624],
                   ["2018-11-20 18:50:00", 176.26, 176.38, 176.23, 176.28, 176.577114],
                   ["2018-11-20 18:51:00", 176.31, 176.43, 176.20, 176.20, 176.562641],
                   ["2018-11-20 18:52:00", 176.22, 176.25, 176.15, 176.18, 176.544664],
                   ["2018-11-20 18:53:00", 176.19, 176.19, 175.97, 176.00, 176.506937],
                   ["2018-11-20 18:54:00", 176.00, 176.30, 175.97, 176.30, 176.493768],
                   ["2018-11-20 18:55:00", 176.29, 176.92, 176.11, 176.91, 176.518353],
                   ["2018-11-20 18:56:00", 176.92, 177.03, 176.67, 176.76, 176.554964],
                   ["2018-11-20 18:57:00", 176.78, 176.89, 176.74, 176.76, 176.566201],
                   ["2018-11-20 18:58:00", 176.77, 176.87, 176.56, 176.65, 176.571326],
                   ["2018-11-20 18:59:00", 176.65, 177.17, 176.59, 176.94, 176.681413]],
                  columns=["date", "open", "high", "low", "close", "vwap"])
# use the timestamps as a DatetimeIndex
df = df.set_index("date")
df.index = pd.to_datetime(df.index)
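For readers who want something concrete to try, here is one way a time-based sliding window with a dt and a step could look. It is a sketch under my own assumptions, not the code from the answer above: time_windows is a hypothetical helper, the sample frame holds made-up values, and I am assuming a sorted DatetimeIndex and a window that includes both endpoints (so dt of 4 minutes covers 5 one-minute rows).

```python
import pandas as pd


def time_windows(frame, dt, step):
    """Yield sub-DataFrames spanning [start, start + dt], advancing by step."""
    dt = pd.Timedelta(dt)
    step = pd.Timedelta(step)
    if step <= pd.Timedelta(0):
        raise ValueError("step must be positive")
    start = frame.index[0]
    last = frame.index[-1]
    # Only emit windows that fit entirely inside the data.
    while start + dt <= last:
        # Label slicing on a sorted DatetimeIndex includes both endpoints.
        yield frame.loc[start:start + dt]
        start += step


# Placeholder data: 15 one-minute rows with made-up values.
index = pd.date_range("2018-11-20 18:45:00", periods=15, freq="min", name="date")
sample = pd.DataFrame({"close": range(15)}, index=index)

for window in time_windows(sample, dt="4min", step="1min"):
    print(window.index[0], window.index[-1], len(window))
```

Unlike a purely positional slice, this selects rows by timestamp, so a gap in the data would give a shorter window instead of silently pulling in rows from a later time.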