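I'm fairly new to the Python world and was wondering if anyone could give me some pointers on solving my problem.
I have a multi-index DataFrame; a sample is shown below: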
**IDENTIFIER_VALUE TIME** DATE ASK PRICE BID PRICE ask bid
BE0000291972 08:17:14 19/02/2018 145.09 144.82 145.08 144.96
08:17:18 19/02/2018 145.09 144.95 145.08 144.96
08:17:18 19/02/2018 145.09 144.95 145.08 144.96
08:18:18 19/02/2018 145.09 144.95 145.08 144.96
08:22:18 19/02/2018 145.09 144.95 145.08 144.96
08:43:18 19/02/2018 145.09 144.95 145.08 144.96
08:51:18 19/02/2018 145.09 144.95 145.08 144.96
09:00:18 19/02/2018 145.09 144.95 145.08 144.96
09:06:18 19/02/2018 145.09 144.95 145.08 144.96
09:08:18 19/02/2018 145.09 144.95 145.08 144.96
09:15:18 19/02/2018 145.09 144.95 145.08 144.96
09:16:18 19/02/2018 145.09 144.95 145.08 144.96
09:27:18 19/02/2018 145.09 144.95 145.08 144.96
09:28:18 19/02/2018 145.09 144.94 145.08 144.96
09:42:18 19/02/2018 145.09 144.94 145.08 144.96
09:44:18 19/02/2018 145.09 144.94 145.08 144.96
BE0000337460 10:45:04 19/02/2018 102.12 102.06 102.11 102.06
11:04:04 19/02/2018 102.12 102.06 102.11 102.06
11:23:04 19/02/2018 102.12 102.06 102.11 102.06
11:31:04 19/02/2018 102.12 102.06 102.11 102.06
11:43:04 19/02/2018 102.12 102.06 102.11 102.06
11:57:04 19/02/2018 102.12 102.06 102.11 102.06
12:04:04 19/02/2018 102.12 102.06 102.11 102.06
12:14:04 19/02/2018 102.12 102.06 102.11 102.06
12:41:04 19/02/2018 102.12 102.06 102.11 102.06
12:50:04 19/02/2018 102.12 102.06 102.11 102.06
12:57:04 19/02/2018 102.12 102.06 102.11 102.06
13:08:04 19/02/2018 102.12 102.06 102.11 102.06
13:11:04 19/02/2018 102.12 102.06 102.11 102.06
13:33:04 19/02/2018 102.12 102.06 102.11 102.06
13:48:04 19/02/2018 102.12 102.06 102.11 102.06
14:03:04 19/02/2018 102.12 102.06 102.11 102.06
Question: I want to do the following.
For each value of index level 0:
1. Start with the first record. For example:
IDENTIFIER_VALUE TIME DATE ASK PRICE BID PRICE ask bid
BE0000291972 08:17:14 19/02/2018 145.09 144.82 145.08 144.96
2. Find all records that fall within the 7 minutes following the time of the first record. Based on the example above, I want to select all records for the same security from 08:17:14 to 08:24:14, so the result should be:
IDENTIFIER_VALUE TIME DATE ASK PRICE BID PRICE ask bid
BE0000291972 08:17:14 19/02/2018 145.09 144.82 145.08 144.96
08:17:18 19/02/2018 145.09 144.95 145.08 144.96
08:17:18 19/02/2018 145.09 144.95 145.08 144.96
08:18:18 19/02/2018 145.09 144.95 145.08 144.96
08:22:18 19/02/2018 145.09 144.95 145.08 144.96
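3. Perform some calculations on this subset.
Then repeat steps 1-3 starting from the second record, and keep going until the last record in the dataset.
Groupby/resample merges records into fixed time-frequency bins, but that is not what I'm after. For every entry in the DataFrame, I want to extract the records in the 7 minutes that follow it and run calculations on them. I hope I've explained clearly what I want.
Thanks in advance.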
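The steps above (start at a record, take every record of the same security in the following 7 minutes, compute something, move on to the next record) can be sketched with only the standard library. This is a minimal sketch under stated assumptions, not the asker's pipeline: the rows are hard-coded as plain tuples taken from the sample, already sorted by IDENTIFIER_VALUE and then by time; the DATE and TIME columns are combined into one datetime; the window is treated as half-open, [t, t + 7 min); and `forward_windows` and `mean_mid` (average of (ask + bid) / 2) are hypothetical names, the latter standing in for the unspecified calculation.

```python
from bisect import bisect_left
from datetime import datetime, timedelta
from itertools import groupby

# A few rows from the sample: (IDENTIFIER_VALUE, "DATE TIME", ASK PRICE, BID PRICE).
# Assumption: rows are sorted by identifier, then by time (as in the DataFrame shown).
rows = [
    ("BE0000291972", "19/02/2018 08:17:14", 145.09, 144.82),
    ("BE0000291972", "19/02/2018 08:17:18", 145.09, 144.95),
    ("BE0000291972", "19/02/2018 08:17:18", 145.09, 144.95),
    ("BE0000291972", "19/02/2018 08:18:18", 145.09, 144.95),
    ("BE0000291972", "19/02/2018 08:22:18", 145.09, 144.95),
    ("BE0000291972", "19/02/2018 08:43:18", 145.09, 144.95),
    ("BE0000337460", "19/02/2018 10:45:04", 102.12, 102.06),
    ("BE0000337460", "19/02/2018 11:04:04", 102.12, 102.06),
]

WINDOW = timedelta(minutes=7)


def parse(ts):
    """Combine the DATE and TIME columns into one datetime."""
    return datetime.strptime(ts, "%d/%m/%Y %H:%M:%S")


def forward_windows(rows, window=WINDOW):
    """For every row, yield (identifier, start, records), where records are the rows
    of the same identifier with start <= time < start + window."""
    for ident, group in groupby(rows, key=lambda r: r[0]):
        group = list(group)
        times = [parse(r[1]) for r in group]
        for i, start in enumerate(times):
            # Index of the first row at or after start + window (times is sorted).
            end = bisect_left(times, start + window, lo=i)
            yield ident, start, group[i:end]


def mean_mid(records):
    """Hypothetical calculation: average mid price (ask + bid) / 2 over a window."""
    return sum((ask + bid) / 2 for _, _, ask, bid in records) / len(records)


results = [
    (ident, start, len(records), mean_mid(records))
    for ident, start, records in forward_windows(rows)
]

for ident, start, n, mid in results:
    print(ident, start.time(), n, round(mid, 4))
```

The first window for BE0000291972 contains the same five rows as the expected result in the question. Because `times` is sorted, `bisect_left` finds each window's end in logarithmic time, and every record gets its own window, including the last ones in each group.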
Answer 0 (score: 0)
For illustration, here is a generic solution that needs no third-party libraries:
from collections import deque
from datetime import datetime, timedelta


def sliding_window(it, size, dist=(lambda a, b: b - a), index=(lambda x: x)):
    # q holds (index value, item) pairs for the window currently being built
    q = deque()
    for item in it:
        i = index(item)
        # While the oldest queued item is at least `size` away from the current one,
        # its window is complete: emit it, then drop that item from the front
        while q and dist(q[0][0], i) >= size:
            yield [x for _, x in q]
            q.popleft()
        q.append((i, item))
    # Emit whatever is left once the input is exhausted
    if q:
        yield [x for _, x in q]


if __name__ == '__main__':
    import pprint
    pprint.pprint(list(sliding_window(
        [
            { "timestamp": "2018-01-18T04:00:00Z" },
            { "timestamp": "2018-01-18T04:00:01Z" },
            { "timestamp": "2018-01-18T04:03:00Z" },
            { "timestamp": "2018-01-18T04:04:00Z" },
            { "timestamp": "2018-01-18T04:10:00Z" },
            { "timestamp": "2018-01-18T04:24:00Z" },
            { "timestamp": "2018-01-18T04:28:00Z" }
        ],
        timedelta(minutes=7),
        # Parse each ISO-8601 UTC timestamp string into a datetime
        index=lambda x: datetime.strptime(x["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
    )))
Output:
[[{'timestamp': '2018-01-18T04:00:00Z'},
{'timestamp': '2018-01-18T04:00:01Z'},
{'timestamp': '2018-01-18T04:03:00Z'},
{'timestamp': '2018-01-18T04:04:00Z'}],
[{'timestamp': '2018-01-18T04:00:01Z'},
{'timestamp': '2018-01-18T04:03:00Z'},
{'timestamp': '2018-01-18T04:04:00Z'}],
[{'timestamp': '2018-01-18T04:03:00Z'}, {'timestamp': '2018-01-18T04:04:00Z'}],
[{'timestamp': '2018-01-18T04:04:00Z'}, {'timestamp': '2018-01-18T04:10:00Z'}],
[{'timestamp': '2018-01-18T04:10:00Z'}],
[{'timestamp': '2018-01-18T04:24:00Z'}, {'timestamp': '2018-01-18T04:28:00Z'}]]
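One detail of the generator above: once the input is exhausted, `if q:` emits the remaining queue only once, so in the output the last item (04:28:00) never gets a window of its own, whereas the question asks for a window starting at every record. A minimal variant, assuming you want exactly one window per input item, drains the queue at the end instead; `sliding_window_all` is a hypothetical name, and the rest of the logic is unchanged.

```python
from collections import deque
from datetime import datetime, timedelta


def sliding_window_all(it, size, dist=(lambda a, b: b - a), index=(lambda x: x)):
    """Emit exactly one window per input item: the window that starts at that item
    and holds every later item whose distance from it is less than `size`."""
    q = deque()
    for item in it:
        i = index(item)
        # The oldest queued item's window is complete once the current item is too far away
        while q and dist(q[0][0], i) >= size:
            yield [x for _, x in q]
            q.popleft()
        q.append((i, item))
    # Drain the queue: every item still queued needs its own window
    while q:
        yield [x for _, x in q]
        q.popleft()


data = [
    {"timestamp": "2018-01-18T04:00:00Z"},
    {"timestamp": "2018-01-18T04:00:01Z"},
    {"timestamp": "2018-01-18T04:03:00Z"},
    {"timestamp": "2018-01-18T04:04:00Z"},
    {"timestamp": "2018-01-18T04:10:00Z"},
    {"timestamp": "2018-01-18T04:24:00Z"},
    {"timestamp": "2018-01-18T04:28:00Z"},
]

windows = list(sliding_window_all(
    data,
    timedelta(minutes=7),
    index=lambda x: datetime.strptime(x["timestamp"], "%Y-%m-%dT%H:%M:%SZ"),
))

for w in windows:
    print([x["timestamp"] for x in w])
```

With the same seven timestamps this yields seven windows, one per item; the last two are [04:24:00, 04:28:00] and [04:28:00].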