Python: recursive calculation on a MultiIndex

Time: 2018-02-27 15:11:58

Tags: python multi-index

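I am fairly new to the Python world and wonder whether someone could give me some pointers on my question.

I have a multi-index DataFrame; a sample is shown below: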

IDENTIFIER_VALUE    TIME    DATE    ASK PRICE   BID PRICE   ask bid
BE0000291972    08:17:14    19/02/2018  145.09  144.82  145.08  144.96
    08:17:18    19/02/2018  145.09  144.95  145.08  144.96
    08:17:18    19/02/2018  145.09  144.95  145.08  144.96
    08:18:18    19/02/2018  145.09  144.95  145.08  144.96
    08:22:18    19/02/2018  145.09  144.95  145.08  144.96
    08:43:18    19/02/2018  145.09  144.95  145.08  144.96
    08:51:18    19/02/2018  145.09  144.95  145.08  144.96
    09:00:18    19/02/2018  145.09  144.95  145.08  144.96
    09:06:18    19/02/2018  145.09  144.95  145.08  144.96
    09:08:18    19/02/2018  145.09  144.95  145.08  144.96
    09:15:18    19/02/2018  145.09  144.95  145.08  144.96
    09:16:18    19/02/2018  145.09  144.95  145.08  144.96
    09:27:18    19/02/2018  145.09  144.95  145.08  144.96
    09:28:18    19/02/2018  145.09  144.94  145.08  144.96
    09:42:18    19/02/2018  145.09  144.94  145.08  144.96
    09:44:18    19/02/2018  145.09  144.94  145.08  144.96
BE0000337460    10:45:04    19/02/2018  102.12  102.06  102.11  102.06
    11:04:04    19/02/2018  102.12  102.06  102.11  102.06
    11:23:04    19/02/2018  102.12  102.06  102.11  102.06
    11:31:04    19/02/2018  102.12  102.06  102.11  102.06
    11:43:04    19/02/2018  102.12  102.06  102.11  102.06
    11:57:04    19/02/2018  102.12  102.06  102.11  102.06
    12:04:04    19/02/2018  102.12  102.06  102.11  102.06
    12:14:04    19/02/2018  102.12  102.06  102.11  102.06
    12:41:04    19/02/2018  102.12  102.06  102.11  102.06
    12:50:04    19/02/2018  102.12  102.06  102.11  102.06
    12:57:04    19/02/2018  102.12  102.06  102.11  102.06
    13:08:04    19/02/2018  102.12  102.06  102.11  102.06
    13:11:04    19/02/2018  102.12  102.06  102.11  102.06
    13:33:04    19/02/2018  102.12  102.06  102.11  102.06
    13:48:04    19/02/2018  102.12  102.06  102.11  102.06
    14:03:04    19/02/2018  102.12  102.06  102.11  102.06

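Problem: I would like to do the following.

For each value of index level 0:

  1. Start with the first record. For example:

    IDENTIFIER_VALUE    TIME    DATE    ASK PRICE   BID PRICE   ask bid
    BE0000291972    08:17:14    19/02/2018  145.09  144.82  145.08  144.96

  2. Find all records that fall within the next 7 minutes of the time on the first record. Based on the above, I want to select all records from 08:17:14 to 08:24:14 for the same security. That should give me the following:

    IDENTIFIER_VALUE    TIME    DATE    ASK PRICE   BID PRICE   ask bid
    BE0000291972    08:17:14    19/02/2018  145.09  144.82  145.08  144.96
        08:17:18    19/02/2018  145.09  144.95  145.08  144.96
        08:17:18    19/02/2018  145.09  144.95  145.08  144.96
        08:18:18    19/02/2018  145.09  144.95  145.08  144.96
        08:22:18    19/02/2018  145.09  144.95  145.08  144.96

  3. Perform some calculations on this data set.

  4. Repeat steps 1-3 starting from the second record, and continue until the end of all records in the data set.

Groupby / resample would merge the records into fixed time-frequency bins, which is not what I am looking for. For every entry in the DataFrame, I want to extract the records in the 7 minutes that follow it and run calculations on them. I hope I have managed to explain what I want.

Thanks in advance.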

1 answer:

Answer 0 (score: 0)

For illustration, here is a generic solution that does not require any third-party libraries:

from collections import deque
from datetime import datetime, timedelta


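def sliding_window(it, size, dist=(lambda a, b: b - a), index=(lambda x: x)):
    """Yield lists of consecutive items whose index values span less than `size`.

    `it` must already be sorted by `index(item)`.
    """
    # Queue of (index value, item) pairs currently inside the window.
    q = deque()
    for item in it:
        i = index(item)
        # The new item is at least `size` away from the oldest queued item,
        # so that item's window is complete: emit it, then drop the item.
        while q and dist(q[0][0], i) >= size:
            yield [x for _, x in q]
            q.popleft()

        q.append((i, item))

    # Input exhausted: emit whatever is left as one final window.
    # Items after the first in this tail do not get a window of their own.
    if q:
        yield [x for _, x in q]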


if __name__ == '__main__':
    import pprint
    pprint.pprint(list(sliding_window(
        [
            { "timestamp": "2018-01-18T04:00:00Z" },
            { "timestamp": "2018-01-18T04:00:01Z" },
            { "timestamp": "2018-01-18T04:03:00Z" },
            { "timestamp": "2018-01-18T04:04:00Z" },
            { "timestamp": "2018-01-18T04:10:00Z" },
            { "timestamp": "2018-01-18T04:24:00Z" },
            { "timestamp": "2018-01-18T04:28:00Z" }
        ],
        timedelta(minutes=7),
        index=lambda x: datetime.strptime(x["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
    )))

Output:

[[{'timestamp': '2018-01-18T04:00:00Z'},
  {'timestamp': '2018-01-18T04:00:01Z'},
  {'timestamp': '2018-01-18T04:03:00Z'},
  {'timestamp': '2018-01-18T04:04:00Z'}],
 [{'timestamp': '2018-01-18T04:00:01Z'},
  {'timestamp': '2018-01-18T04:03:00Z'},
  {'timestamp': '2018-01-18T04:04:00Z'}],
 [{'timestamp': '2018-01-18T04:03:00Z'}, {'timestamp': '2018-01-18T04:04:00Z'}],
 [{'timestamp': '2018-01-18T04:04:00Z'}, {'timestamp': '2018-01-18T04:10:00Z'}],
 [{'timestamp': '2018-01-18T04:10:00Z'}],
 [{'timestamp': '2018-01-18T04:24:00Z'}, {'timestamp': '2018-01-18T04:28:00Z'}]]
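
The question asks for one window per record and per identifier, whereas `sliding_window` above handles a single sorted sequence and emits the trailing records as one shared window. The following is a minimal sketch of how that could be adapted, still using only the standard library. The function name `forward_windows`, its parameters, and the tuple layout of `rows` are assumptions made for illustration; the rows are copied from the sample data in the question, and the record count stands in for the unspecified calculation.

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import datetime, timedelta


def forward_windows(records, size, key, timestamp):
    """Yield (record, window) pairs, one pair per record.

    `window` holds the records of the same group, from `record` onward,
    whose timestamp lies in [t, t + size), t being the record's timestamp.
    (Hypothetical helper, not part of the original answer.)
    """
    # Bucket the records by group key (here: the identifier).
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)

    for group in groups.values():
        group.sort(key=timestamp)  # stable sort: ties keep their input order
        times = [timestamp(r) for r in group]
        for start, t in enumerate(times):
            # First position at or after `start` whose time is >= t + size.
            end = bisect_left(times, t + size, lo=start)
            yield group[start], group[start:end]


FMT = "%d/%m/%Y %H:%M:%S"

# (identifier, "date time", bid price), taken from the question's sample.
rows = [
    ("BE0000291972", "19/02/2018 08:17:14", 144.82),
    ("BE0000291972", "19/02/2018 08:17:18", 144.95),
    ("BE0000291972", "19/02/2018 08:17:18", 144.95),
    ("BE0000291972", "19/02/2018 08:18:18", 144.95),
    ("BE0000291972", "19/02/2018 08:22:18", 144.95),
    ("BE0000291972", "19/02/2018 08:43:18", 144.95),
    ("BE0000337460", "19/02/2018 10:45:04", 102.06),
    ("BE0000337460", "19/02/2018 11:04:04", 102.06),
]

# Placeholder calculation: count the records in each 7-minute window.
results = [
    (rec[0], rec[1], len(window))
    for rec, window in forward_windows(
        rows,
        timedelta(minutes=7),
        key=lambda r: r[0],
        timestamp=lambda r: datetime.strptime(r[1], FMT),
    )
]

print([n for _, _, n in results])  # [5, 4, 3, 2, 1, 1, 1, 1]
```

The first window contains the five records from 08:17:14 through 08:22:18, which matches the selection described in the question. With a pandas DataFrame, the grouping step could presumably be replaced by iterating over `df.groupby(level=0)` and applying the same window logic to each group.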