Cumulative concatenation / array_aggregate in pandas

Time: 2018-11-09 16:51:11

Tags: python arrays pandas aggregate

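I have a kind of event queue with an ID, a key, and a value.
I want to group this table by key and, for each row, aggregate all values of that key up to and including that row
(similar to cumsum, but array_aggregate).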

I know how to do this in SQL:

WITH t AS (
  SELECT *
  FROM (
         VALUES
           (1, 'A', 1),
           (2, 'B', 1),
           (3, 'A', 2),
           (4, 'A', 3),
           (5, 'A', 5),
           (6, 'B', 8)
       ) AS v(id, key, val)
) SELECT
    *,
    array_agg(val)
    OVER (
      PARTITION BY key
      ORDER BY id )
  FROM t
  ORDER BY id

This results in:

id, key, val, array_agg
1,  A,   1,   {1}
2,  B,   1,   {1}
3,  A,   2,   {1,2}
4,  A,   3,   {1,2,3}
5,  A,   5,   {1,2,3,5}
6,  B,   8,   {1,8}

Given the same table, what is the best way to do this in Python?

import pandas as pd

df = pd.DataFrame([
    (1, 'A', 1),
    (2, 'B', 1),
    (3, 'A', 2),
    (4, 'A', 3),
    (5, 'A', 5),
    (6, 'B', 8)
], columns=['id', 'key', 'val'])
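For reference, here is a minimal sketch of one way the SQL window above could be reproduced. It is not necessarily the best or fastest approach. It assumes rows should be ordered by `id` within each key, as in the SQL `ORDER BY id`. The helper name `cumulative_array_agg` and the output column name `array_agg` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame([
    (1, 'A', 1),
    (2, 'B', 1),
    (3, 'A', 2),
    (4, 'A', 3),
    (5, 'A', 5),
    (6, 'B', 8)
], columns=['id', 'key', 'val'])


def cumulative_array_agg(frame, key_col='key', val_col='val', order_col='id'):
    """Hypothetical helper: for each row, list the values seen so far for its key.

    Rows are visited in order of `order_col`, mirroring
    array_agg(val) OVER (PARTITION BY key ORDER BY id).
    """
    ordered = frame.sort_values(order_col)
    seen = defaultdict(list)   # key -> values encountered so far
    snapshots = []
    for key, val in zip(ordered[key_col], ordered[val_col]):
        seen[key].append(val)
        snapshots.append(list(seen[key]))  # copy, so later appends do not leak in
    # Build the result on the sorted index so it aligns with the original frame
    # when assigned as a column.
    return pd.Series(snapshots, index=ordered.index, dtype=object)


df['array_agg'] = cumulative_array_agg(df)
print(df)
```

Each row stores its own copy of the running list, so memory grows quadratically with the number of rows per key. That is probably acceptable for small tables like this one, but it may be a concern for long event queues.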

0 answers:

No answers