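I have a series of Events, generated by different users over time.
How can I aggregate this series into groups of events that are close to each other? Two events a and b fall into the same window if: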
b.user = a.user
and b.time >= a.time
and b.time - a.time <= interval '1 month'
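This condition applies recursively: events chain together through their neighbours. For example, the following dataset: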
CREATE TABLE pg_temp.Data
("event" int, "user" int, "date" date, "value" int)
;
INSERT INTO pg_temp.Data
("event", "user", "date", "value")
VALUES
(1, 1, '2017-01-01', 5),
(2, 1, '2017-01-07', 3),
(3, 1, '2017-02-09', 2),
(4, 1, '2017-03-12', 4),
(5, 1, '2017-04-03', 7),
(6, 1, '2017-05-01', 6),
(7, 2, '2017-01-05', 9),
(8, 2, '2017-01-12', 1),
(9, 2, '2017-03-24', 6)
;
SELECT * FROM pg_temp.Data;
should be reduced to:
[
  {
    "init": "2017-01-01",
    "latest": "2017-01-07",
    "events": [1, 2],
    "user": 1,
    "value": 8
  },
  {
    "init": "2017-02-09",
    "latest": "2017-02-09",
    "events": [3],
    "user": 1,
    "value": 2
  },
  {
    "init": "2017-03-12",
    "latest": "2017-05-01",
    "events": [4, 5, 6],
    "user": 1,
    "value": 17
  },
  {
    "init": "2017-01-05",
    "latest": "2017-01-12",
    "events": [7, 8],
    "user": 2,
    "value": 10
  },
  {
    "init": "2017-03-24",
    "latest": "2017-03-24",
    "events": [9],
    "user": 2,
    "value": 6
  }
]
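where init and latest are the time range of the window, and value is the sum of the values in the window.
Note that events 6 and 4 are more than one month apart, but because event 5 lies between them, they are aggregated into the same group.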
Answer 0 (score: 3)
Use window functions:
SELECT min(date) AS init,
       max(date) AS latest,
       array_agg(event) AS events,
       "user",
       sum(value) AS value
FROM (-- numbered: a running count of group starts gives each row a session id
      SELECT event,
             "user",
             date,
             value,
             count(grp_start)
                OVER (PARTITION BY "user" ORDER BY date) session_id
      FROM (-- tagged: flag rows that come more than one month after the
            -- previous event of the same user (the first row is always flagged)
            SELECT event,
                   "user",
                   date,
                   value,
                   CASE
                      WHEN date
                           > lag(date, 1, timestamp '-infinity')
                                OVER (PARTITION BY "user" ORDER BY date)
                             + INTERVAL '1 month'
                      THEN 1
                   END grp_start
            FROM data
           ) tagged
     ) numbered
GROUP BY "user", session_id
ORDER BY "user", init;
This results in:
┌─────────────────────┬─────────────────────┬─────────┬──────┬───────┐
│ init │ latest │ events │ user │ value │
├─────────────────────┼─────────────────────┼─────────┼──────┼───────┤
│ 2017-01-01 00:00:00 │ 2017-01-07 00:00:00 │ {1,2} │ 1 │ 8 │
│ 2017-02-09 00:00:00 │ 2017-02-09 00:00:00 │ {3} │ 1 │ 2 │
│ 2017-03-12 00:00:00 │ 2017-05-01 00:00:00 │ {4,5,6} │ 1 │ 17 │
│ 2017-01-05 00:00:00 │ 2017-01-12 00:00:00 │ {7,8} │ 2 │ 10 │
│ 2017-03-24 00:00:00 │ 2017-03-24 00:00:00 │ {9} │ 2 │ 6 │
└─────────────────────┴─────────────────────┴─────────┴──────┴───────┘
(5 rows)
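The grouping logic can also be retraced outside the database. The following is a minimal Python sketch, not part of the original answer: sessionize and add_one_month are hypothetical helper names, and the month arithmetic is an assumption meant to approximate PostgreSQL's date + interval '1 month'. It flags a new group whenever an event lies more than one month after the previous event of the same user, which is the same test the grp_start column performs.

```python
import calendar
from datetime import date
from itertools import groupby


def add_one_month(d: date) -> date:
    """Add one calendar month, clamping the day to the end of the target month.

    Assumption: this approximates PostgreSQL's date + interval '1 month'.
    """
    year, month = (d.year + 1, 1) if d.month == 12 else (d.year, d.month + 1)
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def sessionize(rows):
    """Group (event, user, date, value) tuples into per-user sessions."""
    sessions = []
    # Equivalent of PARTITION BY "user" ORDER BY date.
    ordered = sorted(rows, key=lambda r: (r[1], r[2]))
    for user, user_rows in groupby(ordered, key=lambda r: r[1]):
        previous = None
        for event, _, day, value in user_rows:
            # A row starts a new group when it has no predecessor or lies
            # more than one month after the previous row of the same user.
            if previous is None or day > add_one_month(previous):
                sessions.append({"init": day, "latest": day, "events": [],
                                 "user": user, "value": 0})
            current = sessions[-1]
            current["latest"] = day
            current["events"].append(event)
            current["value"] += value
            previous = day
    return sessions


# The sample rows from the question.
DATA = [
    (1, 1, date(2017, 1, 1), 5),
    (2, 1, date(2017, 1, 7), 3),
    (3, 1, date(2017, 2, 9), 2),
    (4, 1, date(2017, 3, 12), 4),
    (5, 1, date(2017, 4, 3), 7),
    (6, 1, date(2017, 5, 1), 6),
    (7, 2, date(2017, 1, 5), 9),
    (8, 2, date(2017, 1, 12), 1),
    (9, 2, date(2017, 3, 24), 6),
]

if __name__ == "__main__":
    for s in sessionize(DATA):
        print(s["user"], s["init"], s["latest"], s["events"], s["value"])
```

Because each event is compared only with its immediate predecessor, events 4 and 6 end up in the same group through event 5, even though they are more than one month apart.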
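A word of advice: it is not a good idea to use reserved words such as user as column names. If you forget to put them in double quotes, surprising things happen (try it).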