I have a table:
CREATE TABLE t1
(
id serial NOT NULL,
in_quantity bigint NULL,
price money NOT NULL,
out_quantity bigint NULL,
stamp timestamp NOT NULL
);
and data like this, for example (the same date, but different times):
INSERT INTO t1 (in_quantity, price, out_quantity, stamp)
VALUES
( 100, 10.00, NULL, '2014-10-20 00:00:00'), -- id = 1
( 200, 11.00, NULL, '2014-10-20 00:01:00'), -- id = 2
( 300, 12.00, NULL, '2014-10-20 00:02:00'), -- id = 3
(NULL, 13.00, 400, '2014-10-20 00:03:00'), -- id = 4
(NULL, 14.00, 500, '2014-10-20 00:04:00'), -- id = 5
( 600, 15.00, NULL, '2014-10-20 00:15:00'), -- id = 6
( 700, 16.00, NULL, '2014-10-20 00:16:00'), -- id = 7
( 800, 17.00, NULL, '2014-10-20 00:17:00'), -- id = 8
(NULL, 18.00, 900, '2014-10-20 00:18:00'), -- id = 9
(NULL, 19.00, 1000, '2014-10-20 00:19:00'), -- id = 10
(2300, 23.00, NULL, '2014-10-20 00:23:00'), -- id = 11
(2400, 24.00, NULL, '2014-10-20 00:24:00'); -- id = 12
From this table, I need to get the rows with the maximum in and out quantities for each time range in a given set of ranges, for example:
( "2014-10-20 00:00:00" : "2014-10-20 00:05:00" ]
( "2014-10-20 00:05:00" : "2014-10-20 00:10:00" ]
( "2014-10-20 00:10:00" : "2014-10-20 00:15:00" ]
( "2014-10-20 00:15:00" : "2014-10-20 00:20:00" ]
( "2014-10-20 00:20:00" : "2014-10-20 00:25:00" ]
For this example, the result I want is:
interval begin | interval end | max_in_q | max_in_q_id | max_out_q | max_out_q_id
======================+=======================+==========+=============+===========+=============
"2014-10-20 00:00:00" | "2014-10-20 00:05:00" | 300 | 3 | 500 | 5
"2014-10-20 00:05:00" | "2014-10-20 00:10:00" | NULL | NULL | NULL | NULL
"2014-10-20 00:10:00" | "2014-10-20 00:15:00" | NULL | NULL | NULL | NULL
"2014-10-20 00:15:00" | "2014-10-20 00:20:00" | 800 | 8 | 1000 | 10
"2014-10-20 00:20:00" | "2014-10-20 00:25:00" | 2400 | 12 | NULL | NULL
So far, I can generate such a set of ranges with a query like this:
SELECT
i::timestamp AS dleft,
i::timestamp + '1 hour' AS dright
FROM
generate_series('2014-10-20 00:00:00'::timestamp, '2014-10-20 23:00:00'::timestamp, '1 hour') AS i
But I can't figure out how to run an aggregate function over each of these small ranges and join the results.
Answer 0 (score: 2)
First, you need to realize that a query returning the ids along with each aggregated value is not a simple one in any RDBMS.
In PostgreSQL, this is mostly solved with DISTINCT ON:
SELECT DISTINCT ON (s)
s ts_start, s + '5 minutes' ts_end, in_quantity max_in_q, id max_in_id
FROM
generate_series('2014-10-20 00:00:00'::timestamp, '2014-10-20 00:20:00'::timestamp, '5 minutes') s
LEFT JOIN
t1 ON stamp <@ tsrange(s, s + '5 minutes', '(]')
ORDER BY
s, in_quantity DESC NULLS LAST;
But this only lets you pick a single max/min value, together with the whole row it belongs to.
If you really need both max columns, you have to write self-joins and a subquery, which won't be as fast:
SELECT
lower(r) ts_start, upper(r) ts_end, max_in_q, max_in.id max_in_id, max_out_q, max_out.id max_out_id
FROM (
SELECT
r, max(in_quantity) max_in_q, max(out_quantity) max_out_q
FROM
generate_series('2014-10-20 00:00:00'::timestamp, '2014-10-20 00:20:00'::timestamp, '5 minutes') s,
tsrange(s, s + '5 minutes', '(]') r
LEFT JOIN
t1 ON stamp <@ r
GROUP BY
r
ORDER BY
r
) m
LEFT JOIN
t1 max_in ON max_in.in_quantity = max_in_q AND max_in.stamp <@ r
LEFT JOIN
t1 max_out ON max_out.out_quantity = max_out_q AND max_out.stamp <@ r
ORDER BY
ts_start;
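Note: with the second version, you have to handle duplicates yourself, because in_quantity and out_quantity are not unique: when several rows share a maximum value, the self-joins return one output row per matching row.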
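One way to deal with those duplicates, sketched under a few assumptions: PostgreSQL 9.4 or later (for the aggregate FILTER clause), half-open [start, end) buckets (the expected table in the question lists nothing for 00:10 to 00:15 even though a row is stamped 00:15:00, which only fits that convention), and an inline t1 CTE that is a hypothetical stand-in for the question's table. Instead of joining back on the maximum value, it computes each bucket's maxima with window functions and collects every id that reaches the maximum into an array, so a tie shows up as a multi-element array rather than as duplicated rows. The column names max_in_q_ids and max_out_q_ids are illustrative.

```sql
-- Hypothetical inline copy of the question's sample rows (price omitted);
-- remove this t1 entry from the WITH list to query the real t1 table instead.
WITH t1(id, in_quantity, out_quantity, stamp) AS (VALUES
  ( 1,  100, NULL, timestamp '2014-10-20 00:00'), ( 2,  200, NULL, '2014-10-20 00:01'),
  ( 3,  300, NULL, '2014-10-20 00:02'),           ( 4, NULL,  400, '2014-10-20 00:03'),
  ( 5, NULL,  500, '2014-10-20 00:04'),           ( 6,  600, NULL, '2014-10-20 00:15'),
  ( 7,  700, NULL, '2014-10-20 00:16'),           ( 8,  800, NULL, '2014-10-20 00:17'),
  ( 9, NULL,  900, '2014-10-20 00:18'),           (10, NULL, 1000, '2014-10-20 00:19'),
  (11, 2300, NULL, '2014-10-20 00:23'),           (12, 2400, NULL, '2014-10-20 00:24')
),
-- Tag every row with its half-open bucket [s, s + 5 min) and that bucket's maxima.
-- The LEFT JOIN keeps empty buckets as a single row of NULLs.
tagged AS (
  SELECT s AS ts_start, t1.id, t1.in_quantity, t1.out_quantity,
         max(t1.in_quantity)  OVER (PARTITION BY s) AS max_in_q,
         max(t1.out_quantity) OVER (PARTITION BY s) AS max_out_q
  FROM generate_series(timestamp '2014-10-20 00:00',
                       timestamp '2014-10-20 00:20',
                       interval '5 minutes') AS s
  LEFT JOIN t1 ON t1.stamp >= s AND t1.stamp < s + interval '5 minutes'
)
SELECT ts_start,
       ts_start + interval '5 minutes' AS ts_end,
       max_in_q,
       -- Every id that reaches the bucket maximum; two or more elements mean a tie.
       array_agg(id ORDER BY id) FILTER (WHERE in_quantity = max_in_q) AS max_in_q_ids,
       max_out_q,
       array_agg(id ORDER BY id) FILTER (WHERE out_quantity = max_out_q) AS max_out_q_ids
FROM tagged
GROUP BY ts_start, max_in_q, max_out_q
ORDER BY ts_start;
```

Every maximum in the sample rows is unique, so each non-NULL array has a single element, for example {3} and {5} for the first bucket. For the (start, end] ranges written in the question, change the join condition to t1.stamp > s AND t1.stamp <= s + interval '5 minutes'.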
Answer 1 (score: 1)
I think this can be quite simple with the help of range types:
WITH data(in_quantity,price,out_quantity,stamp) AS (VALUES
( 100::int8, 10.00, NULL::int8, '2014-10-20 00:00:00'::timestamp), -- id = 1
( 200, 11.00, NULL, '2014-10-20 00:01:00'), -- id = 2
( 300, 12.00, NULL, '2014-10-20 00:02:00'), -- id = 3
(NULL, 13.00, 400, '2014-10-20 00:03:00'), -- id = 4
(NULL, 14.00, 500, '2014-10-20 00:04:00'), -- id = 5
( 600, 15.00, NULL, '2014-10-20 00:15:00'), -- id = 6
( 700, 16.00, NULL, '2014-10-20 00:16:00'), -- id = 7
( 800, 17.00, NULL, '2014-10-20 00:17:00'), -- id = 8
(NULL, 18.00, 900, '2014-10-20 00:18:00'), -- id = 9
(NULL, 19.00, 1000, '2014-10-20 00:19:00'), -- id = 10
(2300, 23.00, NULL, '2014-10-20 00:23:00'), -- id = 11
(2400, 24.00, NULL, '2014-10-20 00:24:00')
)
SELECT
tsrange(i,i+INTERVAL '1h','[)') r,
max(in_quantity) max_in_q,
max(out_quantity) max_out_q
FROM generate_series('2014-10-20 00:00:00'::timestamp,
'2014-10-20 23:00:00'::timestamp, '1 hour') AS i
LEFT JOIN data d ON tsrange(i,i+INTERVAL '1h','[)') @> d.stamp
GROUP BY r
ORDER BY r;
I used a LEFT JOIN here because I assume you want to see all ranges, including the empty ones.
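The query above returns the maximum quantities but not the ids of the rows they come from. A minimal sketch of one way to add them, assuming PostgreSQL 9.4 or later (for FILTER), the 5-minute ranges from the question, and an id column added to the inline data (the data CTE above has none): aggregate the ids in each range ordered by quantity, highest first, and take the first element. Ties go to the lowest id, and the FILTER clause keeps rows with a NULL quantity from supplying an id.

```sql
-- Hypothetical inline copy of the question's sample rows with an explicit id (price omitted).
WITH data(id, in_quantity, out_quantity, stamp) AS (VALUES
  ( 1,  100, NULL, timestamp '2014-10-20 00:00'), ( 2,  200, NULL, '2014-10-20 00:01'),
  ( 3,  300, NULL, '2014-10-20 00:02'),           ( 4, NULL,  400, '2014-10-20 00:03'),
  ( 5, NULL,  500, '2014-10-20 00:04'),           ( 6,  600, NULL, '2014-10-20 00:15'),
  ( 7,  700, NULL, '2014-10-20 00:16'),           ( 8,  800, NULL, '2014-10-20 00:17'),
  ( 9, NULL,  900, '2014-10-20 00:18'),           (10, NULL, 1000, '2014-10-20 00:19'),
  (11, 2300, NULL, '2014-10-20 00:23'),           (12, 2400, NULL, '2014-10-20 00:24')
)
SELECT
  tsrange(i, i + interval '5 minutes', '[)') AS r,
  max(d.in_quantity) AS max_in_q,
  -- ids sorted by quantity (highest first, lowest id on ties); [1] keeps the top one.
  -- Empty or all-NULL ranges leave nothing to aggregate, so the result is NULL.
  (array_agg(d.id ORDER BY d.in_quantity DESC, d.id)
     FILTER (WHERE d.in_quantity IS NOT NULL))[1] AS max_in_q_id,
  max(d.out_quantity) AS max_out_q,
  (array_agg(d.id ORDER BY d.out_quantity DESC, d.id)
     FILTER (WHERE d.out_quantity IS NOT NULL))[1] AS max_out_q_id
FROM generate_series(timestamp '2014-10-20 00:00',
                     timestamp '2014-10-20 00:20', interval '5 minutes') AS i
LEFT JOIN data d ON tsrange(i, i + interval '5 minutes', '[)') @> d.stamp
GROUP BY r
ORDER BY r;
```

Collecting and sorting the ids costs more than a plain max() when a range holds many rows, but it keeps the whole result in a single GROUP BY pass without self-joins.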