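I have a Postgres table that records an int value (a request count) every minute. I have several request types on several servers, and all of them go into the same table: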
time | key1 | key2 | key3 | value
-----------------------------------------------------------------------
2017-01-16 18:00:53 | server1 | webpage1 | type1 | 30
2017-01-16 18:00:55 | server1 | webpage2 | type1 | 31
2017-01-16 18:00:58 | server1 | webpage3 | type1 | 32
2017-01-16 18:00:59 | server1 | webpage4 | type1 | 33
2017-01-16 18:01:00 | server1 | webpage5 | type1 | 34
2017-01-16 18:01:01 | server1 | webpage6 | type1 | 35
2017-01-16 18:01:02 | server1 | webpage7 | type1 | 36
2017-01-16 18:01:03 | server1 | webpage8 | type1 | 37
2017-01-16 18:01:04 | server1 | webpage1 | type1 | 56
2017-01-16 18:01:06 | server1 | webpage2 | type1 | 35
2017-01-16 18:01:07 | server1 | webpage3 | type1 | 43
2017-01-16 18:01:10 | server1 | webpage4 | type1 | 64
2017-01-16 18:01:13 | server1 | webpage5 | type1 | 44
2017-01-16 18:01:14 | server1 | webpage6 | type1 | 66
2017-01-16 18:01:16 | server1 | webpage7 | type1 | 56
2017-01-16 18:01:18 | server1 | webpage8 | type1 | 22
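Assume key1 and key3 also take different values (I left some data out for this example).
For each group (key1, key2, key3) I need the latest value minus the value one row before it [I need the rate per minute].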
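To reproduce the example, the table can be sketched roughly as follows. The table name test comes from the queries in this post; the column types are assumptions, and only two of the groups from the sample data are loaded.

```sql
-- Hypothetical schema: the column types are assumptions, not taken from the post.
CREATE TABLE test (
    time  timestamp NOT NULL,
    key1  text      NOT NULL,
    key2  text      NOT NULL,
    key3  text      NOT NULL,
    value integer   NOT NULL
);

-- Two of the groups from the sample data (webpage1 and webpage8).
INSERT INTO test (time, key1, key2, key3, value) VALUES
    ('2017-01-16 18:00:53', 'server1', 'webpage1', 'type1', 30),
    ('2017-01-16 18:01:03', 'server1', 'webpage8', 'type1', 37),
    ('2017-01-16 18:01:04', 'server1', 'webpage1', 'type1', 56),
    ('2017-01-16 18:01:18', 'server1', 'webpage8', 'type1', 22);
```

The webpage8 group is the interesting one, because its value drops from 37 to 22, so the expected difference is negative.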
I managed to get the latest row and the row one offset before it (per key group) from the same table:
SELECT * FROM
(SELECT ROW_NUMBER()
OVER(PARTITION BY key1, key2, key3 ORDER BY time DESC) as rnum,
time, key1, key2, key3, value FROM test ORDER BY time DESC) a
WHERE rnum < 3;
The result is:
rnum | time | key1 | key2 | key3 | value
------+---------------------+---------+----------+-------+-------
1 | 2017-01-16 18:01:18 | server1 | webpage8 | type1 | 22
1 | 2017-01-16 18:01:16 | server1 | webpage7 | type1 | 56
1 | 2017-01-16 18:01:14 | server1 | webpage6 | type1 | 66
1 | 2017-01-16 18:01:13 | server1 | webpage5 | type1 | 44
1 | 2017-01-16 18:01:10 | server1 | webpage4 | type1 | 64
1 | 2017-01-16 18:01:07 | server1 | webpage3 | type1 | 43
1 | 2017-01-16 18:01:06 | server1 | webpage2 | type1 | 35
1 | 2017-01-16 18:01:04 | server1 | webpage1 | type1 | 56
2 | 2017-01-16 18:01:03 | server1 | webpage8 | type1 | 37
2 | 2017-01-16 18:01:02 | server1 | webpage7 | type1 | 36
2 | 2017-01-16 18:01:01 | server1 | webpage6 | type1 | 35
2 | 2017-01-16 18:01:00 | server1 | webpage5 | type1 | 34
2 | 2017-01-16 18:00:59 | server1 | webpage4 | type1 | 33
2 | 2017-01-16 18:00:58 | server1 | webpage3 | type1 | 32
2 | 2017-01-16 18:00:55 | server1 | webpage2 | type1 | 31
2 | 2017-01-16 18:00:53 | server1 | webpage1 | type1 | 30
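Now, I thought I could take the value column at MIN(time) and at MAX(time) and compute the difference, but I cannot "merge" these rows.
After @HartCO's comment, I was able to do this: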
select time, new_val-last_val, key1, key2, key3 from
(select distinct max(time) over(partition by key1, key2, key3) as time,
max(value) over(partition by key1, key2, key3) as new_val,
min(value) over(partition by key1, key2, key3) as last_val,
key1, key2, key3
from (select row_number() over(partition by key1, key2, key3 order by time desc) as rnum,
time, key1, key2, key3, value from test order by time desc) a
where rnum < 3) b;
I got:
time | ?column? | key1 | key2 | key3
---------------------+----------+---------+----------+-------
2017-01-16 18:01:14 | 31 | server1 | webpage6 | type1
2017-01-16 18:01:18 | 15 | server1 | webpage8 | type1
2017-01-16 18:01:16 | 20 | server1 | webpage7 | type1
2017-01-16 18:01:04 | 26 | server1 | webpage1 | type1
2017-01-16 18:01:13 | 10 | server1 | webpage5 | type1
2017-01-16 18:01:06 | 4 | server1 | webpage2 | type1
2017-01-16 18:01:07 | 11 | server1 | webpage3 | type1
2017-01-16 18:01:10 | 31 | server1 | webpage4 | type1
But the desired output for webpage8 is -15, not 15.
Answer 0 (score: 1)
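Differences between rows that are offset by a fixed number of rows are best handled with the lag() and lead() window functions. To get the latest value per group you can use DISTINCT ON combined with ORDER BY, provided your table is not very large. Note that DISTINCT ON is a PostgreSQL extension.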
SELECT DISTINCT ON (key1, key2, key3)
time,
key1,
key2,
key3,
value - lag(value) OVER (PARTITION BY key1, key2, key3 ORDER BY time)
FROM test
ORDER BY key1, key2, key3, time DESC;
This gives us
time | key1 | key2 | key3 | ?column?
---------------------+------------+-------------+----------+----------
2017-01-16 18:01:04 | server1 | webpage1 | type1 | 26
2017-01-16 18:01:06 | server1 | webpage2 | type1 | 4
2017-01-16 18:01:07 | server1 | webpage3 | type1 | 11
2017-01-16 18:01:10 | server1 | webpage4 | type1 | 31
2017-01-16 18:01:13 | server1 | webpage5 | type1 | 10
2017-01-16 18:01:14 | server1 | webpage6 | type1 | 31
2017-01-16 18:01:16 | server1 | webpage7 | type1 | 20
2017-01-16 18:01:18 | server1 | webpage8 | type1 | -15
(8 rows)
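If the per-minute rate mentioned in the question means the difference scaled by the time elapsed between the two rows, the same query can be extended as in the sketch below. This is one possible reading, not something the question confirms. The aliases diff, elapsed_seconds and per_minute_rate and the window name w are illustrative, and the time column is assumed to be a timestamp.

```sql
-- Sketch: scale the latest difference by the elapsed time to get a per-minute rate.
-- Assumes time is a timestamp column; all aliases are illustrative names.
SELECT DISTINCT ON (key1, key2, key3)
       time,
       key1,
       key2,
       key3,
       value - lag(value) OVER w AS diff,
       extract(epoch FROM time - lag(time) OVER w) AS elapsed_seconds,
       -- nullif avoids a division by zero when two rows share a timestamp
       (value - lag(value) OVER w) * 60.0
           / nullif(extract(epoch FROM time - lag(time) OVER w), 0) AS per_minute_rate
FROM test
WINDOW w AS (PARTITION BY key1, key2, key3 ORDER BY time)
ORDER BY key1, key2, key3, time DESC;
```

For webpage8 the two rows are 15 seconds apart and the difference is -15, so under this reading the rate works out to -60 per minute.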
Of course, you could use other well-known greatest-n-per-group solutions, such as a left join.
WITH diffs AS (
SELECT time,
key1,
key2,
key3,
value - lag(value) OVER (PARTITION BY key1, key2, key3 ORDER BY time)
FROM test)
SELECT d1.*
FROM diffs d1
LEFT JOIN diffs d2
ON (d1.key1, d1.key2, d1.key3) = (d2.key1, d2.key2, d2.key3)
-- This allows us to single out the greatest row
AND d1.time < d2.time
WHERE d2.time IS NULL
-- Ordering is just for show
ORDER BY d1.key1, d1.key2, d1.key3;
With PostgreSQL 9.5, the planner recognizes this pattern and uses an anti-join in the final query plan. You can also get a similar result with NOT EXISTS.
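A minimal sketch of the NOT EXISTS variant, reusing the same CTE as the left join version. The alias diff is an illustrative name added here for readability.

```sql
WITH diffs AS (
    SELECT time,
           key1,
           key2,
           key3,
           value - lag(value) OVER (PARTITION BY key1, key2, key3 ORDER BY time) AS diff
    FROM test)
SELECT d1.*
FROM diffs d1
WHERE NOT EXISTS (
    -- No newer row exists in the same group, so d1 is the latest one
    SELECT 1
    FROM diffs d2
    WHERE (d2.key1, d2.key2, d2.key3) = (d1.key1, d1.key2, d1.key3)
      AND d2.time > d1.time)
-- Ordering is just for show
ORDER BY d1.key1, d1.key2, d1.key3;
```

As in the left join version, the difference is computed for every row first, and only then is each group narrowed down to its latest row.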