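Say I have some servers that continuously update a database with their status.
I need to run some reports on the status of these servers, and cleaning up the table a bit would help.
For each status message I get 2 timestamps (start time and end time). What I'd like to do is take consecutive updates with the same status and delete the redundant ones, updating the end time so that it reflects the proper interval.
Let me illustrate...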
server_status table:
server | status | start_time | end_time
---------+------------+---------------------+---------------------
web1 | running | 2013-06-04 00:00:00 | 2013-06-04 00:05:00
web2 | down | 2013-06-04 00:01:00 | 2013-06-04 00:03:00
web1 | running | 2013-06-04 00:05:00 | 2013-06-04 01:00:00
msdb | idle | 2013-06-04 00:02:00 | 2013-06-04 02:00:00
web1 | running | 2013-06-04 01:00:00 | 2013-06-04 02:00:00
web2 | down | 2013-06-04 00:03:00 | 2013-06-04 03:00:00
web2 | running | 2013-06-04 03:00:00 | 2013-06-04 05:00:00
web1 | maintenance | 2013-06-04 02:00:00 | 2013-06-04 05:00:00
web1 | running | 2013-06-04 05:00:00 | 2013-06-04 07:00:00
I would like my table to end up looking like this (sorted on start_time):
server | status | start_time | end_time
---------+------------+---------------------+---------------------
web1 | running | 2013-06-04 00:00:00 | 2013-06-04 02:00:00
web2 | down | 2013-06-04 00:01:00 | 2013-06-04 03:00:00
msdb | idle | 2013-06-04 00:02:00 | 2013-06-04 02:00:00
web1 | maintenance | 2013-06-04 02:00:00 | 2013-06-04 05:00:00
web2 | running | 2013-06-04 03:00:00 | 2013-06-04 05:00:00
web1 | running | 2013-06-04 05:00:00 | 2013-06-04 07:00:00
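This tells me when my boxes changed state, and when I later run reports on these tables I can query BETWEEN start_time AND end_time in SQL.
Any clues how to do this? I assume I need an UPDATE statement followed by a DELETE. I can add row numbers if necessary, although they don't currently exist. That might be needed so we can sort and then check whether the server and status in row X are the same as in row X + 1.
Running Postgres 8.1 (I know, I know. Moving to 8.4 soon).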
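The "sort, then compare row X with row X + 1" idea from the question can be sketched outside the database. This is a minimal Python sketch, assuming the rows fit in memory and the timestamps are ISO-formatted strings (which sort chronologically); the helper name `collapse_runs` is hypothetical.

```python
# Rows from the question's server_status example: (server, status, start_time, end_time).
# ISO-8601 timestamp strings sort chronologically, so plain string comparison works.
ROWS = [
    ("web1", "running", "2013-06-04 00:00:00", "2013-06-04 00:05:00"),
    ("web2", "down", "2013-06-04 00:01:00", "2013-06-04 00:03:00"),
    ("web1", "running", "2013-06-04 00:05:00", "2013-06-04 01:00:00"),
    ("msdb", "idle", "2013-06-04 00:02:00", "2013-06-04 02:00:00"),
    ("web1", "running", "2013-06-04 01:00:00", "2013-06-04 02:00:00"),
    ("web2", "down", "2013-06-04 00:03:00", "2013-06-04 03:00:00"),
    ("web2", "running", "2013-06-04 03:00:00", "2013-06-04 05:00:00"),
    ("web1", "maintenance", "2013-06-04 02:00:00", "2013-06-04 05:00:00"),
    ("web1", "running", "2013-06-04 05:00:00", "2013-06-04 07:00:00"),
]


def collapse_runs(rows):
    """Merge consecutive rows of one server that share the same status.

    Hypothetical helper: rows are sorted by (server, start_time), then each
    row is compared with the previous one ("row X vs. row X + 1").
    """
    merged = []
    for server, status, start, end in sorted(rows, key=lambda r: (r[0], r[2])):
        prev = merged[-1] if merged else None
        if prev is not None and prev[0] == server and prev[1] == status:
            # Same server and status as the previous row: extend its end time.
            merged[-1] = (server, status, prev[2], max(prev[3], end))
        else:
            # New server or a status change: start a new interval.
            merged.append((server, status, start, end))
    # Present the result ordered by start_time, like the desired table.
    return sorted(merged, key=lambda r: r[2])


for row in collapse_runs(ROWS):
    print(row)
```

Inside the database the same comparison is what a window function such as lag() expresses, which avoids adding explicit row numbers.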
Answer (score: 1)
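This is tricky, because you have multiple groups of values for the same (server, status), so a simple GROUP BY or DISTINCT (ON) won't cut it.
However, the window function lag() (available since PostgreSQL 8.4) is perfect for your problem and makes the solution very simple.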
To get the first row of each run in a SELECT:
SELECT server, status, start_time, end_time
FROM (
SELECT *, status IS DISTINCT FROM
lag(status) OVER (PARTITION BY server ORDER BY start_time) AS step
FROM server_status
) sub
WHERE step
ORDER BY start_time;
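To try the query above without a PostgreSQL 8.4 server, here is a minimal sketch that runs it through Python's built-in sqlite3 module. Assumptions: SQLite 3.25+ (needed for lag()) stands in for PostgreSQL, and SQLite's `IS NOT` replaces `IS DISTINCT FROM`, which older SQLite versions do not accept. Note that each returned row keeps its original end_time.

```python
import sqlite3

# Rows copied from the question's server_status example.
ROWS = [
    ("web1", "running", "2013-06-04 00:00:00", "2013-06-04 00:05:00"),
    ("web2", "down", "2013-06-04 00:01:00", "2013-06-04 00:03:00"),
    ("web1", "running", "2013-06-04 00:05:00", "2013-06-04 01:00:00"),
    ("msdb", "idle", "2013-06-04 00:02:00", "2013-06-04 02:00:00"),
    ("web1", "running", "2013-06-04 01:00:00", "2013-06-04 02:00:00"),
    ("web2", "down", "2013-06-04 00:03:00", "2013-06-04 03:00:00"),
    ("web2", "running", "2013-06-04 03:00:00", "2013-06-04 05:00:00"),
    ("web1", "maintenance", "2013-06-04 02:00:00", "2013-06-04 05:00:00"),
    ("web1", "running", "2013-06-04 05:00:00", "2013-06-04 07:00:00"),
]

# Assumption: SQLite >= 3.25 (window functions) stands in for PostgreSQL 8.4+.
if sqlite3.sqlite_version_info < (3, 25, 0):
    raise RuntimeError("this sketch needs SQLite 3.25+ for lag()")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE server_status"
             " (server TEXT, status TEXT, start_time TEXT, end_time TEXT)")
conn.executemany("INSERT INTO server_status VALUES (?, ?, ?, ?)", ROWS)

# step is true for each server's first row and whenever the status differs
# from the previous row of the same server (IS NOT is SQLite's null-safe "<>").
first_rows = conn.execute("""
    SELECT server, status, start_time, end_time
    FROM (
      SELECT *, status IS NOT
             lag(status) OVER (PARTITION BY server ORDER BY start_time) AS step
      FROM server_status
    ) sub
    WHERE step
    ORDER BY start_time
""").fetchall()

for row in first_rows:
    print(row)
```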
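Old-school version: this should also work in 8.1 (only tested with 8.4). The correlated subquery can be a lot slower than the window function.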
SELECT server, status, start_time, end_time
FROM server_status s
WHERE (
SELECT s1.status
FROM server_status s1
WHERE s1.server = s.server
AND s1.start_time < s.start_time
ORDER BY s1.start_time DESC
LIMIT 1
) IS DISTINCT FROM s.status
ORDER BY start_time;
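The correlated-subquery version can be exercised the same way. Again this is a sketch with SQLite (via Python's sqlite3) standing in for PostgreSQL, with `IS NOT` in place of `IS DISTINCT FROM`; no window functions are needed here. It should return the same rows as the lag() variant.

```python
import sqlite3

# Rows copied from the question's server_status example.
ROWS = [
    ("web1", "running", "2013-06-04 00:00:00", "2013-06-04 00:05:00"),
    ("web2", "down", "2013-06-04 00:01:00", "2013-06-04 00:03:00"),
    ("web1", "running", "2013-06-04 00:05:00", "2013-06-04 01:00:00"),
    ("msdb", "idle", "2013-06-04 00:02:00", "2013-06-04 02:00:00"),
    ("web1", "running", "2013-06-04 01:00:00", "2013-06-04 02:00:00"),
    ("web2", "down", "2013-06-04 00:03:00", "2013-06-04 03:00:00"),
    ("web2", "running", "2013-06-04 03:00:00", "2013-06-04 05:00:00"),
    ("web1", "maintenance", "2013-06-04 02:00:00", "2013-06-04 05:00:00"),
    ("web1", "running", "2013-06-04 05:00:00", "2013-06-04 07:00:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE server_status"
             " (server TEXT, status TEXT, start_time TEXT, end_time TEXT)")
conn.executemany("INSERT INTO server_status VALUES (?, ?, ?, ?)", ROWS)

# Keep a row when the status of the previous row of the same server
# (found by the correlated subquery) differs, or when there is no previous row.
first_rows = conn.execute("""
    SELECT server, status, start_time, end_time
    FROM server_status s
    WHERE (
      SELECT s1.status
      FROM server_status s1
      WHERE s1.server = s.server
        AND s1.start_time < s.start_time
      ORDER BY s1.start_time DESC
      LIMIT 1
    ) IS NOT s.status
    ORDER BY start_time
""").fetchall()

for row in first_rows:
    print(row)
```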
->SQLfiddle for both
To DELETE the redundant rows, as requested:
DELETE FROM server_status s
USING (
SELECT server, status, start_time
,status IS DISTINCT FROM
lag(status) OVER (PARTITION BY server ORDER BY start_time) AS step
FROM server_status
) d
WHERE s.server = d.server
AND s.status = d.status
AND s.start_time = d.start_time
AND NOT d.step;
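The window-function DELETE keeps the first row of each run but leaves its end_time as it was (00:05 for the first web1 row rather than 02:00). A common extension, often called "gaps and islands", numbers each run with a running SUM() over the change flag and then aggregates per run, which yields the merged intervals directly. A minimal sketch, assuming SQLite 3.25+ via Python's sqlite3 as a stand-in for PostgreSQL 8.4+ (`IS NOT` replaces `IS DISTINCT FROM`); the subquery aliases `flagged` and `numbered` are arbitrary.

```python
import sqlite3

# Rows copied from the question's server_status example.
ROWS = [
    ("web1", "running", "2013-06-04 00:00:00", "2013-06-04 00:05:00"),
    ("web2", "down", "2013-06-04 00:01:00", "2013-06-04 00:03:00"),
    ("web1", "running", "2013-06-04 00:05:00", "2013-06-04 01:00:00"),
    ("msdb", "idle", "2013-06-04 00:02:00", "2013-06-04 02:00:00"),
    ("web1", "running", "2013-06-04 01:00:00", "2013-06-04 02:00:00"),
    ("web2", "down", "2013-06-04 00:03:00", "2013-06-04 03:00:00"),
    ("web2", "running", "2013-06-04 03:00:00", "2013-06-04 05:00:00"),
    ("web1", "maintenance", "2013-06-04 02:00:00", "2013-06-04 05:00:00"),
    ("web1", "running", "2013-06-04 05:00:00", "2013-06-04 07:00:00"),
]

if sqlite3.sqlite_version_info < (3, 25, 0):
    raise RuntimeError("this sketch needs SQLite 3.25+ for window functions")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE server_status"
             " (server TEXT, status TEXT, start_time TEXT, end_time TEXT)")
conn.executemany("INSERT INTO server_status VALUES (?, ?, ?, ?)", ROWS)

merged = conn.execute("""
    SELECT server, status,
           MIN(start_time) AS start_time, MAX(end_time) AS end_time
    FROM (
      SELECT *,
             -- running count of status changes = run number within a server
             SUM(step) OVER (PARTITION BY server ORDER BY start_time) AS grp
      FROM (
        SELECT *,
               -- 1 where a new run starts, as in the answer's step column
               CASE WHEN status IS NOT
                         lag(status) OVER (PARTITION BY server ORDER BY start_time)
                    THEN 1 ELSE 0 END AS step
        FROM server_status
      ) flagged
    ) numbered
    GROUP BY server, grp, status
    ORDER BY start_time
""").fetchall()

for row in merged:
    print(row)
```

On PostgreSQL the CASE expression keeps the flag an integer, so SUM() works there too; the result could then be written back with an UPDATE before the DELETE.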
For 8.1 (only tested with 8.4):
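-- No alias on the DELETE target: aliases there require PostgreSQL 8.2+.
DELETE FROM server_status
WHERE (
SELECT s1.status = server_status.status
FROM server_status s1
WHERE s1.server = server_status.server
AND s1.start_time < server_status.start_time
ORDER BY s1.start_time DESC
LIMIT 1
);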
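Neither DELETE in this answer touches end_time, so the merged intervals the question asks for are not produced. One way to fill that gap without window functions (so the same shape should also work on 8.1, though that is untested) is to run an UPDATE with correlated subqueries first and the DELETE afterwards. A minimal sketch, again using SQLite via Python's sqlite3 as a stand-in: on PostgreSQL you would write `IS DISTINCT FROM` instead of SQLite's `IS NOT`, and the end-of-time sentinel string is an assumption (a real timestamp column could use `'infinity'`).

```python
import sqlite3

# Rows copied from the question's server_status example.
ROWS = [
    ("web1", "running", "2013-06-04 00:00:00", "2013-06-04 00:05:00"),
    ("web2", "down", "2013-06-04 00:01:00", "2013-06-04 00:03:00"),
    ("web1", "running", "2013-06-04 00:05:00", "2013-06-04 01:00:00"),
    ("msdb", "idle", "2013-06-04 00:02:00", "2013-06-04 02:00:00"),
    ("web1", "running", "2013-06-04 01:00:00", "2013-06-04 02:00:00"),
    ("web2", "down", "2013-06-04 00:03:00", "2013-06-04 03:00:00"),
    ("web2", "running", "2013-06-04 03:00:00", "2013-06-04 05:00:00"),
    ("web1", "maintenance", "2013-06-04 02:00:00", "2013-06-04 05:00:00"),
    ("web1", "running", "2013-06-04 05:00:00", "2013-06-04 07:00:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE server_status"
             " (server TEXT, status TEXT, start_time TEXT, end_time TEXT)")
conn.executemany("INSERT INTO server_status VALUES (?, ?, ?, ?)", ROWS)

# Step 1 (UPDATE): give every row the latest end_time of its run. A run ends
# just before the next row of the same server with a different status; the
# '9999-12-31 23:59:59' sentinel (an assumption) stands for "no later change".
conn.execute("""
    UPDATE server_status
    SET end_time = (
      SELECT MAX(s1.end_time)
      FROM server_status s1
      WHERE s1.server = server_status.server
        AND s1.start_time >= server_status.start_time
        AND s1.start_time < COALESCE(
              (SELECT MIN(s2.start_time)
               FROM server_status s2
               WHERE s2.server = server_status.server
                 AND s2.start_time > server_status.start_time
                 AND s2.status IS NOT server_status.status),
              '9999-12-31 23:59:59')
    )
""")

# Step 2 (DELETE): drop every row whose predecessor has the same status,
# mirroring the 8.1 DELETE; the target table is referenced by name.
conn.execute("""
    DELETE FROM server_status
    WHERE (
      SELECT s1.status = server_status.status
      FROM server_status s1
      WHERE s1.server = server_status.server
        AND s1.start_time < server_status.start_time
      ORDER BY s1.start_time DESC
      LIMIT 1
    )
""")
conn.commit()

final = conn.execute("SELECT server, status, start_time, end_time"
                     " FROM server_status ORDER BY start_time").fetchall()
for row in final:
    print(row)
```

Running the UPDATE before the DELETE matters: once the later rows of a run are gone, their end_time can no longer be recovered.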
An index on (server, start_time) would make these queries a lot faster for big tables.
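A sketch of the suggested index, checked with SQLite's `EXPLAIN QUERY PLAN` on the correlated-subquery SELECT (SQLite stands in for PostgreSQL here; on PostgreSQL you would read the output of `EXPLAIN` instead). The index name `idx_server_start` is hypothetical; the `CREATE INDEX` statement itself is also valid PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE server_status"
             " (server TEXT, status TEXT, start_time TEXT, end_time TEXT)")
# Hypothetical index name; the column order matches the subquery's
# "server = ... AND start_time < ... ORDER BY start_time DESC LIMIT 1".
conn.execute("CREATE INDEX idx_server_start ON server_status (server, start_time)")

QUERY = """
    SELECT server, status, start_time, end_time
    FROM server_status s
    WHERE (
      SELECT s1.status
      FROM server_status s1
      WHERE s1.server = s.server
        AND s1.start_time < s.start_time
      ORDER BY s1.start_time DESC
      LIMIT 1
    ) IS NOT s.status
    ORDER BY start_time
"""

# The last column of each plan row is the human-readable step description.
plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
for row in plan:
    print(row[3])
```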
You need to upgrade, if only for security reasons. (But why stop at 8.4? Go straight to the current version.)