Update status time spans and delete unneeded rows using SQL

Time: 2013-06-04 15:07:54

Tags: sql postgresql timestamp delete-row timespan

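Suppose I have some servers that continually update a database with their status.

I need to run some reports on the status of these servers. Doing some cleanup on the table would help.

I get 2 timestamps for each status message (a start time and an end time). What I want to do is take consecutive updates that have the same status and delete them. I also want to update the end time of the remaining row to reflect the full interval.

Let me illustrate with an example...

The server_status table: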

server   |    status    |     start_time      |       end_time
---------+--------------+---------------------+---------------------
 web1    |  running     | 2013-06-04 00:00:00 | 2013-06-04 00:05:00
 web2    |  down        | 2013-06-04 00:01:00 | 2013-06-04 00:03:00
 web1    |  running     | 2013-06-04 00:05:00 | 2013-06-04 01:00:00
 msdb    |  idle        | 2013-06-04 00:02:00 | 2013-06-04 02:00:00
 web1    |  running     | 2013-06-04 01:00:00 | 2013-06-04 02:00:00
 web2    |  down        | 2013-06-04 00:03:00 | 2013-06-04 03:00:00
 web2    |  running     | 2013-06-04 03:00:00 | 2013-06-04 05:00:00
 web1    |  maintenance | 2013-06-04 02:00:00 | 2013-06-04 05:00:00
 web1    |  running     | 2013-06-04 05:00:00 | 2013-06-04 07:00:00

I'd like my table to end up looking like this (sorted by start_time):

server   |    status    |     start_time      |       end_time
---------+--------------+---------------------+---------------------
 web1    |  running     | 2013-06-04 00:00:00 | 2013-06-04 02:00:00
 web2    |  down        | 2013-06-04 00:01:00 | 2013-06-04 03:00:00
 msdb    |  idle        | 2013-06-04 00:02:00 | 2013-06-04 02:00:00
 web1    |  maintenance | 2013-06-04 02:00:00 | 2013-06-04 05:00:00
 web2    |  running     | 2013-06-04 03:00:00 | 2013-06-04 05:00:00
 web1    |  running     | 2013-06-04 05:00:00 | 2013-06-04 07:00:00

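This lets me know when my boxes changed state, and when I later run reports on this table, I can query BETWEEN start_time AND end_time in SQL.

Any clues on how to do this? I assume I need an UPDATE statement and then a DELETE. I can add row numbers if needed, though they don't currently exist. That might be necessary so we can sort and then check whether the server and status in row X are the same as in row X + 1.

Running postgres 8.1 (I know, I know. Moving to 8.4 soon).

1 Answer:

Answer 0: (score: 1)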

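This is a tricky problem because you have multiple separate runs of values for the same (server, status), so a simple GROUP BY or DISTINCT (ON) won't cut it.

However, the window function lag() (available since PostgreSQL 8.4) fits your problem well and makes the solution quite simple.

To get the values you are looking for in a SELECT:
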
SELECT server, status, start_time, end_time
FROM  (
   SELECT *, status IS DISTINCT FROM 
             lag(status) OVER (PARTITION BY server ORDER BY start_time) AS step
   FROM   server_status
   ) sub
WHERE  step
ORDER  BY start_time;

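Legacy version: this should also work in 8.1, but it was only tested with 8.4. The correlated subquery is probably a lot slower than the window function.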

SELECT server, status, start_time, end_time
FROM   server_status s
WHERE ( 
   SELECT s1.status
   FROM   server_status s1
   WHERE  s1.server = s.server
   AND    s1.start_time < s.start_time
   ORDER  BY s1.start_time DESC
   LIMIT  1
   ) IS DISTINCT FROM s.status
ORDER  BY start_time;
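
To try both queries locally, a minimal test setup might look like the following. The column types (text and timestamp) and the NOT NULL constraints are assumptions; the question does not state them. The rows are the ones from the question's example table.

```sql
-- Assumed column types and constraints; the question does not specify them.
CREATE TABLE server_status (
   server     text      NOT NULL,
   status     text      NOT NULL,
   start_time timestamp NOT NULL,
   end_time   timestamp NOT NULL
);

-- One INSERT per row: multi-row VALUES lists need PostgreSQL 8.2 or later.
INSERT INTO server_status VALUES ('web1', 'running',     '2013-06-04 00:00:00', '2013-06-04 00:05:00');
INSERT INTO server_status VALUES ('web2', 'down',        '2013-06-04 00:01:00', '2013-06-04 00:03:00');
INSERT INTO server_status VALUES ('web1', 'running',     '2013-06-04 00:05:00', '2013-06-04 01:00:00');
INSERT INTO server_status VALUES ('msdb', 'idle',        '2013-06-04 00:02:00', '2013-06-04 02:00:00');
INSERT INTO server_status VALUES ('web1', 'running',     '2013-06-04 01:00:00', '2013-06-04 02:00:00');
INSERT INTO server_status VALUES ('web2', 'down',        '2013-06-04 00:03:00', '2013-06-04 03:00:00');
INSERT INTO server_status VALUES ('web2', 'running',     '2013-06-04 03:00:00', '2013-06-04 05:00:00');
INSERT INTO server_status VALUES ('web1', 'maintenance', '2013-06-04 02:00:00', '2013-06-04 05:00:00');
INSERT INTO server_status VALUES ('web1', 'running',     '2013-06-04 05:00:00', '2013-06-04 07:00:00');
```

Both SELECT statements return the first row of each run of identical statuses per server, with that row's own end_time; the end times are not extended by these queries.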

-> SQLfiddle for both

To DELETE rows as desired:

DELETE FROM server_status s
USING (
   SELECT server, status, start_time
         ,status IS DISTINCT FROM
          lag(status) OVER (PARTITION BY server ORDER BY start_time) AS step
   FROM   server_status
   ) d
WHERE  s.server = d.server
AND    s.status = d.status
AND    s.start_time = d.start_time
AND    NOT d.step;

Legacy version for 8.1. Only tested with 8.4.

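-- No alias on the target table: releases before 8.2 (if memory serves)
-- do not accept one in DELETE, so the table name is used directly.
DELETE FROM server_status
WHERE (
   SELECT s1.status = server_status.status
   FROM   server_status s1
   WHERE  s1.server = server_status.server
   AND    s1.start_time < server_status.start_time
   ORDER  BY s1.start_time DESC
   LIMIT  1
   );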
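
The DELETE statements above only remove the repeated rows; they do not extend end_time of the surviving row, which the question also asks for. One way to do that is an UPDATE run before the DELETE, sketched below. This is an untested sketch under several assumptions: it needs window functions (8.4 or later), it assumes (server, start_time) is unique, and it treats every unbroken sequence of identical statuses per server as one run, whether or not the intervals touch. The names step, run, run_start and run_end are made up for this example.

```sql
-- Run this BEFORE the DELETE, ideally in the same transaction:
-- once the repeated rows are gone, their end times are lost.
UPDATE server_status s
SET    end_time = r.run_end
FROM  (
   SELECT server
         ,min(start_time) AS run_start   -- first row of the run (the one that is kept)
         ,max(end_time)   AS run_end     -- latest end time within the run
   FROM  (
      SELECT server, start_time, end_time
             -- running count of status changes = run number per server
            ,count(step OR NULL) OVER (PARTITION BY server ORDER BY start_time) AS run
      FROM  (
         SELECT server, start_time, end_time
               ,status IS DISTINCT FROM
                lag(status) OVER (PARTITION BY server ORDER BY start_time) AS step
         FROM   server_status
         ) a
      ) b
   GROUP  BY server, run
   ) r
WHERE  s.server = r.server
AND    s.start_time = r.run_start
AND    s.end_time IS DISTINCT FROM r.run_end;  -- skip rows that are already correct
```

With the example data, the first web1 row would get end_time 2013-06-04 02:00:00 and the first web2 row 2013-06-04 03:00:00; the other rows are single-row runs and stay unchanged.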

An index on (server, start_time) will greatly improve the performance of any of these queries on big tables.
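
A minimal sketch of such an index; the index name is hypothetical, and whether the index should also be declared UNIQUE depends on your data:

```sql
-- Index name is made up for this example.
CREATE INDEX server_status_server_start_idx
    ON server_status (server, start_time);
```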

You need to upgrade in any case, for security reasons alone. (But why stop at 8.4? Go straight to the current version.)