Get the total time interval from multiple rows if the sequence is not broken

Asked: 2011-08-30 13:40:20

Tags: sql postgresql datetime gaps-and-islands

I have Work and Person tables (these are just examples to illustrate the problem).

Structure

Work

id INTEGER
person_id INTEGER
dt_from DATETIME
dt_to DATETIME

Person

person_id INTEGER
name VARCHAR(50)

Data

Work

id | person_id | dt_from    | dt_to
-------------------------------------------------
1  | 1         | 2011-01-01 | 2011-02-02
2  | 1         | 2011-02-02 | 2011-04-04
3  | 1         | 2011-06-06 | 2011-09-09
4  | 2         | 2011-01-01 | 2011-02-02
5  | 2         | 2011-02-02 | 2011-03-03
....etc.

Person

Just the names of the persons.

Expected output

Person 1 : 2011-01-01 - 2011-04-04
Person 1 : 2011-06-06 - 2011-09-09
Person 2 : 2011-01-01 - 2011-03-03

The intervals must be consecutive: a chain cannot be broken somewhere in the middle. That is why person 1 has two intervals.

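I am using Postgres, if that changes anything. Do you have any ideas? I would like to do it in one query, but if there is no such solution I will do some interval merging in PHP.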

2 answers:

Answer 0 (score: 1)

There may be a way to do this in a single SQL select, but it escapes me. I managed to do it with a stored function. Here is what I did for testing:

create table work
(id integer, start_date date, end_date date);

insert into work values (1, '2011-01-01','2011-02-02');
insert into work values (1, '2011-02-02','2011-04-04');
insert into work values (1, '2011-06-06','2011-09-09');
insert into work values (2, '2011-01-01','2011-02-02');
insert into work values (2, '2011-02-02','2011-03-03');

create or replace function get_data() returns setof work as
$body$
declare
    res work%rowtype;
    sd  date := null;
begin
    for res in
        select
            w1.id,
            case when exists (select 1 from work w2 where w1.id=w2.id and w2.end_date=w1.start_date) then null else w1.start_date end,
            case when exists (select 1 from work w2 where w1.id=w2.id and w2.start_date=w1.end_date) then null else w1.end_date end
        from
            work w1
        order by
            id, start_date, end_date
    loop
        if res.start_date is not null and res.end_date is not null then
            return next res;
        elsif res.start_date is not null then
            sd := res.start_date;
        elsif res.end_date is not null then
            res.start_date := sd;
            return next res;
        end if;
    end loop;

    return;
end;
$body$
language plpgsql;

Then

select * from get_data() order by id, start_date;

returned this result:

 id | start_date |  end_date
----+------------+------------
  1 | 2011-01-01 | 2011-04-04
  1 | 2011-06-06 | 2011-09-09
  2 | 2011-01-01 | 2011-03-03
(3 rows)

I think that is what you are after.
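
For what it's worth, the single-select route mentioned at the top of this answer can probably be done with window functions (PostgreSQL 8.4 or later). The following is only a minimal sketch, not the method used in this answer. It runs against the question's Work table (person_id, dt_from, dt_to) rather than the test table above, and it assumes that rows for one person never overlap and that dt_from is unique per person. The names is_break, island, period_start and period_end are made up for illustration.

```sql
-- Sketch: flag each row that does NOT continue the previous row,
-- turn the flags into a running island number, then collapse each island.
SELECT person_id,
       min(dt_from) AS period_start,
       max(dt_to)   AS period_end
FROM (
    SELECT person_id, dt_from, dt_to,
           -- running count of breaks = island number within one person
           sum(is_break) OVER (PARTITION BY person_id ORDER BY dt_from) AS island
    FROM (
        SELECT person_id, dt_from, dt_to,
               -- 0 when this row starts exactly where the previous one ended;
               -- 1 otherwise (including the first row, where lag() is NULL)
               CASE WHEN lag(dt_to) OVER (PARTITION BY person_id ORDER BY dt_from) = dt_from
                    THEN 0 ELSE 1 END AS is_break
        FROM work
    ) AS flagged
) AS numbered
GROUP BY person_id, island
ORDER BY person_id, period_start;
```

With the five sample rows from the question, person 1's rows get island numbers 1, 1, 2 and person 2's rows get 1, 1, so the query yields the three intervals listed under the expected output.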

Answer 1 (score: 0)

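You could try Postgres's WITH RECURSIVE construct. (After all, a linked list is a kind of tree.) Getting the boundary conditions right will be tricky, but at least it solves the problem without needing a loop.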

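Update: added code. The problem with RECURSIVE is that you can only specify the "tail" boundary condition. To specify the "head" condition, the query needs to be wrapped in a view.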
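
The prikklok table that the following statements refer to is never defined in this answer. A minimal sketch of a definition that fits them, assuming the same three columns as the question's Work table; the column types are a guess (timestamp is chosen because the generated test data further down has a time part):

```sql
-- Hypothetical definition: the answer does not show the real one.
CREATE TABLE prikklok (
    person_id integer   NOT NULL,  -- same role as Work.person_id in the question
    dt_from   timestamp NOT NULL,  -- start of one interval
    dt_to     timestamp NOT NULL   -- end of that interval
);
```

The table has to exist before the view is created, because CREATE VIEW fails when a referenced table is missing.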

CREATE VIEW collected_time AS (
    WITH RECURSIVE ztree(person_id, dt_from, dt_to)  AS ( 
    -- Terminal part
    SELECT  pr.person_id, pr.dt_from, pr.dt_to
    FROM prikklok pr
    WHERE NOT EXISTS (
            SELECT * FROM prikklok px
            WHERE px.person_id = pr.person_id AND px.dt_from = pr.dt_to
            )
    UNION
    -- Recursive part
    SELECT  p1.person_id AS person_id
    , p1.dt_from AS dt_from
    , p2.dt_to AS dt_to
    FROM prikklok AS p1
    , ztree AS p2
    WHERE p1.person_id = p2.person_id
    AND p1.dt_to = p2.dt_from
    )
SELECT *
FROM ztree zt
WHERE NOT EXISTS (select *
    FROM prikklok p3
    WHERE p3.person_id = zt.person_id
        AND p3.dt_to = zt.dt_from
            )
    );

SELECT * FROM collected_time;

-- Now generate some data with gaps

INSERT INTO prikklok
    SELECT serie_n
    , serie_t
    , serie_t + '1 month'::interval
FROM generate_series (1,10) serie_n
    , generate_series ( '1970-01-01 00:00:00' , '2011-09-01 00:00:00' , '1 month' ::interval) serie_t
    ;

DELETE FROM prikklok
WHERE random() <  0.001
    ;

-- a few indexes won't hurt
ALTER TABLE prikklok ADD PRIMARY KEY (person_id,dt_from)
    ;
CREATE UNIQUE INDEX ON prikklok (person_id,dt_to);

The resulting query plan (EXPLAIN ANALYZE output) looks perfect:

                                                                              QUERY PLAN                                                                           
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Anti Join  (cost=1389.73..1469.09 rows=1 width=20) (actual time=13.580..40.920 rows=16 loops=1)
   CTE ztree
     ->  Recursive Union  (cost=0.00..1389.73 rows=11 width=20) (actual time=0.136..27.405 rows=5004 loops=1)
           ->  Merge Anti Join  (cost=0.00..638.92 rows=1 width=20) (actual time=0.130..10.011 rows=16 loops=1)
                 Merge Cond: ((pr.person_id = px.person_id) AND (pr.dt_to = px.dt_from))
                 ->  Index Scan using prikklok_person_id_dt_to_idx on prikklok pr  (cost=0.00..291.31 rows=5004 width=20) (actual time=0.063..2.273 rows=5004 loops=1)
                 ->  Index Scan using prikklok_pkey on prikklok px  (cost=0.00..291.31 rows=5004 width=12) (actual time=0.012..2.204 rows=5004 loops=1)
           ->  Nested Loop  (cost=0.00..75.06 rows=1 width=20) (actual time=0.002..0.027 rows=10 loops=501)
                 ->  WorkTable Scan on ztree p2  (cost=0.00..0.20 rows=10 width=20) (actual time=0.000..0.001 rows=10 loops=501)
                 ->  Index Scan using prikklok_person_id_dt_to_idx on prikklok p1  (cost=0.00..7.47 rows=1 width=20) (actual time=0.002..0.002 rows=1 loops=5004)
                       Index Cond: ((p1.person_id = p2.person_id) AND (p1.dt_to = p2.dt_from))
   ->  CTE Scan on ztree zt  (cost=0.00..0.22 rows=11 width=20) (actual time=0.138..29.887 rows=5004 loops=1)
   ->  Index Scan using prikklok_person_id_dt_to_idx on prikklok p3  (cost=0.00..7.18 rows=1 width=12) (actual time=0.002..0.002 rows=1 loops=5004)
         Index Cond: ((p3.person_id = zt.person_id) AND (p3.dt_to = zt.dt_from))
 Total runtime: 41.354 ms
(15 rows)