Missing row in a self inner join SQL query

Date: 2013-04-27 14:27:07

Tags: sql postgresql inner-join

I am using PostgreSQL 8.3.8.

I have a list of time boundaries (per date) in the role_times_boundaries table:

CREATE TABLE role_times_boundaries
(
  role_date DATE,
  time_boundary TIME
);

INSERT INTO role_times_boundaries (role_date, time_boundary) VALUES ('2013-04-24'::date, '09:00:00'::time);
INSERT INTO role_times_boundaries (role_date, time_boundary) VALUES ('2013-04-24'::date, '10:00:00'::time);
INSERT INTO role_times_boundaries (role_date, time_boundary) VALUES ('2013-04-25'::date, '07:00:00'::time);
INSERT INTO role_times_boundaries (role_date, time_boundary) VALUES ('2013-04-25'::date, '08:50:00'::time);
INSERT INTO role_times_boundaries (role_date, time_boundary) VALUES ('2013-04-25'::date, '09:00:00'::time);
INSERT INTO role_times_boundaries (role_date, time_boundary) VALUES ('2013-04-25'::date, '12:00:00'::time);
INSERT INTO role_times_boundaries (role_date, time_boundary) VALUES ('2013-04-25'::date, '13:00:00'::time);
INSERT INTO role_times_boundaries (role_date, time_boundary) VALUES ('2013-04-25'::date, '16:00:00'::time);
INSERT INTO role_times_boundaries (role_date, time_boundary) VALUES ('2013-04-25'::date, '17:30:00'::time);
INSERT INTO role_times_boundaries (role_date, time_boundary) VALUES ('2013-04-25'::date, '20:00:00'::time);

So the table contains:

 role_date  | time_boundary 
------------+---------------
 2013-04-24 | 09:00:00
 2013-04-24 | 10:00:00
 2013-04-25 | 07:00:00
 2013-04-25 | 08:50:00
 2013-04-25 | 09:00:00
 2013-04-25 | 12:00:00
 2013-04-25 | 13:00:00
 2013-04-25 | 16:00:00
 2013-04-25 | 17:30:00
 2013-04-25 | 20:00:00

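Goal

I want to build a "time slice list" by doing a self inner join on role_times_boundaries: each time_boundary becomes a start_time, and the next time_boundary (in ascending order) on the same date becomes its end_time. The goal is to get a result like this: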

 role_date  | start_time | end_time 
------------+------------+----------
 2013-04-24 | 09:00:00   | 10:00:00
 2013-04-25 | 07:00:00   | 08:50:00
 2013-04-25 | 08:50:00   | 09:00:00
 2013-04-25 | 09:00:00   | 12:00:00
 2013-04-25 | 12:00:00   | 13:00:00
 2013-04-25 | 13:00:00   | 16:00:00
 2013-04-25 | 16:00:00   | 17:30:00
 2013-04-25 | 17:30:00   | 20:00:00

Attempt

I tried to get the desired result with this SQL query:

SELECT role_times_boundaries.role_date,
       role_times_boundaries.time_boundary AS start_time,
       end_time_boundaries.time_boundary AS end_time
FROM role_times_boundaries
INNER JOIN (
             SELECT role_date,
                    time_boundary
             FROM role_times_boundaries
           ) AS end_time_boundaries ON (
                                       role_times_boundaries.role_date = end_time_boundaries.role_date
                                       AND end_time_boundaries.time_boundary = (
                                                                                  SELECT MIN(a_list_of_end_boundaries.time_boundary)
                                                                                  FROM role_times_boundaries AS a_list_of_end_boundaries
                                                                                  WHERE a_list_of_end_boundaries.time_boundary > role_times_boundaries.time_boundary
                                                                                )
                                     )

The result is:

 role_date  | start_time | end_time 
------------+------------+----------
 2013-04-24 | 09:00:00   | 10:00:00
 2013-04-25 | 07:00:00   | 08:50:00
 2013-04-25 | 08:50:00   | 09:00:00
 2013-04-25 | 12:00:00   | 13:00:00
 2013-04-25 | 13:00:00   | 16:00:00
 2013-04-25 | 16:00:00   | 17:30:00
 2013-04-25 | 17:30:00   | 20:00:00

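As you can see, the 09:00:00 to 12:00:00 time slice is missing! I still don't understand why, and I haven't found my mistake.

2 Answers:

Answer 0 (score: 3)

If you upgrade to PostgreSQL 8.4 or later, you can use window functions ("analytic functions" in Oracle terminology), such as rank(), row_number(), lead(), and lag():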

SELECT tb.role_date AS role_date
        , tb.time_boundary AS start_time
        , LEAD (time_boundary) OVER www AS end_time
FROM role_times_boundaries tb
WINDOW www AS (PARTITION BY tb.role_date ORDER BY tb.time_boundary)
        ;

Or an equivalent form of the previous query, with the window defined inline:

SELECT tb.role_date AS role_date
        , tb.time_boundary AS start_time
        , LEAD (time_boundary) OVER ( PARTITION BY tb.role_date ORDER BY tb.time_boundary) AS end_time
FROM role_times_boundaries tb;

Either one gives you the following result set:

 role_date  | start_time | end_time 
------------+------------+----------
 2013-04-24 | 09:00:00   | 10:00:00
 2013-04-24 | 10:00:00   | 
 2013-04-25 | 07:00:00   | 08:50:00
 2013-04-25 | 08:50:00   | 09:00:00
 2013-04-25 | 09:00:00   | 12:00:00
 2013-04-25 | 12:00:00   | 13:00:00
 2013-04-25 | 13:00:00   | 16:00:00
 2013-04-25 | 16:00:00   | 17:30:00
 2013-04-25 | 17:30:00   | 20:00:00
 2013-04-25 | 20:00:00   | 
(10 rows)

To remove the periods that have no end_time, you can wrap the query in a subquery:

SELECT role_date , start_time , end_time
FROM (
        SELECT tb.role_date AS role_date
        , tb.time_boundary AS start_time
        , LEAD (time_boundary) OVER ( PARTITION BY tb.role_date ORDER BY tb.time_boundary) AS end_time
        FROM role_times_boundaries tb
        ) sq
WHERE sq.start_time <= sq.end_time;

That then gives you the following result:

 role_date  | start_time | end_time 
------------+------------+----------
 2013-04-24 | 09:00:00   | 10:00:00
 2013-04-25 | 07:00:00   | 08:50:00
 2013-04-25 | 08:50:00   | 09:00:00
 2013-04-25 | 09:00:00   | 12:00:00
 2013-04-25 | 12:00:00   | 13:00:00
 2013-04-25 | 13:00:00   | 16:00:00
 2013-04-25 | 16:00:00   | 17:30:00
 2013-04-25 | 17:30:00   | 20:00:00
(8 rows)
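
The filter above works because comparing start_time with a NULL end_time yields NULL rather than true, so the last boundary of each date is dropped implicitly. Here is a minimal sketch that states the intent explicitly. It assumes the same role_times_boundaries table and PostgreSQL 8.4 or later for LEAD(). The ORDER BY is an addition, because the query otherwise guarantees no particular output order:

```sql
SELECT sq.role_date, sq.start_time, sq.end_time
FROM (
        SELECT tb.role_date AS role_date
             , tb.time_boundary AS start_time
             , LEAD(tb.time_boundary) OVER (PARTITION BY tb.role_date ORDER BY tb.time_boundary) AS end_time
        FROM role_times_boundaries tb
     ) sq
WHERE sq.end_time IS NOT NULL   -- drop the last boundary of each date
ORDER BY sq.role_date, sq.start_time;
```

On the sample data this should return the same eight rows as the version that filters on start_time <= end_time.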

Update: here is another alternative query that avoids window functions and solves the problem with NOT EXISTS:

SELECT lo.role_date
        , lo.time_boundary AS start_time
        , hi.time_boundary AS end_time
FROM role_times_boundaries lo
JOIN role_times_boundaries hi
    ON lo.role_date = hi.role_date
    AND lo.time_boundary < hi.time_boundary
    AND NOT EXISTS ( -- eliminate the men in the middle ...
        SELECT * FROM role_times_boundaries nx
        WHERE   nx.role_date = hi.role_date
        AND nx.time_boundary > lo.time_boundary
        AND nx.time_boundary < hi.time_boundary
        );

Answer 1 (score: 2)

Solution

OK, first let's simplify your query a bit:

SELECT
  l.role_date,
  l.time_boundary AS start_time,
  r.time_boundary AS end_time
FROM role_times_boundaries l
INNER JOIN role_times_boundaries AS r ON ( -- You don't need that inner query, it's redundant
  l.role_date = r.role_date
  AND r.time_boundary = (
    SELECT MIN(r2.time_boundary)
    FROM role_times_boundaries AS r2
    WHERE r2.time_boundary > l.time_boundary))

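Now the problem is that the subquery compares against all time_boundary values in r2, not only those on the same role_date as l. For the 2013-04-25 09:00:00 row, the minimum later boundary across all dates is 10:00:00 (from 2013-04-24), and no 2013-04-25 row has that time, so the join finds no match and the row disappears. The corrected query is: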

SELECT
  l.role_date,
  l.time_boundary AS start_time,
  r.time_boundary AS end_time
FROM role_times_boundaries l
INNER JOIN role_times_boundaries AS r ON (
  l.role_date = r.role_date
  AND r.time_boundary = (
    SELECT MIN(r2.time_boundary)
    FROM role_times_boundaries AS r2
    -- Note the added restriction:
    WHERE r2.time_boundary > l.time_boundary and r2.role_date = l.role_date))
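
To see the effect of the added restriction row by row, a small diagnostic query can put both subqueries side by side. This is only a sketch against the same role_times_boundaries table; the column aliases next_any_date and next_same_date are illustrative names, not part of the original question:

```sql
SELECT l.role_date,
       l.time_boundary AS start_time,
       -- what the original subquery computes: the next boundary on ANY date
       (SELECT MIN(r2.time_boundary)
        FROM role_times_boundaries AS r2
        WHERE r2.time_boundary > l.time_boundary) AS next_any_date,
       -- what the corrected subquery computes: the next boundary on the SAME date
       (SELECT MIN(r2.time_boundary)
        FROM role_times_boundaries AS r2
        WHERE r2.time_boundary > l.time_boundary
          AND r2.role_date = l.role_date) AS next_same_date
FROM role_times_boundaries AS l
ORDER BY l.role_date, l.time_boundary;
```

For the 2013-04-25 09:00:00 row, next_any_date is 10:00:00 while next_same_date is 12:00:00. The original join then looked for a 2013-04-25 row at 10:00:00, which does not exist.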

Alternate query

The following also works for your use case and may be more readable:

select
  l.role_date as role_date,
  l.time_boundary as start_time,
  min(r.time_boundary) as end_time
from role_times_boundaries l
join role_times_boundaries r on
  r.role_date = l.role_date
  and r.time_boundary > l.time_boundary
group by l.role_date, l.time_boundary
order by l.role_date, l.time_boundary
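
One way to gain some confidence that this alternate query agrees with the corrected join is to subtract one result set from the other with EXCEPT. This is a sketch only, assuming the same role_times_boundaries table. It checks one direction, so swap the two halves to check the other:

```sql
-- Rows produced by the MIN / GROUP BY query ...
SELECT l.role_date, l.time_boundary AS start_time, MIN(r.time_boundary) AS end_time
FROM role_times_boundaries l
JOIN role_times_boundaries r
  ON r.role_date = l.role_date
 AND r.time_boundary > l.time_boundary
GROUP BY l.role_date, l.time_boundary
EXCEPT
-- ... minus the rows produced by the corrected correlated-subquery join.
SELECT l.role_date, l.time_boundary, r.time_boundary
FROM role_times_boundaries l
JOIN role_times_boundaries r
  ON l.role_date = r.role_date
 AND r.time_boundary = (
       SELECT MIN(r2.time_boundary)
       FROM role_times_boundaries r2
       WHERE r2.time_boundary > l.time_boundary
         AND r2.role_date = l.role_date);
```

An empty result means every row of the first query also appears in the second.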