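PostgreSQL SELECT that must match across multiple rows

Time: 2013-03-21 01:28:26

Tags: sql postgresql select postgresql-9.1

I have a table containing a pair of timestamps that represent a span of time. The rows are scoped to a user ID, and each user can have one or more rows associated with it.

This data is generated from an abstracted "availability" form and represents the times during a week when the user is available. I need to pass in a series of time ranges as a query and get back every user ID in the table that matches all of them.

Given this table: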

CREATE TABLE "public"."availability" (
  "id" int4 NOT NULL,
  "user_id" int4,
  "starts_at" timestamp(6),
  "ends_at" timestamp(6),
  PRIMARY KEY ("id")
) WITH (OIDS=FALSE)

And this data:

User #1 is available Mon-Tue between 08:00 and 17:00

+----+---------+---------------------+---------------------+
| id | user_id | starts_at           | ends_at             |
+----+---------+---------------------+---------------------+
| 1  | 1       | 2013-03-18 08:00:00 | 2013-03-18 17:00:00 |
+----+---------+---------------------+---------------------+
| 2  | 1       | 2013-03-19 08:00:00 | 2013-03-19 17:00:00 |
+----+---------+---------------------+---------------------+

User #2 is available Sun-Sat all day

+----+---------+---------------------+---------------------+
| 3  | 2       | 2013-03-17 00:00:00 | 2013-03-23 23:59:59 |
+----+---------+---------------------+---------------------+

User #3 is available Wed between 06:00 and 18:00

+----+---------+---------------------+---------------------+
| 4  | 3       | 2013-03-20 06:00:00 | 2013-03-20 18:00:00 |
+----+---------+---------------------+---------------------+

I can easily select users who are available for any one of the given time spans:

SELECT * FROM "public"."availability"
  WHERE ('2013-03-19 08:35:00' BETWEEN starts_at AND ends_at 
     AND '2013-03-19 18:25:00' BETWEEN starts_at AND ends_at)
    OR  ('2013-03-20 12:00:00' BETWEEN starts_at AND ends_at
     AND '2013-03-20 18:00:00' BETWEEN starts_at AND ends_at);

+----+---------+---------------------+---------------------+
| id | user_id | starts_at           | ends_at             |
+----+---------+---------------------+---------------------+
| 3  | 2       | 2013-03-17 00:00:00 | 2013-03-23 23:59:59 |
+----+---------+---------------------+---------------------+
| 4  | 3       | 2013-03-20 06:00:00 | 2013-03-20 18:00:00 |
+----+---------+---------------------+---------------------+

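But what I really need is to query for multiple time spans and return only the user_ids that match ALL of the conditions.

Querying for 2013-03-17 10:00:00 - 2013-03-17 16:00:00 and 2013-03-23 10:00:00 - 2013-03-23 16:00:00 should return: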

+----+---------+---------------------+---------------------+
| id | user_id | starts_at           | ends_at             |
+----+---------+---------------------+---------------------+
| 3  | 2       | 2013-03-17 00:00:00 | 2013-03-23 23:59:59 |
+----+---------+---------------------+---------------------+

Querying for 2013-03-18 09:00:00 - 2013-03-18 16:00:00 and 2013-03-19 08:00:00 - 2013-03-19 15:45:00 should return:

+----+---------+---------------------+---------------------+
| id | user_id | starts_at           | ends_at             |
+----+---------+---------------------+---------------------+
| 1  | 1       | 2013-03-18 08:00:00 | 2013-03-18 17:00:00 |
+----+---------+---------------------+---------------------+
| 2  | 1       | 2013-03-19 08:00:00 | 2013-03-19 17:00:00 |
+----+---------+---------------------+---------------------+
| 3  | 2       | 2013-03-17 00:00:00 | 2013-03-23 23:59:59 |
+----+---------+---------------------+---------------------+

Querying for 2013-03-18 07:00:00 - 2013-03-18 18:00:00 should return nothing.

SQLFiddle example

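2 answers:

Answer 0: (score: 4)

For an application like this, if you are on PostgreSQL 9.2 or later, you might want to try range types. Here is an example of creating, loading, and displaying the data: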

CREATE TABLE availability (
  id      int4 NOT NULL,
  user_id int4,
  avail   tstzrange,
  PRIMARY KEY (id)
);
INSERT INTO availability VALUES
  (1, 1, '[2013-03-18 08:00:00, 2013-03-18 17:00:00)'),
  (2, 1, '[2013-03-19 08:00:00, 2013-03-19 17:00:00)'),
  (3, 2, '[2013-03-17 00:00:00, 2013-03-23 24:00:00)'),
  (4, 3, '[2013-03-20 06:00:00, 2013-03-20 18:00:00)');
SELECT * FROM availability ;
 id | user_id |                        avail                        
----+---------+-----------------------------------------------------
  1 |       1 | ["2013-03-18 08:00:00-05","2013-03-18 17:00:00-05")
  2 |       1 | ["2013-03-19 08:00:00-05","2013-03-19 17:00:00-05")
  3 |       2 | ["2013-03-17 00:00:00-05","2013-03-24 00:00:00-05")
  4 |       3 | ["2013-03-20 06:00:00-05","2013-03-20 18:00:00-05")
(4 rows)
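
If the data already lives in the question's starts_at / ends_at columns, the range column can be derived rather than retyped. The following is only a sketch under a few assumptions: the target table name availability_ranges is hypothetical, the source is the question's "public"."availability" table, and the '[]' bounds are chosen to mirror the inclusive behavior of BETWEEN in the question (the example data above uses '[)' instead). The cast from timestamp to timestamptz interprets the stored values in the session's TimeZone setting.

```sql
-- Hypothetical target table; requires PostgreSQL 9.2+ for tstzrange.
CREATE TABLE availability_ranges AS
SELECT id,
       user_id,
       -- '[]' keeps both endpoints inclusive, like BETWEEN in the question.
       tstzrange(starts_at::timestamptz, ends_at::timestamptz, '[]') AS avail
  FROM "public"."availability";
```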

Then you can query using a variety of operators. If you want every availability range that contains any of the specified query ranges:

SELECT * FROM availability
  WHERE avail @> '[2013-03-19 08:35:00, 2013-03-19 18:25:00)'
     OR avail @> '[2013-03-20 12:00:00, 2013-03-20 18:00:00)';

Or:

SELECT * FROM availability
  WHERE avail @> ANY
          (ARRAY ['[2013-03-19 08:35:00, 2013-03-19 18:25:00)'::tstzrange,
                  '[2013-03-20 12:00:00, 2013-03-20 18:00:00)'::tstzrange]);
 id | user_id |                        avail                        
----+---------+-----------------------------------------------------
  3 |       2 | ["2013-03-17 00:00:00-05","2013-03-24 00:00:00-05")
  4 |       3 | ["2013-03-20 06:00:00-05","2013-03-20 18:00:00-05")
(2 rows)

If you want every availability range that, as a single range, contains all of the specified query ranges:

SELECT * FROM availability
  WHERE avail @> '[2013-03-17 10:00:00, 2013-03-17 16:00:00)'
    AND avail @> '[2013-03-23 10:00:00, 2013-03-23 16:00:00)';

Or:

SELECT * FROM availability
  WHERE avail @> ALL
          (ARRAY ['[2013-03-17 10:00:00, 2013-03-17 16:00:00)'::tstzrange,
                  '[2013-03-23 10:00:00, 2013-03-23 16:00:00)'::tstzrange]);
 id | user_id |                        avail                        
----+---------+-----------------------------------------------------
  3 |       2 | ["2013-03-17 00:00:00-05","2013-03-24 00:00:00-05")
(1 row)

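If you want every availability range that contains any of the specified query ranges, but only for users whose availability ranges together cover all of the specified query ranges: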

WITH s(ts) AS
(
  VALUES
    ('[2013-03-18 09:00:00, 2013-03-18 16:00:00)'::tstzrange),
    ('[2013-03-19 08:00:00, 2013-03-19 15:45:00)'::tstzrange)
)
SELECT DISTINCT a1.*
  FROM s s1
  JOIN availability a1 ON a1.avail @> s1.ts
    AND NOT EXISTS
        (
          SELECT * FROM s s2
            WHERE NOT EXISTS
                  (
                    SELECT * FROM availability a2
                      WHERE a2.user_id = a1.user_id
                        AND a2.avail @> s2.ts
                  )
        );

Or (adapting Clodoaldo Neto's query to use ranges):

SELECT a.*
  FROM availability a
  JOIN (
         SELECT
             user_id,
             sum(('[2013-03-18 09:00:00, 2013-03-18 16:00:00)'::tstzrange
                   <@ avail)::integer
                 +
                 ('[2013-03-19 08:00:00, 2013-03-19 15:45:00)'::tstzrange
                   <@ avail)::integer
                ) period
           FROM availability
           GROUP BY user_id
       ) s ON a.user_id = s.user_id
  WHERE period >= 2;
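
One caveat with the summing approach: the sum counts matches per availability row, so a user who has two overlapping availability rows that both contain the same query range would also reach 2 without covering the other range. The sample data has no such rows, so this does not change the results shown here. A possible variant, offered as a sketch rather than a drop-in replacement, uses PostgreSQL's bool_or aggregate so that each query range must be contained by at least one of the user's rows:

```sql
SELECT a.*
  FROM availability a
  JOIN (
         SELECT user_id
           FROM availability
           GROUP BY user_id
           -- Each bool_or is true if ANY of the user's rows contains that range.
           HAVING bool_or(avail @> '[2013-03-18 09:00:00, 2013-03-18 16:00:00)'::tstzrange)
              AND bool_or(avail @> '[2013-03-19 08:00:00, 2013-03-19 15:45:00)'::tstzrange)
       ) s ON a.user_id = s.user_id;
```

Like the summing query, this returns every availability row of each qualifying user, not only the rows that matched.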

You can create an index to make this kind of search fast on large tables:

CREATE INDEX availability_avail ON availability USING gist (avail);
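
To see whether the planner would pick the index for a containment search, EXPLAIN can be run on one of the queries. The sketch below assumes the availability table and the availability_avail index defined above; turning off enable_seqscan is meant for experimentation in a single session only, since on a tiny table the planner will otherwise prefer a sequential scan.

```sql
-- For experimentation only: discourage sequential scans in this session.
SET enable_seqscan = off;

EXPLAIN
SELECT * FROM availability
  WHERE avail @> '[2013-03-19 08:35:00, 2013-03-19 18:25:00)'::tstzrange;

-- Restore the default planner behavior.
RESET enable_seqscan;
```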

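Notes:

  • I omitted the schema qualification and identifier quoting for readability.
  • The index is unlikely to be used with four rows, because all of the data can be fetched faster by reading the single data page directly. On a large table it can make a big difference.
  • I used ranges of TIMESTAMP WITH TIME ZONE because a bare TIMESTAMP (without time zone) is ambiguous when clocks move back at the end of daylight saving time each year. To capture a moment in time, use TIMESTAMP WITH TIME ZONE (timestamptz for short).
  • When the range literals are used directly with the operator, no explicit cast is needed; with the ANY / ALL forms of the query, an explicit cast is required.
  • A square bracket on a range means the range includes the adjacent bound, while a round bracket means the range excludes it. Timestamp ranges are usually specified with [) so that a range ending at a given time and another range starting at that same time are considered adjacent rather than overlapping.
  • '24:00:00' on one date represents the same moment as '00:00:00' on the next date.
  • The previous two points make it easier to specify timestamp ranges that end at midnight. There is no risk of a "lost second" or other strangeness.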
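
The points about bounds and midnight can be checked directly in psql. This is a small illustration using literal values chosen for the example; -|- is the range adjacency operator and && is the range overlap operator, both available from PostgreSQL 9.2.

```sql
-- '24:00:00' on one date and '00:00:00' on the next are the same instant.
SELECT '2013-03-23 24:00:00'::timestamptz
     = '2013-03-24 00:00:00'::timestamptz AS same_instant;   -- true

-- With [) bounds, back-to-back ranges are adjacent and do not overlap.
SELECT '[2013-03-18 08:00:00, 2013-03-18 12:00:00)'::tstzrange
       -|- '[2013-03-18 12:00:00, 2013-03-18 17:00:00)'::tstzrange AS adjacent,  -- true
       '[2013-03-18 08:00:00, 2013-03-18 12:00:00)'::tstzrange
       &&  '[2013-03-18 12:00:00, 2013-03-18 17:00:00)'::tstzrange AS overlaps;  -- false

-- With an inclusive upper bound the same two ranges share 12:00:00 and overlap.
SELECT '[2013-03-18 08:00:00, 2013-03-18 12:00:00]'::tstzrange
       &&  '[2013-03-18 12:00:00, 2013-03-18 17:00:00)'::tstzrange AS overlaps;  -- true
```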

Answer 1: (score: 3)

SQL Fiddle

This casts each boolean to an integer, 0 or 1, and sums the results per user.

select a.*
from
    availability a
    inner join
    (
        select
            user_id,
            sum (
                ('2013-03-18 09:00:00' between starts_at and ends_at
                 and
                 '2013-03-18 16:00:00' between starts_at and ends_at
                )::integer
                +
                ('2013-03-19 08:00:00' between starts_at and ends_at
                 and
                 '2013-03-19 15:45:00' between starts_at and ends_at
                )::integer
            ) period
        from availability
        group by user_id
    ) s on a.user_id = s.user_id
where period >= 2

Change the number in the where condition to the number of periods that must be matched.
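
Adding a term to the sum by hand for every extra period can get tedious. One way to avoid that, sketched here under stated assumptions, is to list the periods in a common table expression and let the database count how many distinct periods each user covers. The CTE name wanted and its column n are hypothetical names introduced for this example; the availability table and its starts_at / ends_at columns are those from the question. Common table expressions are available from PostgreSQL 8.4, so this does not require range types.

```sql
with wanted(n, starts_at, ends_at) as (
    -- One row per period to match; n is just a label used for counting.
    values
        (1, '2013-03-18 09:00:00'::timestamp, '2013-03-18 16:00:00'::timestamp),
        (2, '2013-03-19 08:00:00',            '2013-03-19 15:45:00')
)
select a.*
from availability a
where a.user_id in (
    select a2.user_id
    from availability a2
    inner join wanted w
        on  w.starts_at between a2.starts_at and a2.ends_at
        and w.ends_at   between a2.starts_at and a2.ends_at
    group by a2.user_id
    -- Keep only users who cover every listed period at least once.
    having count(distinct w.n) = (select count(*) from wanted)
);
```

Counting distinct period labels, rather than summing matches, also means a user with two rows covering the same period is not counted twice.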