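I'm looking for an efficient way to find all the intersections between sets of timestamp ranges. It needs to work with PostgreSQL 9.2.
Let's say the ranges represent the times when a person is available to meet. Each person may have one or more ranges of time when they are available. I want to find all the time periods in which a meeting can take place (i.e. the periods during which everyone is available).
This is what I've got so far. It seems to work, but I don't think it is very efficient, because it only considers one person's availability at a time.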
WITH RECURSIVE td AS
(
-- Test data. Returns:
-- ["2014-01-20 00:00:00","2014-01-31 00:00:00")
-- ["2014-02-01 00:00:00","2014-02-20 00:00:00")
-- ["2014-04-15 00:00:00","2014-04-20 00:00:00")
SELECT 1 AS entity_id, '2014-01-01'::timestamp AS begin_time, '2014-01-31'::timestamp AS end_time
UNION SELECT 1, '2014-02-01', '2014-02-28'
UNION SELECT 1, '2014-04-01', '2014-04-30'
UNION SELECT 2, '2014-01-15', '2014-02-20'
UNION SELECT 2, '2014-04-15', '2014-05-05'
UNION SELECT 3, '2014-01-20', '2014-04-20'
)
, ranges AS
(
-- Convert to tsrange type
SELECT entity_id, tsrange(begin_time, end_time) AS the_range
FROM td
)
, min_max AS
(
SELECT MIN(entity_id), MAX(entity_id)
FROM td
)
, inter AS
(
-- Ranges for the lowest ID
SELECT entity_id AS last_id, the_range
FROM ranges r
WHERE r.entity_id = (SELECT min FROM min_max)
UNION ALL
-- Iteratively intersect with ranges for the next higher ID
SELECT entity_id, r.the_range * i.the_range
FROM ranges r
JOIN inter i ON r.the_range && i.the_range
WHERE r.entity_id > i.last_id
AND NOT EXISTS
(
SELECT *
FROM ranges r2
WHERE r2.entity_id < r.entity_id AND r2.entity_id > i.last_id
)
)
-- Take the final set of intersections
SELECT *
FROM inter
WHERE last_id = (SELECT max FROM min_max)
ORDER BY the_range;
Answer 0 (score: 7)
I created the tsrange_interception_agg aggregate:
create function tsrange_interception (
internal_state tsrange, next_data_values tsrange
) returns tsrange as $$
select internal_state * next_data_values;
$$ language sql;
create aggregate tsrange_interception_agg (tsrange) (
sfunc = tsrange_interception,
stype = tsrange,
initcond = $$[-infinity, infinity]$$
);
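As a way to see what the aggregate computes, here is a minimal Python sketch of the same fold. It is not PostgreSQL code: the names EVERYTHING and step are illustrative, datetime.min and datetime.max stand in for -infinity and infinity, and None stands in for the empty range.

```python
from datetime import datetime
from functools import reduce

# Stand-in for the aggregate's initcond '[-infinity, infinity]' (assumption).
EVERYTHING = (datetime.min, datetime.max)

def step(state, value):
    """State transition: intersect the running state with the next range."""
    if state is None:
        # Once the running intersection is empty, it stays empty.
        return None
    start = max(state[0], value[0])
    end = min(state[1], value[1])
    # Half-open [start, end) ranges: equal bounds mean an empty range.
    return (start, end) if start < end else None

# Two of the test ranges (entities 2 and 3) folded like aggregate input rows.
rows = [
    (datetime(2014, 1, 15), datetime(2014, 2, 20)),
    (datetime(2014, 1, 20), datetime(2014, 4, 20)),
]
agg = reduce(step, rows, EVERYTHING)
print(agg)
```

Starting from the widest possible range means the first input row is returned unchanged, which is the role initcond plays in the aggregate definition.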
Then this query:
with td (id, begin_time, end_time) as
(
values
(1, '2014-01-01'::timestamp, '2014-01-31'::timestamp),
(1, '2014-02-01', '2014-02-28'),
(1, '2014-04-01', '2014-04-30'),
(2, '2014-01-15', '2014-02-20'),
(2, '2014-04-15', '2014-05-05'),
(3, '2014-01-20', '2014-04-20')
), ranges as (
select
id,
row_number() over(partition by id) as rn,
tsrange(begin_time, end_time) as tr
from td
), cr as (
select r0.tr tr0, r1.tr as tr1
from ranges r0 cross join ranges r1
where
r0.id < r1.id and
r0.tr && r1.tr and
r0.id = (select min(id) from td)
)
select tr0 * tsrange_interception_agg(tr1) as interseptions
from cr
group by tr0
having count(*) = (select count(distinct id) from td) - 1
;
interseptions
-----------------------------------------------
["2014-02-01 00:00:00","2014-02-20 00:00:00")
["2014-01-20 00:00:00","2014-01-31 00:00:00")
["2014-04-15 00:00:00","2014-04-20 00:00:00")
Answer 1 (score: 1)
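If you have a fixed number of entities to cross-reference, you can use one cross join per entity and build the intersection with the * operator on ranges.
Cross joins like this are probably less efficient, though. The following example is mainly there to help explain the more complex example further down.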
WITH td AS
(
SELECT 1 AS entity_id, '2014-01-01'::timestamp AS begin_time, '2014-01-31'::timestamp AS end_time
UNION SELECT 1, '2014-02-01', '2014-02-28'
UNION SELECT 1, '2014-04-01', '2014-04-30'
UNION SELECT 2, '2014-01-15', '2014-02-20'
UNION SELECT 2, '2014-04-15', '2014-05-05'
UNION SELECT 4, '2014-01-20', '2014-04-20'
)
,ranges AS
(
-- Convert to tsrange type
SELECT entity_id, tsrange(begin_time, end_time) AS the_range
FROM td
)
SELECT r1.the_range * r2.the_range * r3.the_range AS r
FROM ranges r1
CROSS JOIN ranges r2
CROSS JOIN ranges r3
WHERE r1.entity_id=1 AND r2.entity_id=2 AND r3.entity_id=4
AND NOT isempty(r1.the_range * r2.the_range * r3.the_range)
ORDER BY r
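Multiple cross joins are likely to be less efficient here because you do not actually need every possible combination of ranges: isempty(r1.the_range * r2.the_range) being true is enough to make isempty(r1.the_range * r2.the_range * r3.the_range) true as well.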
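The pruning argument can be checked outside the database. The following is a minimal Python sketch, not PostgreSQL code: intersect, common_ranges and the availability dict are illustrative names, and the tuples model half-open [start, end) ranges like those produced by tsrange(begin_time, end_time).

```python
from datetime import date

# Half-open [start, end) ranges per entity, mirroring the td test data.
availability = {
    1: [(date(2014, 1, 1), date(2014, 1, 31)),
        (date(2014, 2, 1), date(2014, 2, 28)),
        (date(2014, 4, 1), date(2014, 4, 30))],
    2: [(date(2014, 1, 15), date(2014, 2, 20)),
        (date(2014, 4, 15), date(2014, 5, 5))],
    4: [(date(2014, 1, 20), date(2014, 4, 20))],
}

def intersect(a, b):
    """Return the overlap of two half-open ranges, or None if it is empty."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None

def common_ranges(availability):
    """Intersect entity by entity, dropping empty overlaps after each step."""
    entities = sorted(availability)
    current = list(availability[entities[0]])
    for entity in entities[1:]:
        survivors = []
        for kept in current:
            for candidate in availability[entity]:
                hit = intersect(kept, candidate)
                if hit is not None:
                    # Only non-empty overlaps are carried to the next entity.
                    survivors.append(hit)
        current = survivors
    return sorted(current)

result = common_ranges(availability)
for start, end in result:
    print(start, end)
```

Because empty overlaps are discarded after every entity, later entities are only compared against ranges that can still contribute to the final result, instead of against every combination.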
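I don't think you can avoid going through each person's availability in turn, since you need all of them to be able to meet anyway.
What may help is to build the set of intersections incrementally, by cross joining each person's availability with the subset computed so far, using another recursive CTE (intersections in the example below). You then build the intersections step by step and discard the empty ranges, with both sides stored as arrays: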
WITH RECURSIVE td AS
(
SELECT 1 AS entity_id, '2014-01-01'::timestamp AS begin_time, '2014-01-31'::timestamp AS end_time
UNION SELECT 1, '2014-02-01', '2014-02-28'
UNION SELECT 1, '2014-04-01', '2014-04-30'
UNION SELECT 2, '2014-01-15', '2014-02-20'
UNION SELECT 2, '2014-04-15', '2014-05-05'
UNION SELECT 4, '2014-01-20', '2014-04-20'
)
,ranges AS
(
-- Convert to tsrange type
SELECT entity_id, tsrange(begin_time, end_time) AS the_range
FROM td
)
,ranges_arrays AS (
-- Prepare an array of all possible intervals per entity
SELECT entity_id, array_agg(the_range) AS ranges_arr
FROM ranges
GROUP BY entity_id
)
,numbered_ranges_arrays AS (
-- We'll join using pos+1 next, so we want continuous integers
-- I've changed the example entity_id from 3 to 4 to demonstrate this.
SELECT ROW_NUMBER() OVER () AS pos, entity_id, ranges_arr
FROM ranges_arrays
)
,intersections (pos, subranges) AS (
-- We start off with the infinite range.
SELECT 0::bigint, ARRAY['[,)'::tsrange]
UNION ALL
-- Then, we unnest the previous intermediate result,
-- cross join it against the array of ranges from the
-- next row in numbered_ranges_arrays (joined via pos+1).
-- We take the intersections and remove the empty ranges.
SELECT r.pos,
ARRAY(SELECT x * y FROM unnest(r.ranges_arr) x CROSS JOIN unnest(i.subranges) y WHERE NOT isempty(x * y))
FROM numbered_ranges_arrays r
INNER JOIN intersections i ON r.pos=i.pos+1
)
,last_intersections AS (
-- We just really want the result from the last operation (with the max pos).
SELECT subranges FROM intersections ORDER BY pos DESC LIMIT 1
)
SELECT unnest(subranges) r FROM last_intersections ORDER BY r
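Unfortunately, I'm not sure whether this is likely to perform any better. You would probably need a larger data set to get meaningful benchmarks.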
Answer 2 (score: 0)
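OK, I wrote and tested this in T-SQL, but it should run as is, or at least be close enough for you to translate, since it only uses fairly vanilla constructs. The possible exception is BETWEEN, but that can be broken into a >= clause and a <= clause. (Thanks @Horse)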
WITH cteSched AS ( --Schedule for everyone
-- Test data. Returns:
-- ["2014-01-20 00:00:00","2014-01-31 00:00:00")
-- ["2014-02-01 00:00:00","2014-02-20 00:00:00")
-- ["2014-04-15 00:00:00","2014-04-20 00:00:00")
SELECT 1 AS entity_id, '2014-01-01' AS begin_time, '2014-01-31' AS end_time
UNION SELECT 1, '2014-02-01', '2014-02-28'
UNION SELECT 1, '2014-04-01', '2014-04-30'
UNION SELECT 2, '2014-01-15', '2014-02-20'
UNION SELECT 2, '2014-04-15', '2014-05-05'
UNION SELECT 3, '2014-01-20', '2014-04-20'
), cteReq as ( --List of people to schedule (or is everyone in Sched required? Not clear, doesn't hurt)
SELECT 1 as entity_id UNION SELECT 2 UNION SELECT 3
), cteBegins as (
SELECT distinct begin_time FROM cteSched as T
WHERE NOT EXISTS (SELECT entity_id FROM cteReq as R
WHERE NOT EXISTS (SELECT * FROM cteSched as X
WHERE X.entity_id = R.entity_id
AND T.begin_time BETWEEN X.begin_time AND X.end_time ))
) SELECT B.begin_time, MIN(S.end_time ) as end_time
FROM cteBegins as B cross join cteSched as S
WHERE B.begin_time between S.begin_time and S.end_time
GROUP BY B.begin_time
-- NOTE: This assumes users do not have schedules that overlap with themselves! That is, nothing like
-- John is available 2014-01-01 to 2014-01-15 and 2014-01-10 to 2014-01-20.
EDIT: Added the output of the query above (when executed on SQL Server 2008 R2):
begin_time end_time
2014-01-20 2014-01-31
2014-02-01 2014-02-20
2014-04-15 2014-04-20
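The NOTE in the query assumes that no person has ranges that overlap each other. If that cannot be guaranteed, one option is to normalize each person's ranges first. A minimal Python sketch of that merge step follows; merge_overlapping and the sample data for John are illustrative and not part of the query above.

```python
from datetime import date

def merge_overlapping(ranges):
    """Merge overlapping or touching (start, end) pairs into disjoint ones."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Starts inside (or right at the end of) the previous range:
            # widen the previous range if this one ends later.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# The self-overlapping schedule from the NOTE.
john = [
    (date(2014, 1, 1), date(2014, 1, 15)),
    (date(2014, 1, 10), date(2014, 1, 20)),
]
merged_john = merge_overlapping(john)
print(merged_john)
```

After this step each person contributes only disjoint ranges, which is the precondition the query relies on.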