我有一个表格,其中包含每个开始日期和结束日期:
DROP TABLE temp_period;
CREATE TABLE public.temp_period
(
id integer NOT NULL,
"startDate" date,
"endDate" date
);
INSERT INTO temp_period(id,"startDate","endDate") VALUES(1,'2010-01-01','2010-03-31');
INSERT INTO temp_period(id,"startDate","endDate") VALUES(2,'2013-05-17','2013-07-18');
INSERT INTO temp_period(id,"startDate","endDate") VALUES(3,'2010-02-15','2010-05-31');
INSERT INTO temp_period(id,"startDate","endDate") VALUES(7,'2014-01-01','2014-12-31');
INSERT INTO temp_period(id,"startDate","endDate") VALUES(56,'2014-03-31','2014-06-30');
现在我想知道那里存储的所有时段的总持续时间。我只需要interval
的时间。这很简单:
SELECT sum(age("endDate","startDate")) FROM temp_period;
然而,问题是:那些时期确实重叠。我想消除所有重叠的时段,以便获得表中至少一条记录所涵盖的总时间。
你知道,时间之间存在相当大的差距,因此将最小的开始日期和最近的结束日期传递给age
函数不会有效。但是,我考虑过这样做并减去差距的总量,但没有优雅的方法可以做到这一点。
我使用PostgreSQL 9.6。
答案 0 :(得分:1)
这个怎么样:
WITH
/* get all time points where something changes */
points AS (
SELECT "startDate" AS p
FROM temp_period
UNION SELECT "endDate"
FROM temp_period
),
/*
* Get all date ranges between these time points.
* The first time range will start with NULL,
* but that will be excluded in the next CTE anyway.
*/
inter AS (
SELECT daterange(
lag(p) OVER (ORDER BY p),
p
) i
FROM points
),
/*
* Get all date ranges that are contained
* in at least one of the intervals.
*/
overlap AS (
SELECT DISTINCT i
FROM inter
CROSS JOIN temp_period
WHERE i <@ daterange("startDate", "endDate")
)
/* sum the lengths of the date ranges */
SELECT sum(age(upper(i), lower(i)))
FROM overlap;
对于您的数据,它将返回:
┌──────────┐
│ interval │
├──────────┤
│ 576 days │
└──────────┘
(1 row)
答案 1 :(得分:1)
您可以尝试使用递归cte来计算周期。对于每条记录,我们将检查它是否与之前的记录重叠。如果是,我们只计算不重叠的时期。
WITH RECURSIVE days_count AS
(
SELECT startDate,
endDate,
AGE(endDate, startDate) AS total_days,
rowSeq
FROM ordered_data
WHERE rowSeq = 1
UNION ALL
SELECT GREATEST(curr.startDate, prev.endDate) AS startDate,
GREATEST(curr.endDate, prev.endDate) AS endDate,
AGE(GREATEST(curr.endDate, prev.endDate), GREATEST(curr.startDate, prev.endDate)) AS total_days,
curr.rowSeq
FROM ordered_data curr
INNER JOIN days_count prev
ON curr.rowSeq > 1
AND curr.rowSeq = prev.rowSeq + 1),
ordered_data AS
(
SELECT *,
ROW_NUMBER() OVER (ORDER BY startDate) AS rowSeq
FROM temp_period)
SELECT SUM(total_days) AS total_days
FROM days_count;
我创建了一个演示here
答案 2 :(得分:1)
实际上有一个案例没有被前面的例子所涵盖。 如果我们有这样一个时期怎么办?
INSERT INTO temp_period(id,"startDate","endDate") VALUES(100,'2010-01-03','2010-02-10');
我们有以下间隔:
Interval No. | | start_date | | end_date
--------------+------------------+------------+----------------+------------
1 | Interval start | 2010-01-01 | Interval end | 2010-03-31
2 | Interval start | 2010-01-03 | Interval end | 2010-02-10
3 | Interval start | 2010-02-15 | Interval end | 2010-05-31
4 | Interval start | 2013-05-17 | Interval end | 2013-07-18
5 | Interval start | 2014-01-01 | Interval end | 2014-12-31
6 | Interval start | 2014-03-31 | Interval end | 2014-06-30
即使第 3 段与第 1 段重叠,它也被视为一个新段,因此(错误的)结果:
sum
-----
620
(1 row)
解决方案是调整查询的核心
CASE WHEN start_date < lag(end_date) OVER (ORDER BY start_date, end_date) then NULL ELSE start_date END
需要替换为
CASE WHEN start_date < max(end_date) OVER (ORDER BY start_date, end_date rows between unbounded preceding and 1 preceding) then NULL ELSE start_date END
然后它按预期工作
sum
-----
576
(1 row)
总结:
SELECT sum(e - s)
FROM (
SELECT left_edge as s, max(end_date) as e
FROM (
SELECT start_date, end_date, max(new_start) over (ORDER BY start_date, end_date) as left_edge
FROM (
SELECT start_date, end_date, CASE WHEN start_date < max(end_date) OVER (ORDER BY start_date, end_date rows between unbounded preceding and 1 preceding) then NULL ELSE start_date END AS new_start
FROM temp_period
) s1
) s2
GROUP BY left_edge
) s3;
答案 3 :(得分:0)
这个在复杂查询中需要两个外连接。一个连接以识别具有大于THIS的startdate的所有重叠并且扩展时间跨度以匹配两者中的较大者。需要第二个连接来匹配没有重叠的记录。取最小值的最小值和最大值的最大值,包括非匹配值。我使用的是MSSQL,因此语法可能略有不同。
DECLARE @temp_period TABLE
(
id int NOT NULL,
startDate datetime,
endDate datetime
)
INSERT INTO @temp_period(id,startDate,endDate) VALUES(1,'2010-01-01','2010-03-31')
INSERT INTO @temp_period(id,startDate,endDate) VALUES(2,'2013-05-17','2013-07-18')
INSERT INTO @temp_period(id,startDate,endDate) VALUES(3,'2010-02-15','2010-05-31')
INSERT INTO @temp_period(id,startDate,endDate) VALUES(3,'2010-02-15','2010-07-31')
INSERT INTO @temp_period(id,startDate,endDate) VALUES(7,'2014-01-01','2014-12-31')
INSERT INTO @temp_period(id,startDate,endDate) VALUES(56,'2014-03-31','2014-06-30')
;WITH OverLaps AS
(
SELECT
Main.id,
OverlappedID=Overlaps.id,
OverlapMinDate,
OverlapMaxDate
FROM
@temp_period Main
LEFT OUTER JOIN
(
SELECT
This.id,
OverlapMinDate=CASE WHEN This.StartDate<Prior.StartDate THEN This.StartDate ELSE Prior.StartDate END,
OverlapMaxDate=CASE WHEN This.EndDate>Prior.EndDate THEN This.EndDate ELSE Prior.EndDate END,
PriorID=Prior.id
FROM
@temp_period This
LEFT OUTER JOIN @temp_period Prior ON Prior.endDate > This.startDate AND Prior.startdate < this.endDate AND This.Id<>Prior.ID
) Overlaps ON Main.Id=Overlaps.PriorId
)
SELECT
T.Id,
--If has overlapped then sum all overlapped records prior to this one, else not and overlap get the start and end
MinDate=MIN(COALESCE(HasOverlapped.OverlapMinDate,startDate)),
MaxDate=MAX(COALESCE(HasOverlapped.OverlapMaxDate,endDate))
FROM
@temp_period T
LEFT OUTER JOIN OverLaps IsAOverlap ON IsAOverlap.OverlappedID=T.id
LEFT OUTER JOIN OverLaps HasOverlapped ON HasOverlapped.Id=T.id
WHERE
IsAOverlap.OverlappedID IS NULL -- Exclude older records that have overlaps
GROUP BY
T.Id
答案 4 :(得分:0)
注意:Laurenz Albe 的回答存在巨大的可扩展性问题。
当我找到它时,我非常高兴。我根据我们的需要定制了它。我们部署到暂存区,很快,服务器花了几分钟才返回结果。
然后我在 postgresql.org 上找到了这个答案。效率更高。 https://wiki.postgresql.org/wiki/Range_aggregation
SELECT sum(e - s)
FROM (
SELECT left_edge as s, max(end_date) as e
FROM (
SELECT start_date, end_date, max(new_start) over (ORDER BY start_date, end_date) as left_edge
FROM (
SELECT start_date, end_date, CASE WHEN start_date < lag(end_date) OVER (ORDER BY start_date, end_date) then NULL ELSE start_date END AS new_start
FROM temp_period
) s1
) s2
GROUP BY left_edge
) s3;
结果:
sum
-----
576
(1 row)