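I have lots of data with start and stop times for a given ID, and I need to flatten all intersecting and adjacent timespans into one combined timespan. The sample data posted below is all for the same ID, so I didn't list it.
To make things a bit clearer, take a look at the sample data for 03.06.2009.
The first two rows for that day overlap, so they need to be merged into one timespan, resulting in 05:54:48 to 10:00:13. Since there is a gap between 10:00:13 and 10:12:50, the remaining rows for that day form a second group; they overlap or are adjacent, which results in one merged timespan from 10:12:50 to 14:02:31.
Below you will find the sample data and the flattened result I need. The Duration column is just informative.
Any solution, SQL or not, is appreciated.
EDIT: Since there are lots of different and interesting solutions, I am refining my original question by adding constraints to see the "best" solution (if there is one) bubble up:
Given these constraints, what would be the best solution? I am afraid most solutions will be very slow because they join on a combination of date and time, which is not an indexed field in my case.
Would you do all the merging on the client side or on the server side? Would you first create an optimized temp table and run one of the proposed solutions against that table? I haven't had time to test the solutions yet, but I will let you know what works best for me.
Sample data: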
Date | Start | Stop
-----------+----------+---------
02.06.2009 | 05:55:28 | 09:58:27
02.06.2009 | 10:15:19 | 13:58:24
02.06.2009 | 13:58:24 | 13:58:43
03.06.2009 | 05:54:48 | 10:00:13
03.06.2009 | 09:26:45 | 09:59:40
03.06.2009 | 10:12:50 | 10:27:25
03.06.2009 | 10:13:12 | 11:14:56
03.06.2009 | 10:27:25 | 10:27:31
03.06.2009 | 10:27:39 | 13:53:38
03.06.2009 | 11:14:56 | 11:15:03
03.06.2009 | 11:15:30 | 14:02:14
03.06.2009 | 13:53:38 | 13:53:43
03.06.2009 | 14:02:14 | 14:02:31
04.06.2009 | 05:48:27 | 09:58:59
04.06.2009 | 06:00:00 | 09:59:07
04.06.2009 | 10:15:52 | 13:54:52
04.06.2009 | 10:16:01 | 13:24:20
04.06.2009 | 13:24:20 | 13:24:24
04.06.2009 | 13:24:32 | 14:00:39
04.06.2009 | 13:54:52 | 13:54:58
04.06.2009 | 14:00:39 | 14:00:49
05.06.2009 | 05:53:58 | 09:59:12
05.06.2009 | 10:16:05 | 13:59:08
05.06.2009 | 13:59:08 | 13:59:16
06.06.2009 | 06:04:00 | 10:00:00
06.06.2009 | 10:16:54 | 10:18:40
06.06.2009 | 10:18:40 | 10:18:45
06.06.2009 | 10:23:00 | 13:57:00
06.06.2009 | 10:23:48 | 13:57:54
06.06.2009 | 13:57:21 | 13:57:38
06.06.2009 | 13:57:54 | 13:57:58
07.06.2009 | 21:59:30 | 01:58:49
07.06.2009 | 22:12:16 | 01:58:39
07.06.2009 | 22:12:25 | 01:58:28
08.06.2009 | 02:10:33 | 05:56:11
08.06.2009 | 02:10:43 | 05:56:23
08.06.2009 | 02:10:49 | 05:55:59
08.06.2009 | 05:55:59 | 05:56:01
08.06.2009 | 05:56:11 | 05:56:14
08.06.2009 | 05:56:23 | 05:56:27
Flattened result:
Date | Start | Stop | Duration
-----------+----------+----------+---------
02.06.2009 | 05:55:28 | 09:58:27 | 04:02:59
02.06.2009 | 10:15:19 | 13:58:43 | 03:43:24
03.06.2009 | 05:54:48 | 10:00:13 | 04:05:25
03.06.2009 | 10:12:50 | 14:02:31 | 03:49:41
04.06.2009 | 05:48:27 | 09:59:07 | 04:10:40
04.06.2009 | 10:15:52 | 14:00:49 | 03:44:57
05.06.2009 | 05:53:58 | 09:59:12 | 04:05:14
05.06.2009 | 10:16:05 | 13:59:16 | 03:43:11
06.06.2009 | 06:04:00 | 10:00:00 | 03:56:00
06.06.2009 | 10:16:54 | 10:18:45 | 00:01:51
06.06.2009 | 10:23:00 | 13:57:58 | 03:34:58
07.06.2009 | 21:59:30 | 01:58:49 | 03:59:19
08.06.2009 | 02:10:33 | 05:56:27 | 03:45:54
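Answer 0 (score: 7)
Here is a SQL-only solution. I used DATETIME for the columns. Storing the time separately is a mistake in my opinion, because you will run into problems when a span crosses midnight. You can adjust this to handle that situation if you need to. The solution also assumes that the start and end times are NOT NULL. Again, you can adjust it as needed if that is not the case.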
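To illustrate the midnight problem mentioned above, here is a minimal Python sketch of combining the question's separate Date, Start and Stop values into two full datetimes. The function name `to_span` is hypothetical, and the rule "a stop time earlier than the start time means the span ended on the next day" is an assumption inferred from the 07.06.2009 rows, not something the question states.

```python
from datetime import datetime, timedelta

def to_span(date_text, start_text, stop_text):
    """Combine a separate date and two times into (start, end) datetimes.

    Assumption: a stop time earlier than the start time means the span
    ended on the following day.
    """
    fmt = "%d.%m.%Y %H:%M:%S"
    start = datetime.strptime(f"{date_text} {start_text}", fmt)
    end = datetime.strptime(f"{date_text} {stop_text}", fmt)
    if end < start:
        end += timedelta(days=1)  # roll the stop time over midnight
    return start, end

start, end = to_span("07.06.2009", "21:59:30", "01:58:49")
print(start, end)
print(end - start)  # 3:59:19, matching the Duration column in the question
```

Once every row is a pair of datetimes, the merging logic no longer needs any special case for spans that cross midnight.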
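The general gist of the solution is to get all start times that are not overlapped by any other span, get all end times that are not overlapped by any other span, and then match the two together.
The results match your expected results except in one case, which on checking by hand looks like a mistake in your expected output. On the 6th there should be a span that ends at 2009-06-06 10:18:45.000.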
SELECT
    ST.start_time,
    ET.end_time
FROM
    (
    -- start times that no earlier-starting span reaches
    SELECT
        T1.start_time
    FROM
        dbo.Test_Time_Spans T1
    LEFT OUTER JOIN dbo.Test_Time_Spans T2 ON
        T2.start_time < T1.start_time AND
        T2.end_time >= T1.start_time
    WHERE
        T2.start_time IS NULL
    ) AS ST
INNER JOIN
    (
    -- end times that no later-ending span covers
    SELECT
        T3.end_time
    FROM
        dbo.Test_Time_Spans T3
    LEFT OUTER JOIN dbo.Test_Time_Spans T4 ON
        T4.end_time > T3.end_time AND
        T4.start_time <= T3.end_time
    WHERE
        T4.start_time IS NULL
    ) AS ET ON
    ET.end_time > ST.start_time
LEFT OUTER JOIN
    (
    -- same set of end times, used to keep only the nearest one per start
    SELECT
        T5.end_time
    FROM
        dbo.Test_Time_Spans T5
    LEFT OUTER JOIN dbo.Test_Time_Spans T6 ON
        T6.end_time > T5.end_time AND
        T6.start_time <= T5.end_time
    WHERE
        T6.start_time IS NULL
    ) AS ET2 ON
    ET2.end_time > ST.start_time AND
    ET2.end_time < ET.end_time
WHERE
    ET2.end_time IS NULL
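For readers who want to check the logic outside the database, here is a rough Python sketch of the same idea: find the starts that no earlier span reaches, find the ends that no later span covers, then pair each start with the nearest end. The names `flatten_by_boundaries` and `t` are hypothetical, the data is the 06.06.2009 block from the question, and the sketch uses `>=` when pairing so that zero-length spans do not break it (the query uses `>`).

```python
from datetime import datetime

def t(clock):
    # Helper for the example only: a time on 2009-06-06 as a datetime.
    return datetime.strptime("2009-06-06 " + clock, "%Y-%m-%d %H:%M:%S")

def flatten_by_boundaries(spans):
    """spans: list of (start, end) pairs with start <= end."""
    # A start opens a merged span if no span that begins earlier reaches it.
    starts = sorted({s for s, _ in spans
                     if not any(s2 < s <= e2 for s2, e2 in spans)})
    # An end closes a merged span if no span covers it and runs past it.
    ends = sorted({e for _, e in spans
                   if not any(s2 <= e < e2 for s2, e2 in spans)})
    # Pair each start with the nearest end at or after it.
    return [(s, min(e for e in ends if e >= s)) for s in starts]

spans = [
    (t("06:04:00"), t("10:00:00")),
    (t("10:16:54"), t("10:18:40")),
    (t("10:18:40"), t("10:18:45")),
    (t("10:23:00"), t("13:57:00")),
    (t("10:23:48"), t("13:57:54")),
    (t("13:57:21"), t("13:57:38")),
    (t("13:57:54"), t("13:57:58")),
]
for start, end in flatten_by_boundaries(spans):
    print(start.time(), end.time())
```

Like the self-joins in the query, this compares every span with every other span, so the work grows quadratically with the number of rows.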
Answer 1 (score: 4)
In MySQL:
SELECT  grouper, MIN(start) AS group_start, MAX(end) AS group_end
FROM    (
        SELECT  start,
                end,
                @r := @r + (@edate < start) AS grouper,
                @edate := GREATEST(end, CAST(@edate AS DATETIME))
        FROM    (
                SELECT  @r := 0,
                        @edate := CAST('0000-01-01' AS DATETIME)
                ) vars,
                (
                SELECT  rn_date + INTERVAL TIME_TO_SEC(rn_start) SECOND AS start,
                        rn_date + INTERVAL TIME_TO_SEC(rn_end) SECOND + INTERVAL (rn_start > rn_end) DAY AS end
                FROM    t_ranges
                ) q
        ORDER BY
                start
        ) q
GROUP BY
        grouper
ORDER BY
        group_start
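The query above keeps a running maximum of the end times seen so far (`@edate`) and bumps a group counter (`@r`) whenever a row starts after that maximum; grouping by the counter then gives one row per merged span. A small Python sketch of the same bookkeeping, assuming the rows are already (start, end) datetime pairs; `flatten_by_grouper` and `t` are hypothetical names and the data is the 04.06.2009 block from the question.

```python
from datetime import datetime
from itertools import groupby

def t(clock):
    # Helper for the example only: a time on 2009-06-04 as a datetime.
    return datetime.strptime("2009-06-04 " + clock, "%Y-%m-%d %H:%M:%S")

def flatten_by_grouper(spans):
    grouper = 0
    running_end = datetime.min  # plays the role of @edate
    labelled = []
    for start, end in sorted(spans):  # ORDER BY start
        if running_end < start:       # gap: this row opens a new group
            grouper += 1
        running_end = max(running_end, end)
        labelled.append((grouper, start, end))
    # GROUP BY grouper with MIN(start) and MAX(end)
    result = []
    for _, rows in groupby(labelled, key=lambda row: row[0]):
        rows = list(rows)
        result.append((min(r[1] for r in rows), max(r[2] for r in rows)))
    return result

spans = [
    (t("05:48:27"), t("09:58:59")), (t("06:00:00"), t("09:59:07")),
    (t("10:15:52"), t("13:54:52")), (t("10:16:01"), t("13:24:20")),
    (t("13:24:20"), t("13:24:24")), (t("13:24:32"), t("14:00:39")),
    (t("13:54:52"), t("13:54:58")), (t("14:00:39"), t("14:00:49")),
]
for start, end in flatten_by_grouper(spans):
    print(start.time(), end.time())
```

One caveat about the SQL version: it relies on MySQL evaluating the user-variable assignments row by row in `ORDER BY start` order, which the MySQL documentation does not guarantee, so it is worth verifying on the server version you actually run.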
The same solution for SQL Server is described in an article on my blog.
Here is the function that does this:
DROP FUNCTION fn_spans
GO
CREATE FUNCTION fn_spans(@p_from DATETIME, @p_till DATETIME)
RETURNS @t TABLE
        (
        q_start DATETIME NOT NULL,
        q_end DATETIME NOT NULL
        )
AS
BEGIN
        DECLARE @qs DATETIME
        DECLARE @qe DATETIME
        DECLARE @ms DATETIME
        DECLARE @me DATETIME
        DECLARE cr_span CURSOR FAST_FORWARD
        FOR
        SELECT  s_date + s_start AS q_start,
                s_date + s_stop + CASE WHEN s_start < s_stop THEN 0 ELSE 1 END AS q_end
        FROM    t_span
        WHERE   s_date BETWEEN @p_from - 1 AND @p_till
                AND s_date + s_start >= @p_from
                AND s_date + s_stop <= @p_till
        ORDER BY
                q_start
        OPEN    cr_span
        FETCH   NEXT
        FROM    cr_span
        INTO    @qs, @qe
        SET @ms = @qs
        SET @me = @qe
        WHILE @@FETCH_STATUS = 0
        BEGIN
                FETCH   NEXT
                FROM    cr_span
                INTO    @qs, @qe
                IF @qs > @me
                BEGIN
                        INSERT
                        INTO    @t
                        VALUES  (@ms, @me)
                        SET @ms = @qs
                END
                SET @me = CASE WHEN @qe > @me THEN @qe ELSE @me END
        END
        IF @ms IS NOT NULL
        BEGIN
                INSERT
                INTO    @t
                VALUES  (@ms, @me)
        END
        CLOSE   cr_span
        RETURN
END
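Since SQL Server lacks an easy way to refer to previously selected rows in a result set, this is one of the rare cases where a cursor in SQL Server works faster than a set-based solution.
Tested on 1,440,000 rows, it runs in 24 seconds for the full set and is almost instant for a range of a day or two.
Note the additional condition in the SELECT query:
s_date BETWEEN @p_from - 1 AND @p_till
This seems redundant, but it is actually a coarse filter that makes the index on s_date usable.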
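The coarse filter can be pictured with a rough Python analogy, in which a sorted list searched with `bisect` stands in for the index on s_date. The function `rows_in_window` and the tuple layout are assumptions made for illustration. The cheap range lookup narrows the candidates using the stored column alone, and the exact conditions on the computed values run only on those candidates. The `- 1` day matters when the window start has a time part: the row dated at midnight of that same day would otherwise be skipped.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta

def rows_in_window(rows, p_from, p_till):
    """rows: (s_date, s_start, s_stop) tuples sorted by s_date, where s_date
    is a midnight datetime and s_start / s_stop are timedeltas."""
    dates = [row[0] for row in rows]  # stands in for the index on s_date
    # Coarse filter: range lookup on the indexed column only.
    lo = bisect_left(dates, p_from - timedelta(days=1))
    hi = bisect_right(dates, p_till)
    # Exact filter: computed values, checked only for the candidates.
    return [row for row in rows[lo:hi]
            if row[0] + row[1] >= p_from and row[0] + row[2] <= p_till]

rows = [
    (datetime(2009, 6, 2), timedelta(hours=5, minutes=55, seconds=28),
     timedelta(hours=9, minutes=58, seconds=27)),
    (datetime(2009, 6, 3), timedelta(hours=5, minutes=54, seconds=48),
     timedelta(hours=10, minutes=0, seconds=13)),
    (datetime(2009, 6, 4), timedelta(hours=5, minutes=48, seconds=27),
     timedelta(hours=9, minutes=58, seconds=59)),
]
window = rows_in_window(rows, datetime(2009, 6, 3, 5, 0), datetime(2009, 6, 3, 23, 59, 59))
print(window)
```

With the window starting at 05:00 on 3 June, the row dated 2009-06-03 00:00 survives the coarse filter only because of the one-day allowance; the exact conditions then confirm that it belongs to the window.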
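Answer 2 (score: 3)
A similar question on SO:
Min effective and termdate for contiguous dates
FWIW, I upvoted the one recommending Joe Celko's SQL for Smarties, third edition (repeat: the THIRD edition, 2005), which discusses various approaches, both set-based and procedural.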
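Answer 3 (score: 2)
Assuming you have a list L of timespans sorted by start time and an empty list F for the flattened result, do the following: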
first = first row in L
flat_date.start = first.start, flat_date.end = first.end
For each remaining row in L:
    if row.start <= flat_date.end: // overlaps or touches the current timespan
        flat_date.end = max(flat_date.end, row.end) // a contained row changes nothing
    else: // ending a timespan and starting a new one
        add flat_date to F
        flat_date.start = row.start, flat_date.end = row.end
add flat_date to F // adding the last timespan to the flattened list
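A runnable Python sketch of this sweep, assuming each timespan is a (start, end) pair of datetimes; `flatten` and `t` are hypothetical names. It sorts by start time, merges a span into the current one when it overlaps or touches it, and keeps the later end so that a span lying entirely inside the current one does not shorten it. The data is the 03.06.2009 block from the question.

```python
from datetime import datetime

def t(clock):
    # Helper for the example only: a time on 2009-06-03 as a datetime.
    return datetime.strptime("2009-06-03 " + clock, "%Y-%m-%d %H:%M:%S")

def flatten(spans):
    """Merge overlapping or touching (start, end) pairs."""
    flat = []
    for start, end in sorted(spans):
        if flat and start <= flat[-1][1]:
            # Overlaps or touches the current span: extend its end if needed.
            flat[-1] = (flat[-1][0], max(flat[-1][1], end))
        else:
            # Gap before this span: start a new flattened span.
            flat.append((start, end))
    return flat

spans = [
    (t("05:54:48"), t("10:00:13")), (t("09:26:45"), t("09:59:40")),
    (t("10:12:50"), t("10:27:25")), (t("10:13:12"), t("11:14:56")),
    (t("10:27:25"), t("10:27:31")), (t("10:27:39"), t("13:53:38")),
    (t("11:14:56"), t("11:15:03")), (t("11:15:30"), t("14:02:14")),
    (t("13:53:38"), t("13:53:43")), (t("14:02:14"), t("14:02:31")),
]
for start, end in flatten(spans):
    print(start.time(), end.time())
# 05:54:48 10:00:13
# 10:12:50 14:02:31
```

After the sort, this is a single pass over the rows, so it could run on the client over a result set ordered by start time.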
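Answer 4 (score: 1)
Here is a recursive CTE solution, but I took the liberty of putting a date and time in each column rather than pulling the date out separately. That helps avoid some messy special-case code. If you must store the date separately, I would use a CTE or a view to make it look like two datetime columns and go with this approach.
Create the test data: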
create table t1 (d1 datetime, d2 datetime)
insert t1 (d1,d2)
select '2009-06-03 10:00:00', '2009-06-03 14:00:00'
union all select '2009-06-03 13:55:00', '2009-06-03 18:00:00'
union all select '2009-06-03 17:55:00', '2009-06-03 23:00:00'
union all select '2009-06-03 22:55:00', '2009-06-04 03:00:00'
union all select '2009-06-04 03:05:00', '2009-06-04 07:00:00'
union all select '2009-06-04 07:05:00', '2009-06-04 10:00:00'
union all select '2009-06-04 09:55:00', '2009-06-04 14:00:00'
Recursive CTE:
;with dateRanges (ancestorD1, parentD1, d2, iter) as
(
    --anchor is first level of collapse
    select
        d1 as ancestorD1,
        d1 as parentD1,
        d2,
        cast(0 as int) as iter
    from t1
    --recurse as long as there is another range to fold in
    union all select
        tLeft.ancestorD1,
        tRight.d1 as parentD1,
        tRight.d2,
        iter + 1 as iter
    from dateRanges as tLeft join t1 as tRight
        --join condition is that the t1 row can be consumed by the recursive row
        on tLeft.d2 between tRight.d1 and tRight.d2
        --exclude identical rows
        and not (tLeft.parentD1 = tRight.d1 and tLeft.d2 = tRight.d2)
)
select
    ranges1.*
from dateRanges as ranges1
where not exists (
    select 1
    from dateRanges as ranges2
    where ranges1.ancestorD1 between ranges2.ancestorD1 and ranges2.d2
        and ranges1.d2 between ranges2.ancestorD1 and ranges2.d2
        and ranges2.iter > ranges1.iter
)
Gives the output:
ancestorD1              parentD1                d2                      iter
----------------------- ----------------------- ----------------------- -----------
2009-06-04 03:05:00.000 2009-06-04 03:05:00.000 2009-06-04 07:00:00.000 0
2009-06-04 07:05:00.000 2009-06-04 09:55:00.000 2009-06-04 14:00:00.000 1
2009-06-03 10:00:00.000 2009-06-03 22:55:00.000 2009-06-04 03:00:00.000 3
Answer 5 (score: 0)
To help with answering this question, here is the sample data given in the question, in a table variable of the kind hainstech uses:
declare @T1 table (d1 datetime, d2 datetime)
insert @T1 (d1,d2)
select '02 June 2009 05:55:28','02 June 2009 09:58:27'
union all select '02 June 2009 10:15:19','02 June 2009 13:58:24'
union all select '02 June 2009 13:58:24','02 June 2009 13:58:43'
union all select '03 June 2009 05:54:48','03 June 2009 10:00:13'
union all select '03 June 2009 09:26:45','03 June 2009 09:59:40'
union all select '03 June 2009 10:12:50','03 June 2009 10:27:25'
union all select '03 June 2009 10:13:12','03 June 2009 11:14:56'
union all select '03 June 2009 10:27:25','03 June 2009 10:27:31'
union all select '03 June 2009 10:27:39','03 June 2009 13:53:38'
union all select '03 June 2009 11:14:56','03 June 2009 11:15:03'
union all select '03 June 2009 11:15:30','03 June 2009 14:02:14'
union all select '03 June 2009 13:53:38','03 June 2009 13:53:43'
union all select '03 June 2009 14:02:14','03 June 2009 14:02:31'
union all select '04 June 2009 05:48:27','04 June 2009 09:58:59'
union all select '04 June 2009 06:00:00','04 June 2009 09:59:07'
union all select '04 June 2009 10:15:52','04 June 2009 13:54:52'
union all select '04 June 2009 10:16:01','04 June 2009 13:24:20'
union all select '04 June 2009 13:24:20','04 June 2009 13:24:24'
union all select '04 June 2009 13:24:32','04 June 2009 14:00:39'
union all select '04 June 2009 13:54:52','04 June 2009 13:54:58'
union all select '04 June 2009 14:00:39','04 June 2009 14:00:49'
union all select '05 June 2009 05:53:58','05 June 2009 09:59:12'
union all select '05 June 2009 10:16:05','05 June 2009 13:59:08'
union all select '05 June 2009 13:59:08','05 June 2009 13:59:16'
union all select '06 June 2009 06:04:00','06 June 2009 10:00:00'
union all select '06 June 2009 10:16:54','06 June 2009 10:18:40'
union all select '06 June 2009 10:18:40','06 June 2009 10:18:45'
union all select '06 June 2009 10:23:00','06 June 2009 13:57:00'
union all select '06 June 2009 10:23:48','06 June 2009 13:57:54'
union all select '06 June 2009 13:57:21','06 June 2009 13:57:38'
union all select '06 June 2009 13:57:54','06 June 2009 13:57:58'
union all select '07 June 2009 21:59:30','08 June 2009 01:58:49'
union all select '07 June 2009 22:12:16','08 June 2009 01:58:39'
union all select '07 June 2009 22:12:25','08 June 2009 01:58:28'
union all select '08 June 2009 02:10:33','08 June 2009 05:56:11'
union all select '08 June 2009 02:10:43','08 June 2009 05:56:23'
union all select '08 June 2009 02:10:49','08 June 2009 05:55:59'
union all select '08 June 2009 05:55:59','08 June 2009 05:56:01'
union all select '08 June 2009 05:56:11','08 June 2009 05:56:14'
union all select '08 June 2009 05:56:23','08 June 2009 05:56:27'
Answer 6 (score: 0)
Extending MahlerFive's answer, I wrote a quick extension to DateTools. So far it has passed all my tests.
extension DTTimePeriodCollection {

    func flatten() {
        self.sortByStartAscending()
        guard let periods = self.periods() else { return }
        if periods.count < 1 { return }

        var flattenedPeriods = [DTTimePeriod]()
        let flatdate = DTTimePeriod()

        for period in periods {
            guard let periodStart = period.StartDate, let periodEnd = period.EndDate else { continue }

            if !flatdate.hasStartDate() { flatdate.StartDate = periodStart }
            if !flatdate.hasEndDate() { flatdate.EndDate = periodEnd }

            if periodStart.isEarlierThanOrEqualTo(flatdate.EndDate) {
                // overlapping or adjacent: extend only if this period ends later,
                // so a period contained in the current span is simply absorbed
                if periodEnd.isLaterThan(flatdate.EndDate) {
                    flatdate.EndDate = periodEnd
                }
            } else {
                // gap: close the current span and start a new one
                flattenedPeriods.append(flatdate.copy())
                flatdate.StartDate = periodStart
                flatdate.EndDate = periodEnd
            }
        }
        flattenedPeriods.append(flatdate.copy())

        // delete all periods
        for _ in 0..<periods.count { self.removeTimePeriodAtIndex(0) }

        // add flattened periods to self
        for flat in flattenedPeriods { self.addTimePeriod(flat) }
    }
}