I have a table like this:
ColumnId Intime Outtime
1 01/02/2009 10.00.000 01/02/2009 20.00.0000
2 01/02/2009 2.00.000 01/02/2009 2.00.0000
3 01/02/2009 2.00.000 01/02/2009 5.00.0000
4 01/02/2009 3.3.0.000 01/02/2009 5.00.0000
5 01/02/2009 10.00.000 01/02/2009 22.00.0000
6 01/02/2009 3.00.000 01/02/2009 4.00.0000
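These are my columns and values. I want to find the overlapping records, and the number of overlapping records for a particular date. Overlap is measured within a single day, over hours 1 to 24.
Note: my table has millions of records.
For example, the first record logs in at 10 and logs out at 20. Record 5 logs in at 10 and logs out at 22, so record 5 overlaps with the first. There are no indexes on the table.
Please give me a query that does this.
I need to run the query in SQL Server 2005.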
Answer 0 (score: 8)
Off the top of my head, and assuming indexes on both columns, you could use something like this:
SELECT a.ColumnId
,a.InTime
,a.OutTime
,b.ColumnId AS OverlappingId
,b.InTime AS OverlappingInTime
,b.OutTime AS OverlappingOutTime
FROM TimeTable AS a
JOIN TimeTable AS b ON ((a.InTime BETWEEN b.InTime AND b.OutTime)
OR (a.OutTime BETWEEN b.InTime AND b.OutTime)
OR (a.InTime < b.InTime AND a.OutTime > b.OutTime))
AND (a.ColumnId != b.ColumnId)
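However, I'm really not sure how this query will perform on a table with millions of records, as you mention.
Edited to add, then re-edited:
Following Vadim K.'s comment, I noticed that the query I wrote earlier missed the case where the overlap is total, i.e. one range completely covers the other. Above is my revised query; below is the original: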
SELECT a.ColumnId
,a.InTime
,a.OutTime
,b.ColumnId AS OverlappingId
,b.InTime AS OverlappingInTime
,b.OutTime AS OverlappingOutTime
FROM TimeTable AS a
JOIN TimeTable AS b ON ((a.InTime BETWEEN b.InTime AND b.OutTime)
OR (a.OutTime BETWEEN b.InTime AND b.OutTime))
AND (a.ColumnId != b.ColumnId)
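The gap in the original condition is easiest to see on rows 3 and 6 of the sample data. What follows is a minimal Python sketch, not part of either query. The function names and the use of minutes since midnight are assumptions made purely for illustration.

```python
# Times are minutes since midnight (an illustrative simplification).
row3 = (2 * 60, 5 * 60)   # 2:00 - 5:00
row6 = (3 * 60, 4 * 60)   # 3:00 - 4:00

def between(x, low, high):
    """Mimics SQL BETWEEN, which includes both endpoints."""
    return low <= x <= high

def original_overlap(a, b):
    """ON clause of the original query: only a's endpoints are tested."""
    return between(a[0], b[0], b[1]) or between(a[1], b[0], b[1])

def revised_overlap(a, b):
    """ON clause of the revised query: also catches a fully covering b."""
    return original_overlap(a, b) or (a[0] < b[0] and a[1] > b[1])

print(original_overlap(row3, row6))  # False: neither endpoint of row 3 lies inside row 6
print(revised_overlap(row3, row6))   # True: row 3 covers row 6 entirely
print(original_overlap(row6, row3))  # True: row 6 lies inside row 3
```

This is why the original query reports row 6 as overlapping row 3 but not the other way round.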
A test run using the initial data from the question:
+--------+------------------+------------------+
|ColumnId| InTime | OutTime |
+--------+------------------+------------------+
| 1 | 01/02/2009 10:00 | 01/02/2009 20:00 |
| 2 | 01/02/2009 2:00 | 01/02/2009 2:00 |
| 3 | 01/02/2009 2:00 | 01/02/2009 5:00 |
| 4 | 01/02/2009 3:03 | 01/02/2009 5:00 |
| 5 | 01/02/2009 10:00 | 01/02/2009 22:00 |
| 6 | 01/02/2009 3:00 | 01/02/2009 4:00 |
+--------+------------------+------------------+
Running the original query, we get the following results:
+--------+------------------+------------------+-------------+
|ColumnId| InTime | OutTime |OverlappingId|
+--------+------------------+------------------+-------------+
| 1 | 01/02/2009 10:00 | 01/02/2009 20:00 | 5 |
| 2 | 01/02/2009 2:00 | 01/02/2009 2:00 | 3 |
| 3 | 01/02/2009 2:00 | 01/02/2009 5:00 | 2 |
| 3 | 01/02/2009 2:00 | 01/02/2009 5:00 | 4 |
| 4 | 01/02/2009 3:03 | 01/02/2009 5:00 | 3 |
| 4 | 01/02/2009 3:03 | 01/02/2009 5:00 | 6 |
| 5 | 01/02/2009 10:00 | 01/02/2009 22:00 | 1 |
| 6 | 01/02/2009 3:00 | 01/02/2009 4:00 | 3 |
| 6 | 01/02/2009 3:00 | 01/02/2009 4:00 | 4 |
+--------+------------------+------------------+-------------+
Running the updated query, we get the following results:
+--------+------------------+------------------+-------------+
|ColumnId| InTime | OutTime |OverlappingId|
+--------+------------------+------------------+-------------+
| 1 | 01/02/2009 10:00 | 01/02/2009 20:00 | 5 |
| 2 | 01/02/2009 2:00 | 01/02/2009 2:00 | 3 |
| 3 | 01/02/2009 2:00 | 01/02/2009 5:00 | 2 |
| 3 | 01/02/2009 2:00 | 01/02/2009 5:00 | 4 |
| 3 | 01/02/2009 2:00 | 01/02/2009 5:00 | 6 | << missing row
| 4 | 01/02/2009 3:03 | 01/02/2009 5:00 | 3 |
| 4 | 01/02/2009 3:03 | 01/02/2009 5:00 | 6 |
| 5 | 01/02/2009 10:00 | 01/02/2009 22:00 | 1 |
| 6 | 01/02/2009 3:00 | 01/02/2009 4:00 | 3 |
| 6 | 01/02/2009 3:00 | 01/02/2009 4:00 | 4 |
+--------+------------------+------------------+-------------+
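Yes, some IDs are repeated, but that is because they overlap with different records.
The question also asks for the number of overlapping rows. It is not clear from the question whether it wants the number of rows of the original table that overlap.
Some people have suggested using a.ColumnId < b.ColumnId or a.ColumnId > b.ColumnId to avoid duplicates. That still does not work, though, because with the first comparison we get the following results: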
+--------+------------------+------------------+-------------+
|ColumnId| InTime | OutTime |OverlappingId|
+--------+------------------+------------------+-------------+
| 1 | 01/02/2009 10:00 | 01/02/2009 20:00 | 5 |
| 2 | 01/02/2009 2:00 | 01/02/2009 2:00 | 3 |
| 3 | 01/02/2009 2:00 | 01/02/2009 5:00 | 4 |
| 3 | 01/02/2009 2:00 | 01/02/2009 5:00 | 6 |
| 4 | 01/02/2009 3:03 | 01/02/2009 5:00 | 6 |
+--------+------------------+------------------+-------------+
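Note that all 6 rows of the sample data are referenced in this result, even though it contains only 5 rows. I believe that, for this data, every row overlaps with some other row at one point or another, so the number of overlapping rows is 6.
To get that result, the following query can be used: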
SELECT COUNT (DISTINCT a.ColumnId)
FROM TimeTable AS a
JOIN TimeTable AS b ON ((a.InTime BETWEEN b.InTime AND b.OutTime)
OR (a.OutTime BETWEEN b.InTime AND b.OutTime)
OR (a.InTime < b.InTime AND a.OutTime > b.OutTime))
AND (a.ColumnId != b.ColumnId)
This returns a count of all 6 rows.
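As a cross-check of that count outside the database, here is a small Python sketch that applies the same three-part condition to the six sample rows. Times are written as minutes since midnight, and all names are illustrative assumptions rather than anything from the table itself.

```python
# Sample rows as ColumnId -> (InTime, OutTime), in minutes since midnight.
rows = {
    1: (600, 1200),   # 10:00 - 20:00
    2: (120, 120),    # 2:00 - 2:00
    3: (120, 300),    # 2:00 - 5:00
    4: (183, 300),    # 3:03 - 5:00
    5: (600, 1320),   # 10:00 - 22:00
    6: (180, 240),    # 3:00 - 4:00
}

def overlaps(a, b):
    """Same three-part condition as the query's ON clause (BETWEEN is inclusive)."""
    return (b[0] <= a[0] <= b[1]
            or b[0] <= a[1] <= b[1]
            or (a[0] < b[0] and a[1] > b[1]))

# Equivalent of COUNT(DISTINCT a.ColumnId) over the self-join.
overlapping_ids = {
    id_a
    for id_a, a in rows.items()
    for id_b, b in rows.items()
    if id_a != id_b and overlaps(a, b)
}
print(len(overlapping_ids))  # 6
```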
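Answer 1 (score: 5)
Having tested the solutions carefully, I found that the answers posted so far either check for overlap incorrectly or return too many results (two rows for each overlap).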
select
aa.ColumnId as ColumnIdA, aa.InTime as InTimeA, aa.OutTime as OutTimeA,
bb.ColumnId as ColumnIdB, bb.InTime as InTimeB, bb.OutTime as OutTimeB
from
MyTable aa
join
MyTable bb on aa.ColumnId < bb.ColumnId
where
aa.InTime < bb.OutTime
and
aa.OutTime > bb.InTime
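Care is needed when defining "overlap". I am assuming that if the first period is 3 am to 4 am and the second is 4 am to 5 am, the ranges do not overlap. If you do want to treat that case as an overlap, change < to <= and > to >= in the where clause.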
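The effect of that choice can be seen on the 3 am to 4 am and 4 am to 5 am example. This is a minimal Python sketch of the two comparisons; the function names are assumptions for illustration, and times are given as whole hours.

```python
def overlaps_strict(a, b):
    """Ranges that only touch at an endpoint do not overlap (< and >)."""
    return a[0] < b[1] and a[1] > b[0]

def overlaps_inclusive(a, b):
    """Ranges that touch at an endpoint count as overlapping (<= and >=)."""
    return a[0] <= b[1] and a[1] >= b[0]

three_to_four = (3, 4)  # hours
four_to_five = (4, 5)

print(overlaps_strict(three_to_four, four_to_five))     # False
print(overlaps_inclusive(three_to_four, four_to_five))  # True
```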
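Performance is proportional to the square of the number of rows. Faster solutions exist for large datasets, but they are more complicated than this one.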
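One such faster approach, sketched here in Python rather than SQL, is a sweep over the rows sorted by start time, keeping a heap of the ranges that are still open. This is only an illustration under stated assumptions: the function name and the (id, start, end) layout are made up, times are minutes since midnight, and it uses the strict definition of overlap from the query above. Sorting and heap maintenance take roughly O(n log n), plus the work of listing the pairs themselves.

```python
import heapq

def overlapping_pairs(rows):
    """Return (id_a, id_b) pairs whose ranges overlap, touching ranges excluded.

    rows: iterable of (row_id, start, end) with start <= end.
    """
    pairs = []
    active = []  # min-heap of (end, row_id) for ranges that may still overlap
    # Sorting by (start, end) keeps zero-length ranges ahead of longer ones
    # that begin at the same instant.
    for row_id, start, end in sorted(rows, key=lambda r: (r[1], r[2])):
        # Drop ranges that ended at or before this start: they cannot
        # overlap this range or any later one.
        while active and active[0][0] <= start:
            heapq.heappop(active)
        # Everything still active started no later than this range
        # and ends after `start`, so it overlaps this range.
        pairs.extend((other_id, row_id) for _, other_id in active)
        heapq.heappush(active, (end, row_id))
    return pairs

sample = [
    (1, 600, 1200), (2, 120, 120), (3, 120, 300),
    (4, 183, 300), (5, 600, 1320), (6, 180, 240),
]
for id_a, id_b in overlapping_pairs(sample):
    print(id_a, id_b)
```

If almost every row overlaps every other row, the list of pairs is itself quadratic in size, so the gain only shows up when overlaps are comparatively sparse.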
Answer 2 (score: 0)
Select T1.*,T2.*
From Table1 T1
Inner Join Table1 T2 ON ((T1.InTime >= T2.InTime AND T1.OutTime > T2.InTime)
OR (T2.InTime >= T1.InTime AND T2.OutTime > T1.InTime))
AND (T1.ColumnId != T2.ColumnId)
Answer 3 (score: 0)
SELECT T1.ColumnId, T1.Intime, T1.OutTime
FROM TimeTable AS T1, TimeTable AS T2 -- table name assumed; T1 and T2 alias the same table
WHERE 1 = 1
AND ( T2.Intime BETWEEN T1.Intime AND T1.OutTime
OR T2.OutTime BETWEEN T1.Intime AND T1.OutTime )
AND T1.ColumnId <> T2.ColumnId
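Answer 4 (score: 0)
If your SQL dialect is ANSI SQL:2003 compliant, you can use the OVERLAPS predicate. Note the t1.ColumnId < t2.ColumnId condition, which avoids duplicates.
SELECT *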
FROM TimeTable AS t1,TimeTable AS t2
WHERE (t1.Intime,t1.Outtime) OVERLAPS (t2.Intime,t2.Outtime)
AND t1.ColumnId < t2.ColumnId
ORDER BY 1;