From many possible rows

Time: 2015-08-12 16:09:57

Tags: sql-server tsql sql-server-2012

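I really need help with a query.

Details: I have 5 million events, each identified by an ID and EventID, with a [datetime] start time. ID-EventID is not a unique key and can occur several times a day. For each row there may be no match within the next 5 days, exactly one match, or as many as 10,000 matching [datetime] end times.

All I need is the single end time closest to the start time.

The query itself is fairly simple, but I have several million events, and each one can have anywhere from one to 10,000 matches. The joined result runs into the billions of rows and is no longer workable.

I wrote a set of sample data, including the sample result (see below).

What I need is a query that fetches only the one matching row and leaves the rest alone.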

CREATE TABLE #starts(
    [id] [smallint] NULL,
    [event_id] [nvarchar](50) NULL,
    [dt_start] [datetime] NULL
)

INSERT #starts ([id], [event_id], [dt_start]) VALUES (2, N'alpha', CAST(N'2015-05-01 23:06:22.000' AS DateTime))
INSERT #starts ([id], [event_id], [dt_start]) VALUES (2, N'alpha', CAST(N'2015-05-10 23:42:01.000' AS DateTime))
INSERT #starts ([id], [event_id], [dt_start]) VALUES (2, N'alpha', CAST(N'2015-05-28 02:36:44.000' AS DateTime))
INSERT #starts ([id], [event_id], [dt_start]) VALUES (2, N'alpha', CAST(N'2015-05-29 08:56:17.000' AS DateTime))

CREATE TABLE #ends(
    [id] [smallint] NULL,
    [event_id] [nvarchar](50) NULL,
    [dt_end] [datetime] NULL
)


INSERT #ends ([id], [event_id], [dt_end]) VALUES (2, N'alpha', CAST(N'2015-05-01 23:09:32.000' AS DateTime))
INSERT #ends ([id], [event_id], [dt_end]) VALUES (2, N'alpha', CAST(N'2015-05-28 02:40:14.000' AS DateTime))
INSERT #ends ([id], [event_id], [dt_end]) VALUES (2, N'alpha', CAST(N'2015-05-28 08:57:39.000' AS DateTime))
INSERT #ends ([id], [event_id], [dt_end]) VALUES (2, N'alpha', CAST(N'2015-05-28 14:09:39.000' AS DateTime))
INSERT #ends ([id], [event_id], [dt_end]) VALUES (2, N'alpha', CAST(N'2015-06-01 10:18:18.000' AS DateTime))
INSERT #ends ([id], [event_id], [dt_end]) VALUES (2, N'alpha', CAST(N'2015-06-01 14:42:04.000' AS DateTime))
GO

-- one extra step to clarify
select a.id, a.event_id,dt_start, dt_end 
,row_number() over (partition by a.id, a.event_id,dt_start order by dt_end) as rn
from #starts as a
left join #ends as b
on a.id=b.id
and a.event_id=b.event_id
AND a.dt_start<b.dt_end
and datediff(day,dt_start,dt_end) <=5

-- the result
select * from (
select a.id, a.event_id,dt_start, dt_end 
,row_number() over (partition by a.id, a.event_id,dt_start order by dt_end) as rn
from #starts as a
left join #ends as b
on a.id=b.id
and a.event_id=b.event_id
AND a.dt_start<b.dt_end
and datediff(day,dt_start,dt_end) <=5
) as dummy
where rn=1

Thanks for any help.

1 answer:

Answer 0: (score: 0)
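
One approach that may help, sketched here under stated assumptions, is to ask for only the first matching end time per start row with OUTER APPLY and TOP (1), instead of joining to every match and numbering the rows afterwards. The index name ix_ends_lookup is hypothetical, and the DATEADD bound is an assumed rewrite of datediff(day, dt_start, dt_end) <= 5 that keeps dt_end on its own side of the comparison so an index seek is possible. Whether this is actually faster on 5 million rows would need to be checked against the real data and execution plan.

```sql
-- Assumption: an index ordered by (id, event_id, dt_end) lets each lookup
-- seek straight to the first end time after the start. The name is made up.
CREATE CLUSTERED INDEX ix_ends_lookup ON #ends (id, event_id, dt_end);

SELECT s.id, s.event_id, s.dt_start, e.dt_end
FROM #starts AS s
OUTER APPLY (
    -- Only the earliest end time after this start is fetched.
    SELECT TOP (1) b.dt_end
    FROM #ends AS b
    WHERE b.id = s.id
      AND b.event_id = s.event_id
      AND b.dt_end > s.dt_start
      -- Same window as datediff(day, dt_start, dt_end) <= 5:
      -- the end must fall before midnight of start date + 6 days.
      AND b.dt_end < DATEADD(DAY, 6, CAST(CAST(s.dt_start AS date) AS datetime))
    ORDER BY b.dt_end
) AS e;

-- On the sample data this should return the same four rows as the rn = 1 query:
-- 2015-05-01 23:06:22 -> 2015-05-01 23:09:32
-- 2015-05-10 23:42:01 -> NULL (no end within 5 days)
-- 2015-05-28 02:36:44 -> 2015-05-28 02:40:14
-- 2015-05-29 08:56:17 -> 2015-06-01 10:18:18
```

OUTER APPLY keeps start rows that have no match, like the left join in the question. If those rows are not needed, CROSS APPLY would drop them.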
