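I have the following query, which does a self-join on the Locations table. When I run it against one million records, it takes more than 2 hours to execute. I would greatly appreciate any performance improvements to this query that would reduce the execution time.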
SELECT
    a.Id1, a.Id2, a.LocationStart, a.LocationEnd
FROM
    Locations AS a
JOIN
    Locations AS b
ON
    a.Id1 = b.Id1 AND a.Id2 = b.Id2
WHERE
    a.DateTime = (
        SELECT
            MIN(DateTime)
        FROM
            Locations
        WHERE
            Id1 = a.Id1
            AND Id2 = a.Id2)
Answer 0 (score: 1)
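I would observe that your query doesn't really make sense as written: it selects columns only from a, so the join to b contributes nothing but duplicate rows. I assume it has been oversimplified, so I will include columns from both table references.
I would start by using window functions: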
SELECT l.Id1, l.Id2, l2.id1, l2.id2, l.LocationStart, l.LocationEnd
FROM (SELECT l.*,
             ROW_NUMBER() OVER (PARTITION BY id1, id2 ORDER BY datetime ASC) as seqnum
      FROM Locations l
     ) l JOIN
     Locations l2
     ON l.Id1 = l2.Id1 AND l.Id2 = l2.Id2 AND l.seqnum = 1;
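This assumes that you are looking for a unique row from the first table, i.e. that there are no duplicate datetimes within an (Id1, Id2) group. If ties exist, ROW_NUMBER() keeps an arbitrary one of them, whereas the original comparison against MIN(DateTime) keeps all of them.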
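As a small sanity check of the ROW_NUMBER() idea, here is a minimal sketch that runs the inner "earliest row per (Id1, Id2)" step against an in-memory SQLite database from Python. The sample rows are made up for illustration, SQLite stands in for whatever database the question actually uses, and window functions require SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical sample data: column names follow the question, values are invented.
rows = [
    (1, 1, "2020-01-02", "B", "C"),
    (1, 1, "2020-01-01", "A", "B"),
    (2, 5, "2020-03-01", "X", "Y"),
    (2, 5, "2020-03-05", "Y", "Z"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Locations (Id1 INT, Id2 INT, DateTime TEXT, "
    "LocationStart TEXT, LocationEnd TEXT)"
)
conn.executemany("INSERT INTO Locations VALUES (?, ?, ?, ?, ?)", rows)

# Number the rows of each (Id1, Id2) group by DateTime and keep only the first.
earliest = conn.execute(
    """
    SELECT Id1, Id2, LocationStart, LocationEnd
    FROM (SELECT src.*,
                 ROW_NUMBER() OVER (PARTITION BY Id1, Id2
                                    ORDER BY DateTime ASC) AS seqnum
          FROM Locations src) AS t
    WHERE seqnum = 1
    ORDER BY Id1, Id2
    """
).fetchall()

print(earliest)
```

Each group is scanned once and numbered, instead of re-running a correlated MIN(DateTime) subquery per row, which is the likely source of the speed-up on a large table.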
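Next, I would observe that you only want the first value of the fields from the first table reference (l). Guess what? You don't need the join at all.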
select first_value(l.id1) over (partition by id1, id2 order by datetime),
       first_value(l.id2) over (partition by id1, id2 order by datetime),
       l.id1,
       l.id2,
       first_value(l.locationstart) over (partition by id1, id2 order by datetime),
       first_value(l.locationend) over (partition by id1, id2 order by datetime)
from locations l;
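To illustrate that the join-free version returns the same rows as the ROW_NUMBER() join, here is a rough sketch that runs both queries on a tiny in-memory SQLite table and compares the results. The data is hypothetical, it assumes no duplicate datetimes within a group, and SQLite (3.25 or later) is only a stand-in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE locations (id1 INT, id2 INT, datetime TEXT, "
    "locationstart TEXT, locationend TEXT)"
)
# Hypothetical rows: two groups, two rows each, distinct datetimes per group.
conn.executemany(
    "INSERT INTO locations VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "2020-01-02", "B", "C"),
        (1, 1, "2020-01-01", "A", "B"),
        (2, 5, "2020-03-01", "X", "Y"),
        (2, 5, "2020-03-05", "Y", "Z"),
    ],
)

# Version 1: pick the first row per group, then join back to every row.
join_sql = """
SELECT l.id1, l.id2, l2.id1, l2.id2, l.locationstart, l.locationend
FROM (SELECT src.*,
             ROW_NUMBER() OVER (PARTITION BY id1, id2
                                ORDER BY datetime ASC) AS seqnum
      FROM locations src) l
JOIN locations l2
  ON l.id1 = l2.id1 AND l.id2 = l2.id2 AND l.seqnum = 1
"""

# Version 2: no join, first_value() carries the first row's fields to each row.
window_sql = """
SELECT first_value(l.id1) OVER (PARTITION BY id1, id2 ORDER BY datetime),
       first_value(l.id2) OVER (PARTITION BY id1, id2 ORDER BY datetime),
       l.id1,
       l.id2,
       first_value(l.locationstart) OVER (PARTITION BY id1, id2 ORDER BY datetime),
       first_value(l.locationend) OVER (PARTITION BY id1, id2 ORDER BY datetime)
FROM locations l
"""

join_rows = sorted(conn.execute(join_sql).fetchall())
window_rows = sorted(conn.execute(window_sql).fetchall())

print(join_rows == window_rows)
print(window_rows)
```

Note that both versions return one row per input row, not one per group. If only one row per (id1, id2) is wanted, filtering on seqnum = 1 without the join is probably the simpler choice.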