I have two tables that look like this:
___A___
id | a
1 | 5
2 | 4
3 | 3

_____B____
id | s | e
1 | 4 | 6
2 | 2 | 7
3 | 3 | 4
4 | 1 | 5
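Tables A and B have about 1,500,000 and 200,000 rows respectively. I want to join each row of A to the smallest interval (B.s, B.e) that contains A.a.
This is my query, but it is very slow: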
select A.a,
B.s,
B.e
from A
join B
on A.a > B.s
and A.a < B.e
and (B.e - B.s) = (
select min(B.e - B.s)
from B
where A.a > B.s
and A.a < B.e
)
The subquery ensures that only the smallest containing interval is used. Is there a way to make this run faster?
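To make the intended result concrete, here is a minimal sketch that runs the query above on the sample rows. It assumes an in-memory SQLite database rather than PostgreSQL, so it only checks the logic and says nothing about speed. The alias `B2` and the function name `smallest_intervals` are my additions for readability, not part of the original query.

```python
import sqlite3

# Sample rows copied from the two tables shown in the question.
A_ROWS = [(1, 5), (2, 4), (3, 3)]
B_ROWS = [(1, 4, 6), (2, 2, 7), (3, 3, 4), (4, 1, 5)]

# Same logic as the question's query; the inner table is aliased as B2
# (an assumption for clarity) so the correlation with the outer row is explicit.
QUERY = """
select A.a, B.s, B.e
from A
join B
  on A.a > B.s
 and A.a < B.e
 and (B.e - B.s) = (
       select min(B2.e - B2.s)
       from B as B2
       where A.a > B2.s
         and A.a < B2.e
     )
order by A.id
"""


def smallest_intervals():
    """Load the sample rows into SQLite and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table A (id integer primary key, a integer)")
    conn.execute("create table B (id integer primary key, s integer, e integer)")
    conn.executemany("insert into A values (?, ?)", A_ROWS)
    conn.executemany("insert into B values (?, ?, ?)", B_ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


print(smallest_intervals())
# → [(5, 4, 6), (4, 1, 5), (3, 1, 5)]
```

Because the comparisons are strict, a value sitting exactly on an interval boundary (for example 4 against the interval 3..4) does not count as contained.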
Thanks
Answer 0: (score: 0)
I'm not a PostgreSQL expert, but you could try using a CTE:
WITH min_interval AS (
    -- smallest containing interval width for each value of A.a
    SELECT A.a
         , MIN(B.e - B.s) AS min_width
    FROM A
    INNER JOIN B ON A.a > B.s AND A.a < B.e
    GROUP BY A.a)
SELECT A.a
     , B.s
     , B.e
FROM A
JOIN B ON A.a > B.s AND A.a < B.e
JOIN min_interval AS m ON m.a = A.a
                      AND (B.e - B.s) = m.min_width;
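One detail matters with the CTE approach: the minimum has to be taken per value of A.a, not once over the whole join. The sketch below contrasts the two on the question's sample rows. It assumes an in-memory SQLite database (not PostgreSQL), and the CTE names `m` and `min_width` and the helper `run` are hypothetical names chosen for this illustration.

```python
import sqlite3

A_ROWS = [(1, 5), (2, 4), (3, 3)]
B_ROWS = [(1, 4, 6), (2, 2, 7), (3, 3, 4), (4, 1, 5)]

# A single minimum computed over every match in the whole join.
GLOBAL_MIN = """
with m as (
    select min(B.e - B.s) as w
    from A join B on A.a > B.s and A.a < B.e
)
select A.a, B.s, B.e
from A
join B on A.a > B.s and A.a < B.e
where (B.e - B.s) = (select w from m)
order by A.id
"""

# One minimum per value of A.a, joined back to the matching intervals.
PER_VALUE_MIN = """
with min_width as (
    select A.a as a, min(B.e - B.s) as w
    from A join B on A.a > B.s and A.a < B.e
    group by A.a
)
select A.a, B.s, B.e
from A
join B on A.a > B.s and A.a < B.e
join min_width as m on m.a = A.a and (B.e - B.s) = m.w
order by A.id
"""


def run(query):
    """Run a query against a fresh in-memory copy of the sample tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table A (id integer primary key, a integer)")
    conn.execute("create table B (id integer primary key, s integer, e integer)")
    conn.executemany("insert into A values (?, ?)", A_ROWS)
    conn.executemany("insert into B values (?, ?, ?)", B_ROWS)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


print(run(GLOBAL_MIN))     # → [(5, 4, 6)]
print(run(PER_VALUE_MIN))  # → [(5, 4, 6), (4, 1, 5), (3, 1, 5)]
```

With a single global minimum, only the rows whose best interval happens to be the narrowest one overall survive, which is why the grouped form is the one that matches the question.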
Answer 1: (score: 0)
A NOT EXISTS() version can sometimes avoid the aggregating subquery:
SELECT a.a,
b.s,
b.e
FROM AAAA a
JOIN BBBB b
ON a.a > b.s
AND a.a < b.e
AND NOT EXISTS ( SELECT *
FROM BBBB nx
WHERE a.a > nx.s
AND a.a < nx.e
AND (nx.e - nx.s) < (b.e - b.s)
);
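As a quick check of what the NOT EXISTS form returns, here is a sketch that runs it in an in-memory SQLite database (an assumption; this is not PostgreSQL and shows nothing about performance). The helper `run_not_exists`, the `ORDER BY`, and the tie data are mine, added only for illustration.

```python
import sqlite3

# The NOT EXISTS query from this answer, plus an ORDER BY (my addition)
# so the output order is deterministic.
NOT_EXISTS_QUERY = """
SELECT a.a, b.s, b.e
FROM AAAA a
JOIN BBBB b
  ON a.a > b.s
 AND a.a < b.e
 AND NOT EXISTS ( SELECT *
                  FROM BBBB nx
                  WHERE a.a > nx.s
                    AND a.a < nx.e
                    AND (nx.e - nx.s) < (b.e - b.s) )
ORDER BY a.id, b.id
"""


def run_not_exists(a_rows, b_rows):
    """Load the given rows into fresh tables and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE AAAA (id INTEGER PRIMARY KEY, a INTEGER)")
    conn.execute("CREATE TABLE BBBB (id INTEGER PRIMARY KEY, s INTEGER, e INTEGER)")
    conn.executemany("INSERT INTO AAAA VALUES (?, ?)", a_rows)
    conn.executemany("INSERT INTO BBBB VALUES (?, ?, ?)", b_rows)
    rows = conn.execute(NOT_EXISTS_QUERY).fetchall()
    conn.close()
    return rows


# Sample rows from the question.
sample = run_not_exists([(1, 5), (2, 4), (3, 3)],
                        [(1, 4, 6), (2, 2, 7), (3, 3, 4), (4, 1, 5)])
print(sample)  # → [(5, 4, 6), (4, 1, 5), (3, 1, 5)]

# Hypothetical tie: two containing intervals share the smallest width (3).
tied = run_not_exists([(1, 5)], [(1, 3, 6), (2, 4, 7), (3, 0, 9)])
print(tied)    # → [(5, 3, 6), (5, 4, 7)]
```

Like the original query, this form keeps every interval that ties for the smallest width, so a value can appear more than once in the output.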
Answer 2: (score: 0)
Using the RANK() window function makes this relatively simple:
SELECT ranked.id, ranked.val, ranked.start, ranked.end
FROM
(
SELECT
a.id,
a.val,
b.start,
b.end,
RANK() OVER (PARTITION BY a.id ORDER BY (b.end - b.start) ASC, b.id ASC) AS match_rank
FROM a
JOIN b
ON a.val BETWEEN b.start AND b.end
) ranked
WHERE ranked.match_rank = 1
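You first find all matches; then, for each a.id, you rank its matches by the width of the b range. The narrower the range, the better (b.id is used as a tie-breaker to prevent duplicates). Finally, we keep only the best match for each a.id.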
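The ranking steps described above can be sketched on the question's data as follows. This assumes an in-memory SQLite database (window functions need SQLite 3.25 or newer) instead of PostgreSQL. It also uses the question's column names `a`, `s`, `e` and strict comparisons rather than `BETWEEN`, which is inclusive; the helper `run_ranked` and the tie data are hypothetical.

```python
import sqlite3

# RANK() per a.id, ordered by interval width and then b.id as tie-breaker.
RANK_QUERY = """
SELECT ranked.val, ranked.s, ranked.e
FROM (
    SELECT a.id AS a_id,
           a.a  AS val,
           b.s  AS s,
           b.e  AS e,
           RANK() OVER (PARTITION BY a.id
                        ORDER BY (b.e - b.s) ASC, b.id ASC) AS match_rank
    FROM A AS a
    JOIN B AS b
      ON a.a > b.s AND a.a < b.e
) ranked
WHERE ranked.match_rank = 1
ORDER BY ranked.a_id
"""


def run_ranked(a_rows, b_rows):
    """Load the given rows into fresh tables and keep the rank-1 matches."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE A (id INTEGER PRIMARY KEY, a INTEGER)")
    conn.execute("CREATE TABLE B (id INTEGER PRIMARY KEY, s INTEGER, e INTEGER)")
    conn.executemany("INSERT INTO A VALUES (?, ?)", a_rows)
    conn.executemany("INSERT INTO B VALUES (?, ?, ?)", b_rows)
    rows = conn.execute(RANK_QUERY).fetchall()
    conn.close()
    return rows


# Sample rows from the question.
best = run_ranked([(1, 5), (2, 4), (3, 3)],
                  [(1, 4, 6), (2, 2, 7), (3, 3, 4), (4, 1, 5)])
print(best)      # → [(5, 4, 6), (4, 1, 5), (3, 1, 5)]

# Hypothetical tie: widths are equal, so the lower b.id wins.
tie_best = run_ranked([(1, 5)], [(1, 3, 6), (2, 4, 7), (3, 0, 9)])
print(tie_best)  # → [(5, 3, 6)]
```

Because the tie-breaker makes the ordering unique within each partition, exactly one row per a.id gets rank 1, which is the main behavioral difference from the versions that compare against a minimum width.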
Answer 3: (score: 0)
Try the GROUP BY version:
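select A.a
     , B.s
     , B.e
from A
join B on A.a > B.s and A.a < B.e
join (
      -- smallest containing width per A.a; grouping by B.s and B.e as well
      -- would leave one interval per group and turn the filter into a no-op
      select A.a
           , min(B.e - B.s) as min_width
      from A
      join B on A.a > B.s and A.a < B.e
      group by A.a
     ) m on m.a = A.a
        and (B.e - B.s) = m.min_width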