I have two SQL queries that compute the co-occurrence between id2 values across different id1 values. A sample table looks like this:
id1 | id2
101 | 1
101 | 2
101 | 3
102 | 2
102 | 3
102 | 4
103 | 15
103 | 3
103 | 4
and the desired output is:
A B Count
1 2 1
1 3 2
2 3 4
1 4 2
2 4 3
3 4 4
1 15 1
2 15 2
3 15 2
4 15 1
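To make the counting rule explicit: a pair (A, B) is counted once for every pair of rows whose id1 values differ and where A < B. Here is a minimal Python sketch of that rule on the sample data; the function name `cooccurrence` is only illustrative and is not part of either solution.

```python
from collections import Counter
from itertools import product

# Sample rows (id1, id2) copied from the example table.
rows = [(101, 1), (101, 2), (101, 3),
        (102, 2), (102, 3), (102, 4),
        (103, 15), (103, 3), (103, 4)]

def cooccurrence(rows):
    """Count row pairs whose id1 values differ, keyed by (smaller id2, larger id2)."""
    counts = Counter()
    for (a_id1, a_id2), (b_id1, b_id2) in product(rows, repeat=2):
        if a_id1 != b_id1 and a_id2 < b_id2:
            counts[(a_id2, b_id2)] += 1
    return dict(counts)

counts = cooccurrence(rows)
# Print in the same order as the desired output: by B, then by A.
for (a, b), n in sorted(counts.items(), key=lambda kv: (kv[0][1], kv[0][0])):
    print(a, b, n)
```

This brute-force loop compares every row with every other row, which is the same quadratic work the SQL queries ask the database to do.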
Both solutions are pasted below.
-- Solution 1
SELECT bar.id2 AS A, foo.id2 AS B, COUNT(*) AS Count
FROM
(SELECT * FROM TestTab) AS bar,
(SELECT * FROM TestTab) AS foo
WHERE bar.id1 <> foo.id1
AND bar.id2 < foo.id2
GROUP BY bar.id2, foo.id2
-- Solution 2
SELECT bar.id2 AS A, foo.id2 AS B, COUNT(*) AS Count
FROM TestTab AS bar
JOIN TestTab AS foo
ON bar.id1 <> foo.id1
WHERE bar.id2 < foo.id2
GROUP BY bar.id2, foo.id2
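Both queries work fine on small tables (i.e., 100-1000 rows), but I need to query much larger tables (e.g., 100,000 rows). I would like to know how to speed up the queries and improve performance. Thanks in advance for any pointers.
-- Create table TestTab and insert dummy data
CREATE TABLE TestTab (id1 INT, id2 INT); -- column types assumed; the original definition was omitted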
INSERT INTO TestTab VALUES
(101,1),
(101,2),
(101,3),
(102,2),
(102,3),
(102,4),
(103,15),
(103,3),
(103,4)
Answer 0 (score: 3)
I suggest adding an index on id2 to TestTab (if there isn't one already) and then trying the following code:
select distinct id2 into #id2 from TestTab;
SELECT bar.id2 AS A, foo.id2 AS B, COUNT(*) AS Count
FROM #id2 AS bar
JOIN #id2 AS foo ON bar.id2 < foo.id2
JOIN TestTab AS buz ON bar.id2 = buz.id2
JOIN TestTab AS fuz ON foo.id2 = fuz.id2
WHERE buz.id1 <> fuz.id1
GROUP BY bar.id2, foo.id2;
(If you already have a table containing the distinct id2 values, skip creating the temporary table and use that instead.)
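One way to sanity-check this rewrite before trying it on a large table is to run it next to the original query on the sample data. The sketch below does that with Python's built-in sqlite3 module, which stands in for SQL Server here only so the snippet is self-contained. The index name `ix_TestTab_id2` and the temp table name `distinct_id2` are made up, and `CREATE TEMP TABLE ... AS` replaces the T-SQL `SELECT ... INTO #id2`. It checks that the results match; it says nothing about speed on SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TestTab (id1 INTEGER, id2 INTEGER);
    INSERT INTO TestTab VALUES
        (101,1),(101,2),(101,3),
        (102,2),(102,3),(102,4),
        (103,15),(103,3),(103,4);
    -- Index on id2, as suggested above (hypothetical index name).
    CREATE INDEX ix_TestTab_id2 ON TestTab (id2);
    -- SQLite stand-in for: select distinct id2 into #id2 from TestTab;
    CREATE TEMP TABLE distinct_id2 AS SELECT DISTINCT id2 FROM TestTab;
""")

# The rewritten query from this answer, driven by the distinct id2 values.
rewritten = conn.execute("""
    SELECT bar.id2 AS A, foo.id2 AS B, COUNT(*) AS Count
    FROM distinct_id2 AS bar
    JOIN distinct_id2 AS foo ON bar.id2 < foo.id2
    JOIN TestTab AS buz ON bar.id2 = buz.id2
    JOIN TestTab AS fuz ON foo.id2 = fuz.id2
    WHERE buz.id1 <> fuz.id1
    GROUP BY bar.id2, foo.id2
    ORDER BY B, A
""").fetchall()

# Solution 2 from the question, for comparison.
original = conn.execute("""
    SELECT bar.id2 AS A, foo.id2 AS B, COUNT(*) AS Count
    FROM TestTab AS bar
    JOIN TestTab AS foo ON bar.id1 <> foo.id1
    WHERE bar.id2 < foo.id2
    GROUP BY bar.id2, foo.id2
    ORDER BY B, A
""").fetchall()

print(rewritten == original)
for row in rewritten:
    print(*row)
```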
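Answer 1 (score: 1)
Both queries are joins, and they are equivalent.
The first is an implicit join with additional subselects. If SQL Server does not optimize the subselects away, it may be slower.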
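The equivalence claim is easy to check on the sample data. The sketch below runs both solutions through Python's built-in sqlite3 module and compares the result sets. SQLite is used here only as a convenient stand-in, so this confirms that the two forms return the same rows. Whether SQL Server plans them identically would have to be checked in its own execution plans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TestTab (id1 INTEGER, id2 INTEGER);
    INSERT INTO TestTab VALUES
        (101,1),(101,2),(101,3),
        (102,2),(102,3),(102,4),
        (103,15),(103,3),(103,4);
""")

# Solution 1: implicit (comma) join over two subselects.
solution1 = conn.execute("""
    SELECT bar.id2 AS A, foo.id2 AS B, COUNT(*) AS Count
    FROM (SELECT * FROM TestTab) AS bar,
         (SELECT * FROM TestTab) AS foo
    WHERE bar.id1 <> foo.id1
      AND bar.id2 < foo.id2
    GROUP BY bar.id2, foo.id2
    ORDER BY A, B
""").fetchall()

# Solution 2: explicit JOIN directly on the base table.
solution2 = conn.execute("""
    SELECT bar.id2 AS A, foo.id2 AS B, COUNT(*) AS Count
    FROM TestTab AS bar
    JOIN TestTab AS foo ON bar.id1 <> foo.id1
    WHERE bar.id2 < foo.id2
    GROUP BY bar.id2, foo.id2
    ORDER BY A, B
""").fetchall()

print(solution1 == solution2)
```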
As others have already observed, if you have not yet added indexes on the join-condition column id1 and the WHERE-clause column id2, then this is