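I'm trying to find duplicate rows across multiple columns in a large table (close to 18,000 rows). The problem is that the queries take a very long time. I tried this: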
SELECT * FROM table_name a, table_name b
WHERE a.col1 = b.col1
AND a.col2 = b.col2
AND a.col3 = b.col3
AND a.col4 = b.col4
AND a.id <> b.id
and this:
SELECT *
FROM table_name
WHERE col1 IN (
SELECT col1
FROM table_name
GROUP BY col1
HAVING count(col1) > 1
)
AND col2 IN (
SELECT col2
FROM table_name
GROUP BY col2
HAVING count(col2) > 1
)
AND col3 IN (
SELECT col3
FROM table_name
GROUP BY col3
HAVING count(col3) > 1
)
AND col4 IN (
SELECT col4
FROM table_name
GROUP BY col4
HAVING count(col4) > 1
)
Both work, but they are too slow. Any ideas?
Answer 0 (score: 1)
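You could try a single GROUP BY over all four columns, for example:
-- Select only grouped columns and aggregates; SELECT * with GROUP BY is rejected by many databases
SELECT col1, col2, col3, col4, COUNT(*) AS duplicate_count
FROM table_name
GROUP BY col1, col2, col3, col4
HAVING COUNT(*) > 1
At the very least, it looks cleaner.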
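The grouped query yields one row per duplicated combination rather than every duplicated row. If the full rows are needed, one option is to join the grouped result back to the table. The following is only a sketch on a toy table, run through Python's built-in `sqlite3` module so it is self-contained; the sample data and the index name `idx_dup_cols` are assumptions for illustration, and whether the composite index helps on your database is something to confirm with its query planner.

```python
import sqlite3

# Toy in-memory table using the column names from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_name ("
    "id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, col3 TEXT, col4 TEXT)"
)
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?, ?, ?)",
    [
        (1, "a", "b", "c", "d"),
        (2, "a", "b", "c", "d"),  # duplicate of id 1
        (3, "a", "b", "c", "x"),  # differs in col4, so not a duplicate
        (4, "p", "q", "r", "s"),
        (5, "p", "q", "r", "s"),  # duplicate of id 4
        (6, "z", "z", "z", "z"),
    ],
)

# Hypothetical composite index covering the four compared columns.
conn.execute(
    "CREATE INDEX idx_dup_cols ON table_name (col1, col2, col3, col4)"
)

# Find the duplicated combinations once, then join back for the full rows.
DUPLICATES_SQL = """
SELECT t.*
FROM table_name AS t
JOIN (
    SELECT col1, col2, col3, col4
    FROM table_name
    GROUP BY col1, col2, col3, col4
    HAVING COUNT(*) > 1
) AS d
  ON  t.col1 = d.col1
  AND t.col2 = d.col2
  AND t.col3 = d.col3
  AND t.col4 = d.col4
ORDER BY t.id
"""

duplicate_ids = [row[0] for row in conn.execute(DUPLICATES_SQL)]
print(duplicate_ids)  # [1, 2, 4, 5]
```

Note that the join compares with `=`, so rows containing NULL in any of the four columns will not be matched even though GROUP BY places NULLs in the same group; that case would need separate handling.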
Edit:
To return all the rows, filtering each column to a subset of the previous column's matches:
SELECT *
FROM table_name
WHERE col4 IN (
    SELECT col4
    FROM table_name
    WHERE col3 IN (
        SELECT col3
        FROM table_name
        WHERE col2 IN (
            SELECT col2
            FROM table_name
            WHERE col1 IN (
                SELECT col1
                FROM table_name
                GROUP BY col1
                HAVING count(col1) > 1
            )
        )
    )
)
Conceptually, this should give you all the results with a faster execution time.
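Before relying on the nested IN version, it may be worth checking that it returns the rows you expect. It filters one column at a time, so it can match rows that share a value in one column without being duplicates across all four. The sketch below uses Python's built-in `sqlite3` module and made-up sample rows (an assumption for illustration) to compare the nested query, with its parentheses balanced, against a GROUP BY over all four columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_name ("
    "id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, col3 TEXT, col4 TEXT)"
)
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?, ?, ?)",
    [
        (1, "a", "b", "c", "d"),
        (2, "a", "x", "y", "z"),  # shares only col1 with id 1
        (3, "m", "n", "o", "p"),
    ],
)

# The nested IN query from the answer, ordered by id for a stable result.
NESTED_IN_SQL = """
SELECT *
FROM table_name
WHERE col4 IN (
    SELECT col4 FROM table_name
    WHERE col3 IN (
        SELECT col3 FROM table_name
        WHERE col2 IN (
            SELECT col2 FROM table_name
            WHERE col1 IN (
                SELECT col1 FROM table_name
                GROUP BY col1
                HAVING count(col1) > 1
            )
        )
    )
)
ORDER BY id
"""

# Grouping on the combined key: only true four-column duplicates qualify.
COMBINED_KEY_SQL = """
SELECT col1, col2, col3, col4, COUNT(*)
FROM table_name
GROUP BY col1, col2, col3, col4
HAVING COUNT(*) > 1
"""

nested_ids = [row[0] for row in conn.execute(NESTED_IN_SQL)]
combined_groups = conn.execute(COMBINED_KEY_SQL).fetchall()

print(nested_ids)       # [1, 2]
print(combined_groups)  # []
```

On this data the nested query returns ids 1 and 2 even though no two rows agree on all four columns, while the combined-key grouping reports no duplicates.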