Can anyone guess why this statement takes so long when processing 300,000 rows? Basically, the query is used to find duplicates.
SELECT DISTINCT
a.Id,
b.Id as sid
FROM
csv_temp a
INNER JOIN
csv_temp b ON a.firstname = b.firstname AND
a.lastname = b.lastname AND
((a.address = b.address) OR
(a.zip = b.zip) OR
(a.city = b.city AND a.state = b.state) )
WHERE
a.Id <> b.Id AND
a.status=2 AND
b.status=1 AND
a.flag !=1 AND
b.flag !=1
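Answer 0: (score: 3)
OR often performs poorly, and inside a JOIN condition I would expect it to be even worse. Try using 3 SELECTs (one for each ORed condition) and UNION the results. If you do this, I suspect the DISTINCT is not needed, since UNION (unlike UNION ALL) already removes duplicate rows: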
SELECT
a.Id,
b.Id as sid
FROM
csv_temp a
INNER JOIN
csv_temp b ON a.firstname = b.firstname AND
a.lastname = b.lastname AND
a.address = b.address
WHERE
a.Id <> b.Id AND
a.status=2 AND
b.status=1 AND
a.flag !=1 AND
b.flag !=1
UNION
SELECT
a.Id,
b.Id as sid
FROM
csv_temp a
INNER JOIN
csv_temp b ON a.firstname = b.firstname AND
a.lastname = b.lastname AND
a.zip = b.zip
WHERE
a.Id <> b.Id AND
a.status=2 AND
b.status=1 AND
a.flag !=1 AND
b.flag !=1
UNION
SELECT
a.Id,
b.Id as sid
FROM
csv_temp a
INNER JOIN
csv_temp b ON a.firstname = b.firstname AND
a.lastname = b.lastname AND
a.city = b.city AND a.state = b.state
WHERE
a.Id <> b.Id AND
a.status=2 AND
b.status=1 AND
a.flag !=1 AND
b.flag !=1
Answer 1: (score: 0)
Now add indexes on the columns used in the comparisons, then check the query with the explain plan.
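As a rough sketch of that suggestion, the statements below add one composite index per address-matching condition and then ask the database for the plan. The syntax assumes a MySQL-style database, and the index names are hypothetical. If `address` is a very long VARCHAR or a TEXT column, the index may need a prefix length, which is not shown here.

```sql
-- Assumption: MySQL-style syntax; the index names are made up for illustration.
-- Each index leads with the name columns, which every comparison uses,
-- followed by the column(s) of one of the three matching conditions.
CREATE INDEX idx_name_address ON csv_temp (firstname, lastname, address);
CREATE INDEX idx_name_zip ON csv_temp (firstname, lastname, zip);
CREATE INDEX idx_name_city_state ON csv_temp (firstname, lastname, city, state);

-- Prefix a query with EXPLAIN to see whether the optimizer picks up an index.
-- Shown here for the zip comparison only.
EXPLAIN
SELECT
    a.Id,
    b.Id AS sid
FROM
    csv_temp a
INNER JOIN
    csv_temp b ON a.firstname = b.firstname AND
    a.lastname = b.lastname AND
    a.zip = b.zip
WHERE
    a.Id <> b.Id AND
    a.status = 2 AND
    b.status = 1 AND
    a.flag != 1 AND
    b.flag != 1;
```

Whether these particular indexes help depends on the data distribution and on the database version, so it is worth comparing the EXPLAIN output before and after creating them.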