I wanted to check for duplicates in my database, so to see which values were duplicated I did this:
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
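To sanity-check what this query returns, you can run it against a small throwaway table. The sketch below is only a stand-in. It uses SQLite through Python's built-in sqlite3 module rather than MySQL, and the rows and the index name idx_relevant_field are made up. It shows which values come back, not how fast the query is.

```python
import sqlite3

# Hypothetical sample data in an in-memory SQLite database (not MySQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (id INTEGER PRIMARY KEY, relevant_field TEXT)")
conn.execute("CREATE INDEX idx_relevant_field ON some_table (relevant_field)")
conn.executemany(
    "INSERT INTO some_table (relevant_field) VALUES (?)",
    [("a",), ("b",), ("a",), ("c",), ("b",), ("a",)],
)

# The duplicate-finding query from the question; ORDER BY only makes the
# output order deterministic for this example.
duplicates = [
    value
    for (value,) in conn.execute(
        """
        SELECT relevant_field
        FROM some_table
        GROUP BY relevant_field
        HAVING COUNT(*) > 1
        ORDER BY relevant_field
        """
    )
]
print(duplicates)  # ['a', 'b']
```

The query returns one row per duplicated value, not one row per duplicated record.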
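This way I get every relevant_field value that occurs more than once. This query takes a few milliseconds to run.
Now I wanted to inspect each of the duplicates, so I thought I could SELECT every row in some_table whose relevant_field is returned by the query above, like this: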
SELECT *
FROM some_table
WHERE relevant_field IN
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
)
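For some reason this turns out to be extremely slow (it takes minutes). What exactly is making it so slow? relevant_field is indexed.
Finally, I tried creating a view "temp_view" from the first query (SELECT relevant_field FROM some_table GROUP BY relevant_field HAVING COUNT(*) > 1), and then running my second query like this: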
SELECT *
FROM some_table
WHERE relevant_field IN
(
SELECT relevant_field
FROM temp_view
)
This works fine. MySQL does it in a few milliseconds.
Can any SQL experts here explain what is going on?
Answer 0 (score: 109)
Rewrite the query as:
SELECT st1.*, st2.relevant_field FROM sometable st1
INNER JOIN sometable st2 ON (st1.relevant_field = st2.relevant_field)
GROUP BY st1.id /* list a unique sometable field here*/
HAVING COUNT(*) > 1
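I think st2.relevant_field must be in the SELECT list, otherwise the HAVING clause will give an error, but I'm not 100% sure.
Never use IN with a subquery; it is notoriously slow.
Only ever use IN with a fixed list of values.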
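Here is one way to follow the "fixed list of values" advice from application code. It is a minimal sketch that assumes a two-step approach: run the fast GROUP BY query first, then send its results back as bound parameters. SQLite (via Python's sqlite3) stands in for MySQL, and the data is hypothetical.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (id INTEGER PRIMARY KEY, relevant_field TEXT)")
conn.executemany(
    "INSERT INTO some_table (relevant_field) VALUES (?)",
    [("a",), ("b",), ("a",), ("c",), ("b",), ("a",)],
)

# Step 1: run the fast GROUP BY query on its own.
dup_values = [
    value
    for (value,) in conn.execute(
        "SELECT relevant_field FROM some_table "
        "GROUP BY relevant_field HAVING COUNT(*) > 1"
    )
]

# Step 2: pass the values back as a fixed list of bound parameters.
# MySQL rejects an empty IN (), so skip the query when there is nothing to look up.
if dup_values:
    placeholders = ", ".join("?" for _ in dup_values)
    duplicate_rows = conn.execute(
        "SELECT id, relevant_field FROM some_table "
        f"WHERE relevant_field IN ({placeholders}) ORDER BY id",
        dup_values,
    ).fetchall()
else:
    duplicate_rows = []

print(duplicate_rows)  # [(1, 'a'), (2, 'b'), (3, 'a'), (5, 'b'), (6, 'a')]
```

Only the "?" placeholders are formatted into the SQL string. The values themselves are bound by the driver, so they are never pasted into the query text.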
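More tips:
1. Don't use SELECT *; select only the fields you really need.
2. Make sure there is an index on relevant_field to speed up the equi-join.
3. Make sure to group by a unique column (such as the primary key).
General solution for 90% of your IN (SELECT ...) queries:
Use this code: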
SELECT * FROM sometable a WHERE EXISTS (
SELECT 1 FROM sometable b
WHERE a.relevant_field = b.relevant_field
GROUP BY b.relevant_field
HAVING count(*) > 1)
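The following sketch checks that the two rewrites above return the same rows as the question's IN query. It uses SQLite via Python's sqlite3 as a stand-in for MySQL, with made-up data, and selects only the id column for brevity. It says nothing about relative speed, which is the part that depends on MySQL's optimizer.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sometable (id INTEGER PRIMARY KEY, relevant_field TEXT)")
conn.executemany(
    "INSERT INTO sometable (relevant_field) VALUES (?)",
    [("a",), ("b",), ("a",), ("c",), ("b",), ("a",)],
)

QUERIES = {
    # Original form from the question.
    "in_subquery": """
        SELECT id FROM sometable
        WHERE relevant_field IN (
            SELECT relevant_field FROM sometable
            GROUP BY relevant_field HAVING COUNT(*) > 1)
        ORDER BY id""",
    # Self-join grouped by the unique id column: a row joins to itself plus
    # every other row with the same value, so COUNT(*) > 1 means "has a duplicate".
    "self_join": """
        SELECT st1.id FROM sometable st1
        INNER JOIN sometable st2 ON st1.relevant_field = st2.relevant_field
        GROUP BY st1.id
        HAVING COUNT(*) > 1
        ORDER BY st1.id""",
    # Correlated EXISTS: the subquery only looks at rows sharing a's value.
    "exists": """
        SELECT a.id FROM sometable a
        WHERE EXISTS (
            SELECT 1 FROM sometable b
            WHERE a.relevant_field = b.relevant_field
            GROUP BY b.relevant_field
            HAVING COUNT(*) > 1)
        ORDER BY a.id""",
}

results = {
    name: [row_id for (row_id,) in conn.execute(sql)]
    for name, sql in QUERIES.items()
}
print(results["exists"])  # [1, 2, 3, 5, 6]
```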
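Answer 1 (score: 95)
The subquery is being run for every row because MySQL executes it as a correlated (dependent) subquery. MySQL 5.5 and earlier rewrite IN (SELECT ...) into a correlated EXISTS check even when, as here, the subquery does not reference the outer query. MySQL 5.6 added semi-join and materialization strategies for this case. You can turn the correlated query into a non-correlated one by selecting everything from the subquery, like this: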
SELECT * FROM
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
) AS subquery
The final query would look like this:
SELECT *
FROM some_table
WHERE relevant_field IN
(
SELECT * FROM
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
) AS subquery
)
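To see why running the subquery once per row hurts so much, here is a toy cost model in plain Python. No database is involved, and this is an illustrative assumption rather than MySQL's actual executor. The model counts row reads for two strategies. The dependent strategy re-runs the GROUP BY for every outer row. The materialized strategy runs it once and then probes the saved result.

```python
from collections import Counter


def run_duplicate_subquery(rows):
    """Model GROUP BY relevant_field HAVING COUNT(*) > 1 as one full scan."""
    counts = Counter(rows)
    return {value for value, n in counts.items() if n > 1}, len(rows)


def dependent_strategy(rows):
    """Re-run the subquery for every outer row, as a dependent subquery would."""
    rows_read = len(rows)  # the outer scan
    result = []
    for value in rows:
        duplicates, cost = run_duplicate_subquery(rows)
        rows_read += cost
        if value in duplicates:
            result.append(value)
    return result, rows_read


def materialized_strategy(rows):
    """Run the subquery once, keep the result as a set, then probe it per row."""
    duplicates, rows_read = run_duplicate_subquery(rows)
    rows_read += len(rows)  # the outer scan
    result = [value for value in rows if value in duplicates]
    return result, rows_read


rows = ["a", "b", "a", "c", "b", "a"]
print(dependent_strategy(rows))     # (['a', 'b', 'a', 'b', 'a'], 42)
print(materialized_strategy(rows))  # (['a', 'b', 'a', 'b', 'a'], 12)
```

Both strategies return the same rows. With n rows, the dependent strategy reads about n + n² rows, while the materialized one reads about 2n. That gap is consistent with the minutes-versus-milliseconds difference described in the question.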
Answer 2 (score: 5)
Answer 3 (score: 4)
SELECT st1.*
FROM some_table st1
inner join
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
)st2 on st2.relevant_field = st1.relevant_field;
I tried your query on one of my databases and also tried rewriting it as a join to a subquery.
That ran a lot faster, give it a try!
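A side benefit of joining to a derived table is that the derived table can carry extra columns. The minimal sketch below assumes you also want the number of occurrences of each value, so it adds a dup_count column that is not in the answer. It runs on SQLite via Python's sqlite3 as a stand-in for MySQL, with hypothetical data.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (id INTEGER PRIMARY KEY, relevant_field TEXT)")
conn.executemany(
    "INSERT INTO some_table (relevant_field) VALUES (?)",
    [("a",), ("b",), ("a",), ("c",), ("b",), ("a",)],
)

# The derived table st2 does not reference st1, so it is not correlated;
# dup_count (added for this sketch) travels along with each matching row.
rows = conn.execute(
    """
    SELECT st1.id, st1.relevant_field, st2.dup_count
    FROM some_table st1
    INNER JOIN (
        SELECT relevant_field, COUNT(*) AS dup_count
        FROM some_table
        GROUP BY relevant_field
        HAVING COUNT(*) > 1
    ) st2 ON st2.relevant_field = st1.relevant_field
    ORDER BY st1.id
    """
).fetchall()
print(rows)  # [(1, 'a', 3), (2, 'b', 2), (3, 'a', 3), (5, 'b', 2), (6, 'a', 3)]
```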
Answer 4 (score: 3)
I reformatted your slow SQL query using www.prettysql.net:
SELECT *
FROM some_table
WHERE
relevant_field in
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
);
When a table is used in both the query and the subquery, you should always alias both, like this:
SELECT *
FROM some_table as t1
WHERE
t1.relevant_field in
(
SELECT t2.relevant_field
FROM some_table as t2
GROUP BY t2.relevant_field
HAVING COUNT(t2.relevant_field) > 1
);
Does this help?
Answer 5 (score: 3)
Try this:
SELECT t1.*
FROM
some_table t1,
(SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1) t2
WHERE
t1.relevant_field = t2.relevant_field;
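Answer 6 (score: 1)
Sometimes, as the data grows, WHERE IN can become very slow because of the way MySQL optimizes the query. Try STRAIGHT_JOIN, which tells MySQL to join the tables in the order they are listed instead of letting the optimizer choose, for example: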
SELECT STRAIGHT_JOIN table.field FROM table WHERE table.id IN (...)
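But be careful: in most cases the MySQL optimizer does a good job, so I'd recommend using it only when you actually run into this kind of problem.
Answer 7 (score: 1)
First, find the duplicate rows, count how many times each NID is used, and number the rows within each NID group: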
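SELECT q.id, q.name, q.password, q.NID,
(SELECT COUNT(*) FROM UserInfo k WHERE k.NID = q.NID) AS Count,
(
CASE q.NID
WHEN @curCode THEN
-- same NID as the previous row: next number in the group
@curRow := @curRow + 1
ELSE
-- first row of a new NID group: remember the NID and restart the counter at 0
@curRow := IF((@curCode := q.NID) IS NULL, 0, 0)
END
) AS No
FROM UserInfo q,
(
SELECT
@curRow := 0,
@curCode := ''
) rt
WHERE q.NID IN
(
SELECT NID
FROM UserInfo
GROUP BY NID
HAVING COUNT(*) > 1
)
-- NOTE: the numbering only works if rows with the same NID arrive consecutively,
-- and MySQL does not guarantee the evaluation order of user variables.
-- On MySQL 8.0+, ROW_NUMBER() OVER (PARTITION BY q.NID ORDER BY q.id) - 1 is a more reliable way to compute No.
ORDER BY q.NID, q.id;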

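Then create a table and insert the results into it:
CREATE TABLE CopyTable
SELECT q.id, q.name, q.password, q.NID,
(SELECT COUNT(*) FROM UserInfo k WHERE k.NID = q.NID) AS Count,
(
CASE q.NID
WHEN @curCode THEN
-- same NID as the previous row: next number in the group
@curRow := @curRow + 1
ELSE
-- first row of a new NID group: remember the NID and restart the counter at 0
@curRow := IF((@curCode := q.NID) IS NULL, 0, 0)
END
) AS No
FROM UserInfo q,
(
SELECT
@curRow := 0,
@curCode := ''
) rt
WHERE q.NID IN
(
SELECT NID
FROM UserInfo
GROUP BY NID
HAVING COUNT(*) > 1
)
ORDER BY q.NID, q.id;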

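Finally, delete the duplicate rows. Within each NID group, No starts at 0, so deleting every row whose No is not 0 removes all duplicates except the first row of each group.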
delete from CopyTable where No!= 0;
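The DELETE keeps exactly one row per NID only if rows with the same NID were numbered consecutively. The plain-Python sketch below emulates the CASE expression row by row instead of running MySQL, using hypothetical (id, NID) pairs. It shows what the numbering produces with and without sorting by NID.

```python
def number_within_groups(rows):
    """Emulate the CASE/@curRow/@curCode logic: No restarts at 0 whenever NID changes."""
    cur_code = None
    cur_row = 0
    numbered = []
    for row_id, nid in rows:
        if nid == cur_code:
            cur_row += 1      # same NID as the previous row
        else:
            cur_code = nid    # first row of a new NID group
            cur_row = 0
        numbered.append((row_id, nid, cur_row))
    return numbered


# Hypothetical (id, NID) pairs where NIDs X and Y are both duplicated.
rows = [(1, "X"), (2, "Y"), (3, "X"), (4, "Y"), (5, "X")]

# Sorted by NID (then id), duplicates are adjacent, so only the first row per NID gets No = 0.
ordered = number_within_groups(sorted(rows, key=lambda r: (r[1], r[0])))
kept = [r for r in ordered if r[2] == 0]
print(kept)  # [(1, 'X', 0), (2, 'Y', 0)]

# Unsorted, the NID changes on every row, so every row gets No = 0 and nothing would be deleted.
unordered = number_within_groups(rows)
print([r[2] for r in unordered])  # [0, 0, 0, 0, 0]
```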

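Answer 8 (score: 0)
This is similar to my case. I have a table named tabel_buku_besar. What I need is:
1. Find the records in tabel_buku_besar that have account_code='101.100', companyarea='20000' and IDR as currency.
2. Get all records from tabel_buku_besar with the same companyarea and currency as in step 1 (but a different account_code) whose transaction_number appears in the step 1 result.
When I used select ... from ... where ... transaction_number in (select transaction_number from ....), my query ran extremely slowly, and sometimes it caused request timeouts or made my application unresponsive...
I tried this combination instead, and the result... is not bad...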
select DATE_FORMAT(L.TANGGAL_INPUT,'%d-%m-%y') AS TANGGAL,
L.TRANSACTION_NUMBER AS VOUCHER,
L.ACCOUNT_CODE,
C.DESCRIPTION,
L.DEBET,
L.KREDIT
from (select * from tabel_buku_besar A
where A.COMPANYAREA='$COMPANYAREA'
AND A.CURRENCY='$Currency'
AND A.ACCOUNT_CODE!='$ACCOUNT'
AND (A.TANGGAL_INPUT BETWEEN STR_TO_DATE('$StartDate','%d/%m/%Y') AND STR_TO_DATE('$EndDate','%d/%m/%Y'))) L
INNER JOIN (select * from tabel_buku_besar A
where A.COMPANYAREA='$COMPANYAREA'
AND A.CURRENCY='$Currency'
AND A.ACCOUNT_CODE='$ACCOUNT'
AND (A.TANGGAL_INPUT BETWEEN STR_TO_DATE('$StartDate','%d/%m/%Y') AND STR_TO_DATE('$EndDate','%d/%m/%Y'))) R ON R.TRANSACTION_NUMBER=L.TRANSACTION_NUMBER AND R.COMPANYAREA=L.COMPANYAREA
LEFT OUTER JOIN master_account C ON C.ACCOUNT_CODE=L.ACCOUNT_CODE AND C.COMPANYAREA=L.COMPANYAREA
ORDER BY L.TANGGAL_INPUT,L.TRANSACTION_NUMBER
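The core of this answer is joining two filtered copies of the same table on transaction_number instead of using IN. Below is a minimal sketch of just that part. It makes several assumptions: a trimmed-down, hypothetical tabel_buku_besar with only four columns and no dates, SQLite via Python's sqlite3 as a stand-in for MySQL, and bound ? parameters in place of the '$COMPANYAREA'-style string interpolation.

```python
import sqlite3

# Hypothetical, trimmed-down version of tabel_buku_besar in SQLite (not MySQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE tabel_buku_besar (
        transaction_number TEXT, account_code TEXT,
        companyarea TEXT, currency TEXT)"""
)
conn.executemany(
    "INSERT INTO tabel_buku_besar VALUES (?, ?, ?, ?)",
    [
        ("T1", "101.100", "20000", "IDR"),
        ("T1", "201.000", "20000", "IDR"),
        ("T2", "101.100", "20000", "IDR"),
        ("T2", "301.000", "20000", "IDR"),
        ("T3", "401.000", "20000", "IDR"),
        ("T1", "501.000", "30000", "IDR"),
    ],
)

# L: entries for other accounts; R: entries for the account of interest.
# Values are bound with ? placeholders instead of being pasted into the SQL text.
sql = """
SELECT L.transaction_number, L.account_code
FROM (SELECT * FROM tabel_buku_besar
      WHERE companyarea = ? AND currency = ? AND account_code != ?) L
INNER JOIN (SELECT * FROM tabel_buku_besar
            WHERE companyarea = ? AND currency = ? AND account_code = ?) R
  ON R.transaction_number = L.transaction_number
 AND R.companyarea = L.companyarea
ORDER BY L.transaction_number, L.account_code
"""
params = ("20000", "IDR", "101.100") * 2
rows = conn.execute(sql, params).fetchall()
print(rows)  # [('T1', '201.000'), ('T2', '301.000')]
```

One behavioral difference from IN: the join returns an L row once per matching R row. If the same transaction can contain the account of interest more than once, add DISTINCT to get the same rows the IN version would.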
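Answer 9 (score: 0)
I've found this to be the most efficient way to check whether a value exists, and the logic can easily be inverted to find values that do not exist (i.e. IS NULL):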
SELECT * FROM primary_table st1
LEFT JOIN comparison_table st2 ON (st1.relevant_field = st2.relevant_field)
WHERE st2.primaryKey IS NOT NULL
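* Replace relevant_field with the name of the column whose values you want to check.
* Replace primaryKey with the name of the primary key column of the comparison table.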
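Here is a small sketch of both directions of this LEFT JOIN pattern. It uses hypothetical tables named after the answer's placeholders, with SQLite via Python's sqlite3 standing in for MySQL. IS NOT NULL keeps rows whose value exists in the comparison table; IS NULL keeps rows whose value does not (an anti-join).

```python
import sqlite3

# Hypothetical tables in SQLite; names follow the answer's placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE primary_table (id INTEGER PRIMARY KEY, relevant_field TEXT)")
conn.execute("CREATE TABLE comparison_table (primaryKey INTEGER PRIMARY KEY, relevant_field TEXT)")
conn.executemany("INSERT INTO primary_table VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
conn.executemany("INSERT INTO comparison_table VALUES (?, ?)", [(10, "a"), (11, "c")])


def ids_where(condition):
    # condition is one of two fixed strings chosen below, never user input.
    sql = f"""
        SELECT st1.id FROM primary_table st1
        LEFT JOIN comparison_table st2 ON (st1.relevant_field = st2.relevant_field)
        WHERE st2.primaryKey {condition}
        ORDER BY st1.id"""
    return [row_id for (row_id,) in conn.execute(sql)]


found = ids_where("IS NOT NULL")  # values that exist in comparison_table
missing = ids_where("IS NULL")    # values that do not exist
print(found, missing)  # [1, 3] [2]
```

If comparison_table can hold several rows with the same value, the IS NOT NULL form returns the primary_table row once per match. An EXISTS check would return it only once.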