MySQL - SELECT WHERE field IN (subquery) - why is it so slow?

Time: 2011-05-26 07:53:23

Tags: mysql subquery where-in

I have a few duplicates in a database that I want to inspect, so to see which values are duplicated I ran this:

SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1

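This gives me every relevant_field value that occurs more than once. The query takes a few milliseconds to execute.

Now I wanted to inspect each of the duplicates, so I thought I could select every row in some_table whose relevant_field is returned by the query above, like this: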

SELECT *
FROM some_table 
WHERE relevant_field IN
(
    SELECT relevant_field
    FROM some_table
    GROUP BY relevant_field
    HAVING COUNT(*) > 1
)

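For some reason this turns out to be extremely slow (it takes minutes). What exactly makes it so slow? relevant_field is indexed.

Eventually I tried creating a view "temp_view" from the first query (SELECT relevant_field FROM some_table GROUP BY relevant_field HAVING COUNT(*) > 1), and then ran my second query like this instead: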

SELECT *
FROM some_table
WHERE relevant_field IN
(
    SELECT relevant_field
    FROM temp_view
)

This works just fine. MySQL does it in a few milliseconds.

Are there any SQL experts here who can explain what is going on?

10 answers:

Answer 0: (score: 109)

Rewrite the query into this:

SELECT st1.*, st2.relevant_field FROM some_table st1
INNER JOIN some_table st2 ON (st1.relevant_field = st2.relevant_field)
GROUP BY st1.id  /* list a unique sometable field here*/
HAVING COUNT(*) > 1

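I think st2.relevant_field must be in the select, because otherwise the HAVING clause will give an error, but I'm not 100% sure.

Never use IN with a subquery; this is notoriously slow. Only ever use IN with a fixed list of values.

More tips

  1. If you want to make queries faster, don't do a SELECT *; select only the fields that you really need.
  2. Make sure you have an index on relevant_field to speed up the equi-join.
  3. Make sure to GROUP BY on the primary key.
  4. If you are on InnoDB and you only select indexed fields (and things are not too complex), MySQL will resolve your query using only the indexes, which speeds things up considerably.
  5. A general solution for 90% of your IN (SELECT ...) queries:

    Use this code: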

    SELECT * FROM some_table a WHERE EXISTS (
      SELECT 1 FROM some_table b
      WHERE a.relevant_field = b.relevant_field
      GROUP BY b.relevant_field
      HAVING count(*) > 1) 
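
To sanity-check that the two rewrites in this answer pick out the same rows as the original IN query, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL and a handful of made-up rows; only the table and column names come from the question. It checks result equivalence only and says nothing about how MySQL's optimizer will execute either form.

```python
import sqlite3

# In-memory database with hypothetical sample data (not from the question).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (id INTEGER PRIMARY KEY, relevant_field TEXT)")
conn.executemany(
    "INSERT INTO some_table (relevant_field) VALUES (?)",
    [("a",), ("b",), ("a",), ("c",), ("b",)],
)

# The original slow form from the question.
in_sql = """
SELECT id FROM some_table
WHERE relevant_field IN (
    SELECT relevant_field FROM some_table
    GROUP BY relevant_field HAVING COUNT(*) > 1
)
ORDER BY id
"""

# Self-join grouped on the primary key: a row with duplicates joins to 2+ rows.
self_join_sql = """
SELECT st1.id
FROM some_table st1
INNER JOIN some_table st2 ON (st1.relevant_field = st2.relevant_field)
GROUP BY st1.id
HAVING COUNT(*) > 1
ORDER BY st1.id
"""

# Correlated EXISTS form.
exists_sql = """
SELECT a.id
FROM some_table a
WHERE EXISTS (
    SELECT 1 FROM some_table b
    WHERE a.relevant_field = b.relevant_field
    GROUP BY b.relevant_field
    HAVING COUNT(*) > 1
)
ORDER BY a.id
"""

def ids(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

print(ids(in_sql), ids(self_join_sql), ids(exists_sql))
```

On this sample data all three queries select the rows holding the duplicated values 'a' and 'b'. Which one is fastest on a real table is something to measure with EXPLAIN on your own MySQL version.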
    

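Answer 1: (score: 95)

The subquery is being run once for each row because MySQL treats it as a correlated query. You can turn a correlated query into a non-correlated one by selecting everything from the subquery, like so: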

SELECT * FROM
(
    SELECT relevant_field
    FROM some_table
    GROUP BY relevant_field
    HAVING COUNT(*) > 1
) AS subquery

The final query would look like this:

SELECT *
FROM some_table
WHERE relevant_field IN
(
    SELECT * FROM
    (
        SELECT relevant_field
        FROM some_table
        GROUP BY relevant_field
        HAVING COUNT(*) > 1
    ) AS subquery
)
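
As a rough check that wrapping the subquery in a derived table changes only how the query is written and not what it returns, here is a small sketch. It assumes Python's sqlite3 module in place of MySQL and invented sample rows; the index name idx_relevant_field is also made up. SQLite does not reproduce the MySQL slowdown, so this only confirms that both forms select the same rows.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (id INTEGER PRIMARY KEY, relevant_field TEXT)")
conn.executemany(
    "INSERT INTO some_table (relevant_field) VALUES (?)",
    [("a",), ("b",), ("a",), ("c",), ("b",), ("d",)],
)
conn.execute("CREATE INDEX idx_relevant_field ON some_table (relevant_field)")

# The form from the question.
plain_in = """
SELECT * FROM some_table
WHERE relevant_field IN (
    SELECT relevant_field FROM some_table
    GROUP BY relevant_field HAVING COUNT(*) > 1
)
ORDER BY id
"""

# The same query with the subquery wrapped in a derived table.
wrapped = """
SELECT * FROM some_table
WHERE relevant_field IN (
    SELECT * FROM (
        SELECT relevant_field FROM some_table
        GROUP BY relevant_field HAVING COUNT(*) > 1
    ) AS subquery
)
ORDER BY id
"""

plain_rows = conn.execute(plain_in).fetchall()
wrapped_rows = conn.execute(wrapped).fetchall()
print(plain_rows == wrapped_rows)
```

To see whether the rewrite actually helps on your server, compare the EXPLAIN output of both forms in MySQL itself.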

Answer 2: (score: 5)

Answer 3: (score: 4)

SELECT st1.*
FROM some_table st1
inner join 
(
    SELECT relevant_field
    FROM some_table
    GROUP BY relevant_field
    HAVING COUNT(*) > 1
)st2 on st2.relevant_field = st1.relevant_field;

I tried your query on one of my databases, and also tried rewriting it as a join to a subquery.

This ran a lot faster, try it!
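
For anyone who wants to try the join-to-a-derived-table form without touching a real database, here is a minimal sketch. It assumes Python's sqlite3 module rather than MySQL, and the sample rows and the helper name build_sample_db are made up for illustration. It shows which rows the join returns, not how fast it runs.

```python
import sqlite3

def build_sample_db():
    """Create an in-memory some_table with a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE some_table (id INTEGER PRIMARY KEY, relevant_field TEXT)")
    conn.executemany(
        "INSERT INTO some_table (relevant_field) VALUES (?)",
        [("a",), ("b",), ("a",), ("c",), ("b",), ("d",)],
    )
    return conn

conn = build_sample_db()

# The derived table st2 holds one row per duplicated value, so the inner join
# keeps exactly the rows of some_table whose value occurs more than once.
join_sql = """
SELECT st1.*
FROM some_table st1
INNER JOIN (
    SELECT relevant_field
    FROM some_table
    GROUP BY relevant_field
    HAVING COUNT(*) > 1
) st2 ON st2.relevant_field = st1.relevant_field
ORDER BY st1.id
"""

join_rows = conn.execute(join_sql).fetchall()
for row in join_rows:
    print(row)
```

Because the derived table contains each duplicated value only once, the join does not multiply rows of some_table.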

Answer 4: (score: 3)

I have reformatted your slow SQL query with www.prettysql.net:

SELECT *
FROM some_table
WHERE
 relevant_field in
 (
  SELECT relevant_field
  FROM some_table
  GROUP BY relevant_field
  HAVING COUNT(*) > 1
 );

When you use the same table in both the query and the subquery, you should always alias both, like this:

SELECT *
FROM some_table as t1
WHERE
 t1.relevant_field in
 (
  SELECT t2.relevant_field
  FROM some_table as t2
  GROUP BY t2.relevant_field
  HAVING COUNT(t2.relevant_field) > 1
 );

Does that help?

Answer 5: (score: 3)

Try this:

SELECT t1.*
FROM 
 some_table t1,
  (SELECT relevant_field
  FROM some_table
  GROUP BY relevant_field
  HAVING COUNT(*) > 1) t2
WHERE
 t1.relevant_field = t2.relevant_field;

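Answer 6: (score: 1)

Sometimes, as the data grows, MySQL WHERE IN can become quite slow because of how the query is optimized. Try using STRAIGHT_JOIN to tell MySQL to execute the query as written, e.g.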

SELECT STRAIGHT_JOIN table.field FROM table WHERE table.id IN (...)

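But beware: in most cases the MySQL optimizer works pretty well, so I would recommend using this only when you actually run into this kind of problem.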

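Answer 7: (score: 1)

First, you can find the duplicate rows, count how many times each value is used, and number the rows within each group, like this:
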
SELECT q.id,q.name,q.password,q.NID,(select count(*) from UserInfo k where k.NID= q.NID) as Count,
(
		CASE q.NID
		WHEN @curCode THEN
			@curRow := @curRow + 1
		ELSE
			@curRow := 1
		AND @curCode := q.NID
		END
	) AS No
FROM UserInfo q,
(
		SELECT
			@curRow := 1,
			@curCode := ''
	) rt
WHERE q.NID IN
(
    SELECT NID
    FROM UserInfo
    GROUP BY NID
    HAVING COUNT(*) > 1
) 




After that, create a table and insert the result into it:

create table CopyTable 
SELECT q.id,q.name,q.password,q.NID,(select count(*) from UserInfo k where k.NID= q.NID) as Count,
(
		CASE q.NID
		WHEN @curCode THEN
			@curRow := @curRow + 1
		ELSE
			@curRow := 1
		AND @curCode := q.NID
		END
	) AS No
FROM UserInfo q,
(
		SELECT
			@curRow := 1,
			@curCode := ''
	) rt
WHERE q.NID IN
(
    SELECT NID
    FROM UserInfo
    GROUP BY NID
    HAVING COUNT(*) > 1
) 




Finally, delete the duplicate rows. No starts from 0, so delete every duplicate row except the first one of each group:

delete from  CopyTable where No!= 0;
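
For comparison, here is a sketch of the same "copy the duplicates, keep one row per group" idea using a different technique: instead of numbering rows with user variables, it keeps the row with the smallest id for each NID. It assumes Python's sqlite3 module in place of MySQL, a cut-down UserInfo table (id, name, NID) and made-up rows.

```python
import sqlite3

# Hypothetical, simplified UserInfo table; the real one has more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UserInfo (id INTEGER PRIMARY KEY, name TEXT, NID TEXT)")
conn.executemany(
    "INSERT INTO UserInfo (name, NID) VALUES (?, ?)",
    [("ann", "N1"), ("bob", "N2"), ("ann2", "N1"),
     ("cat", "N3"), ("bob2", "N2"), ("ann3", "N1")],
)

# Copy only the rows whose NID occurs more than once.
conn.execute("""
CREATE TABLE CopyTable AS
SELECT id, name, NID
FROM UserInfo
WHERE NID IN (
    SELECT NID FROM UserInfo
    GROUP BY NID HAVING COUNT(*) > 1
)
""")

# Keep the lowest id per NID and delete the rest (not the user-variable
# numbering used above, but the same end result of one row per group).
conn.execute("""
DELETE FROM CopyTable
WHERE id NOT IN (SELECT MIN(id) FROM CopyTable GROUP BY NID)
""")

remaining = conn.execute("SELECT id, NID FROM CopyTable ORDER BY id").fetchall()
print(remaining)
```

Note that MySQL normally refuses a DELETE whose subquery reads directly from the table being deleted from, so there the inner SELECT would have to be wrapped in a derived table first.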




Answer 8: (score: 0)

This is similar to my case, where I have a table named tabel_buku_besar. What I need is:

  1. Find the records in tabel_buku_besar that have account_code='101.100', companyarea='20000' and IDR as currency.

  2. I need to get all records from tabel_buku_besar that match step 1 but whose transaction_number appears in the step 1 result.

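  3. When I use select ... from ... where ... transaction_number in (select transaction_number from ....), my query runs extremely slowly, and sometimes causes a request timeout or makes my application unresponsive...

    I tried this combination and the result... not bad...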

    select DATE_FORMAT(L.TANGGAL_INPUT,'%d-%m-%y') AS TANGGAL,
          L.TRANSACTION_NUMBER AS VOUCHER,
          L.ACCOUNT_CODE,
          C.DESCRIPTION,
          L.DEBET,
          L.KREDIT
     from (select * from tabel_buku_besar A
                    where A.COMPANYAREA='$COMPANYAREA'
                          AND A.CURRENCY='$Currency'
                          AND A.ACCOUNT_CODE!='$ACCOUNT'
                          AND (A.TANGGAL_INPUT BETWEEN STR_TO_DATE('$StartDate','%d/%m/%Y') AND STR_TO_DATE('$EndDate','%d/%m/%Y'))) L
    INNER JOIN (select * from tabel_buku_besar A
                         where A.COMPANYAREA='$COMPANYAREA'
                               AND A.CURRENCY='$Currency'
                               AND A.ACCOUNT_CODE='$ACCOUNT'
                               AND (A.TANGGAL_INPUT BETWEEN STR_TO_DATE('$StartDate','%d/%m/%Y') AND STR_TO_DATE('$EndDate','%d/%m/%Y'))) R ON R.TRANSACTION_NUMBER=L.TRANSACTION_NUMBER AND R.COMPANYAREA=L.COMPANYAREA
    LEFT OUTER JOIN master_account C ON C.ACCOUNT_CODE=L.ACCOUNT_CODE AND C.COMPANYAREA=L.COMPANYAREA
    ORDER BY L.TANGGAL_INPUT,L.TRANSACTION_NUMBER
    

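Answer 9: (score: 0)

I find this to be the most efficient way to check whether a value exists; the logic can easily be inverted to find values that do not exist (i.e. IS NULL):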

SELECT * FROM primary_table st1
LEFT JOIN comparision_table st2 ON (st1.relevant_field = st2.relevant_field)
WHERE st2.primaryKey IS NOT NULL

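*Replace relevant_field with the name of the value you want to check.

*Replace primaryKey with the name of the primary key column on the comparison table.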
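
Here is a minimal sketch of both directions of this LEFT JOIN check (IS NOT NULL for "exists", IS NULL for "does not exist"). It assumes Python's sqlite3 module in place of MySQL; the table names are the ones used in this answer, while the id column and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data for illustration only.
conn.executescript("""
CREATE TABLE primary_table (id INTEGER PRIMARY KEY, relevant_field TEXT);
CREATE TABLE comparision_table (primaryKey INTEGER PRIMARY KEY, relevant_field TEXT);
INSERT INTO primary_table (relevant_field) VALUES ('a'), ('b'), ('c');
INSERT INTO comparision_table (relevant_field) VALUES ('a'), ('c'), ('x');
""")

# Rows of primary_table that have a match: the joined primary key is not NULL.
exists_sql = """
SELECT st1.relevant_field
FROM primary_table st1
LEFT JOIN comparision_table st2 ON (st1.relevant_field = st2.relevant_field)
WHERE st2.primaryKey IS NOT NULL
ORDER BY st1.id
"""

# Inverted logic: rows with no match leave the joined primary key NULL.
missing_sql = """
SELECT st1.relevant_field
FROM primary_table st1
LEFT JOIN comparision_table st2 ON (st1.relevant_field = st2.relevant_field)
WHERE st2.primaryKey IS NULL
ORDER BY st1.id
"""

present = [row[0] for row in conn.execute(exists_sql)]
missing = [row[0] for row in conn.execute(missing_sql)]
print(present, missing)
```

One caveat: if the comparison table holds the same value several times, the IS NOT NULL form returns the matching primary_table row once per match, so you may want DISTINCT or a GROUP BY in that case.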