Question

Query 1:

select distinct email from mybigtable where account_id=345

Takes 0.1 seconds.

Query 2:

Select count(*) as total from mybigtable where account_id=123 and email IN (<include all from above result>)

Takes 0.2 seconds.

Query 3:

Select count(*) as total from mybigtable where account_id=123 and email IN (select distinct email from mybigtable where account_id=345)

Takes 22 minutes, 90% of it in the "preparing" state. Why does this take so long?

The table is InnoDB with 3.2 million rows, on MySQL 5.0.

Answer 1

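There's more to it:

I suspect your query cache was already warm for queries 1 and 2, which makes their timings misleading. Run RESET QUERY CACHE; and time them again (FLUSH QUERY CACHE; only defragments the cache, it does not empty it).
I suspect query 3 goes through a temporary table, most likely on disk, while query 2 is guaranteed to run from RAM. The default temporary table settings in my.cnf are very conservative.
Try this to make sure you are not being hit by an old de-optimization bug in MySQL: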

SELECT count(DISTINCT b.primary_key_column) AS total
FROM mybigtable a
INNER JOIN mybigtable b
ON a.email=b.email
WHERE a.account_id=345
AND b.account_id=123
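
The first two suspicions above can be checked directly. This is a minimal sketch, assuming you can run statements in the same session as the slow query; the 64 MB value is an arbitrary illustration, not a recommendation from this thread.

```sql
-- Bypass the query cache so the timing reflects real work.
SELECT DISTINCT SQL_NO_CACHE email FROM mybigtable WHERE account_id = 345;

-- Current limits for in-memory temporary tables
-- (the smaller of the two values is the effective limit).
SHOW VARIABLES LIKE 'tmp_table_size';
SHOW VARIABLES LIKE 'max_heap_table_size';

-- Snapshot the session counters, run query 3, then look again.
SHOW SESSION STATUS LIKE 'Created_tmp%';
-- ... run query 3 here ...
SHOW SESSION STATUS LIKE 'Created_tmp%';

-- Optionally raise the limits for this session only (64 MB is an assumed value).
SET SESSION tmp_table_size = 64 * 1024 * 1024;
SET SESSION max_heap_table_size = 64 * 1024 * 1024;
```

If Created_tmp_disk_tables grows while query 3 runs, the temporary table is spilling to disk; if it does not, the slowdown probably lies elsewhere.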

Answer 2

MySQL is very bad at subqueries inside an IN clause. I would rewrite it as:

SELECT COUNT(*) as total
FROM mybigtable t
INNER JOIN (
    SELECT DISTINCT email
    FROM mybigtable
    WHERE account_id = 345
) x
ON t.email = x.email
WHERE t.account_id=123
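
One way to see the difference between the two forms is to compare their EXPLAIN output. This is only a sketch: on MySQL 5.0 I would expect the IN version to report the inner SELECT as DEPENDENT SUBQUERY (re-evaluated for outer rows) and the join version to report it as DERIVED (materialized once), but check what your server actually prints.

```sql
-- Plan for the original IN (subquery) form.
EXPLAIN
SELECT COUNT(*) AS total
FROM mybigtable
WHERE account_id = 123
  AND email IN (SELECT DISTINCT email
                FROM mybigtable
                WHERE account_id = 345);

-- Plan for the derived-table join form.
EXPLAIN
SELECT COUNT(*) AS total
FROM mybigtable t
INNER JOIN (
    SELECT DISTINCT email
    FROM mybigtable
    WHERE account_id = 345
) x
ON t.email = x.email
WHERE t.account_id = 123;
```

Compare the select_type and rows columns of the two plans; a large rows estimate on a dependent subquery row would be consistent with the 22-minute run time.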

Answer 3

After a lot of thought, and with some help from dba.se, this is the final query that works the magic.

select count(*) EmailCount from
(
    select tbl123.email from
    (select email from mybigtable where account_id=123) tbl123
    LEFT JOIN
    (select distinct email from mybigtable where account_id=345) tbl345
    using (email)
    WHERE tbl345.email IS NULL
) A;

Cascaded queries slow down massively, but they work fine independently
