Why does a MySQL query with a dependent subquery take so much longer to execute than running each statement separately?

Time: 2015-07-29 19:28:41

Tags: mysql indexing

I am running two queries.

The first one gets the unique IDs. It executes in ~350 ms.

select parent_id
from duns_match_sealed_air_072815
group by duns_number

I then paste those IDs into the second query. With more than 10k IDs pasted in, it also executes in about 350 ms.

select term, count(*) as count
from companies, business_types, business_types_to_companies
where
    business_types.id = business_types_to_companies.term_id
    and companies.id = business_types_to_companies.company_id
    and raw_score > 25
    and diversity = 1
    and company_id in (paste,ten,thousand,ids,here)
group by term
order by count desc;

When I combine these queries into one, execution takes a very long time. I don't know how long, because I stopped it after a few minutes.

select term, count(*) as count
from companies, business_types, business_types_to_companies
where
    business_types.id = business_types_to_companies.term_id
    and companies.id = business_types_to_companies.company_id
    and raw_score > 25
    and diversity = 1
    and company_id in (
        select parent_id
        from duns_match_sealed_air_072815
        group by duns_number
    )
group by term
order by count desc;

What is going on?

3 Answers:

Answer 0: (score: 1)

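It depends on how MySQL processes the query. I believe it has to run the embedded query once for every row of the outer query, whereas with two separate queries the result of the first one is already stored.

Hope this helps!

Answer 1: (score: 1)

I rewrote the query using JOINs, and I deliberately used EXISTS instead of IN. This is a shot in the dark. The subquery may be producing many values, so the outer query struggles when it has to match against every item the subquery returns.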

select term, count(*) as count
from companies c
inner join business_types_to_companies bc on bc.company_id = c.id
inner join business_types b on b.id = bc.term_id
where 
    raw_score > 25
    and diversity = 1
    and exists (
        select 1 
        from duns_match_sealed_air_072815
        where parent_id = c.id
    )
group by term
order by count desc;
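For the EXISTS form to be cheap, the correlated lookup on parent_id has to be served by an index. The question does not show the table definition, so the following is only a sketch under the assumption that no such index exists yet; the index name idx_parent_id is made up for illustration.

```sql
-- Hypothetical index name; skip this if parent_id is already indexed.
alter table duns_match_sealed_air_072815
  add index idx_parent_id (parent_id);
```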

Answer 2: (score: 1)

First of all, with respect, your subquery doesn't use GROUP BY in a sensible way:

select parent_id         /* wrong GROUP BY */
  from duns_match_sealed_air_072815
 group by duns_number

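In fact, it misuses MySQL's pernicious extension to GROUP BY. Read this: http://dev.mysql.com/doc/refman/5.6/en/group-by-handling.html. I can't tell what your application logic intends with this query, but I can tell you that it actually returns one unpredictably chosen parent_id value for each distinct duns_number value.

Do you want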

select MIN(parent_id) parent_id  
  from duns_match_sealed_air_072815
 group by duns_number

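or something like that? That one selects the lowest parent_id associated with each given duns_number.

MySQL sometimes has trouble optimizing the WHERE ... IN () query pattern. Try a join instead, like this: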
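If the goal is simply the set of unique parent IDs (an assumption about intent; the question only says "unique IDs"), SELECT DISTINCT avoids the nonstandard GROUP BY altogether. As a further sketch, enabling the ONLY_FULL_GROUP_BY SQL mode for the session makes MySQL reject the original form rather than silently pick a value.

```sql
-- Assumption: the intent is one row per distinct parent_id.
select distinct parent_id
  from duns_match_sealed_air_072815;

-- Optional check. Note: this replaces any other SQL modes for the current session.
set session sql_mode = 'ONLY_FULL_GROUP_BY';

-- With the mode on, MySQL 5.6 rejects this statement with an error.
-- (MySQL 5.7.5 and later accept it only when parent_id is
-- functionally dependent on duns_number.)
select parent_id
  from duns_match_sealed_air_072815
 group by duns_number;
```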

select term, count(*) as count
  from companies
  join (
          select MIN(parent_id) parent_id
            from duns_match_sealed_air_072815
           group by duns_number
       ) idlist ON companies.id = idlist.parent_id
  join business_types_to_companies ON companies.id = business_types_to_companies.company_id
  join business_types ON business_types.id = business_types_to_companies.term_id
 where raw_score > 25
   and diversity = 1
 group by term 
 order by count desc

To optimize this further, we would need to see the table definitions and the output of EXPLAIN.
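A sketch of how that information could be gathered, using the table names and the slow combined query from the question. In the EXPLAIN output, a select_type of DEPENDENT SUBQUERY on the duns_match_sealed_air_072815 row would suggest that the subquery is being re-evaluated for each outer row.

```sql
-- Table definitions, including any existing indexes.
show create table duns_match_sealed_air_072815;
show create table companies;
show create table business_types;
show create table business_types_to_companies;

-- Execution plan for the slow combined query.
explain
select term, count(*) as count
from companies, business_types, business_types_to_companies
where
    business_types.id = business_types_to_companies.term_id
    and companies.id = business_types_to_companies.company_id
    and raw_score > 25
    and diversity = 1
    and company_id in (
        select parent_id
        from duns_match_sealed_air_072815
        group by duns_number
    )
group by term
order by count desc;
```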