I am running two queries.
The first one fetches the unique IDs. It executes in ~350 ms.
select parent_id
from duns_match_sealed_air_072815
group by duns_number
I then paste those IDs into the second query. With more than 10k IDs pasted in, it also executes in about 350 ms.
select term, count(*) as count
from companies, business_types, business_types_to_companies
where
business_types.id = business_types_to_companies.term_id
and companies.id = business_types_to_companies.company_id
and raw_score > 25
and diversity = 1
and company_id in (paste,ten,thousand,ids,here)
group by term
order by count desc;
When I combine these queries into one, execution takes a very long time. I don't know how long, because I stopped it after several minutes.
select term, count(*) as count
from companies, business_types, business_types_to_companies
where
business_types.id = business_types_to_companies.term_id
and companies.id = business_types_to_companies.company_id
and raw_score > 25
and diversity = 1
and company_id in (
select parent_id
from duns_match_sealed_air_072815
group by duns_number
)
group by term
order by count desc;
What is going on?
Answer 0: (score: 1)
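It depends on how the server processes the query. I believe it has to run the embedded query once for every row of the outer query, whereas with two separate queries the result of the first is computed once and reused.
Hope this helps!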
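One way to reproduce the two-query behaviour inside the database is to store the ID list once and join against it. This is only a sketch, assuming the goal is the set of distinct parent_id values; the temporary table name tmp_parent_ids is hypothetical and does not come from the question.

```sql
-- Materialize the ID list once (tmp_parent_ids is a hypothetical name).
create temporary table tmp_parent_ids (index (parent_id)) as
select distinct parent_id
from duns_match_sealed_air_072815;

-- Join against the stored list instead of using IN (subquery).
select term, count(*) as count
from companies
join tmp_parent_ids on tmp_parent_ids.parent_id = companies.id
join business_types_to_companies on companies.id = business_types_to_companies.company_id
join business_types on business_types.id = business_types_to_companies.term_id
where raw_score > 25
and diversity = 1
group by term
order by count desc;

-- Clean up when done.
drop temporary table tmp_parent_ids;
```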
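Answer 1: (score: 1)
The query below has been rewritten with JOIN, but the notable change is that I used EXISTS instead of IN. This is a shot in the dark. The subquery may be producing a large number of values, and the outer query then struggles to match against every item the subquery returns.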
select term, count(*) as count
from companies c
inner join business_types_to_companies bc on bc.company_id = c.id
inner join business_types b on b.id = bc.term_id
where
raw_score > 25
and diversity = 1
and exists (
select 1
from duns_match_sealed_air_072815
where parent_id = c.id
)
group by term
order by count desc;
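The correlated EXISTS above looks up duns_match_sealed_air_072815 by parent_id for each candidate company, so it would likely benefit from an index on that column. A sketch, assuming no such index exists yet (the question does not show the table definition); the index name idx_parent_id is made up for illustration.

```sql
-- idx_parent_id is a hypothetical name; skip this if parent_id is already indexed.
create index idx_parent_id on duns_match_sealed_air_072815 (parent_id);
```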
Answer 2: (score: 1)
First of all, with respect, your subquery does not use GROUP BY in a sensible way.
select parent_id /* wrong GROUP BY */
from duns_match_sealed_air_072815
group by duns_number
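In fact, it misuses the pernicious MySQL extension to GROUP BY. Read this: http://dev.mysql.com/doc/refman/5.6/en/group-by-handling.html. I can't tell what your application logic intends with this query, but I can tell you that it actually returns an unpredictably chosen parent_id value for each distinct duns_number value.
Do you want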
select MIN(parent_id) parent_id
from duns_match_sealed_air_072815
group by duns_number
or something like that? That one selects the lowest parent_id associated with each given duns_number.
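If the intent is simply the set of unique parent IDs, as the question's description suggests, neither GROUP BY nor MIN() is needed. A sketch under that assumption:

```sql
-- Assumes only the distinct parent_id values matter, not which duns_number they belong to.
select distinct parent_id
from duns_match_sealed_air_072815;
```

Note that this can return a different set than the MIN(parent_id) version: when one duns_number maps to several parent_id values, DISTINCT keeps all of them, while MIN() keeps only the lowest.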
Sometimes MySQL has trouble optimizing the WHERE ... IN () query pattern. Try a join instead, like this:
select term, count(*) as count
from companies
join (
select MIN(parent_id) parent_id
from duns_match_sealed_air_072815
group by duns_number
) idlist ON companies.id = idlist.parent_id
join business_types_to_companies ON companies.id = business_types_to_companies.company_id
join business_types ON business_types.id = business_types_to_companies.term_id
where raw_score > 25
and diversity = 1
group by term
order by count desc
To optimize this any further, we would need to see the table definitions and the output of EXPLAIN.
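A sketch of how that information could be gathered, using the table names from the question. SHOW CREATE TABLE prints each table's columns and indexes, and EXPLAIN prints the execution plan without running the query.

```sql
-- Table and index definitions for each table involved.
show create table companies;
show create table business_types;
show create table business_types_to_companies;
show create table duns_match_sealed_air_072815;

-- Execution plan for the slow combined query.
explain
select term, count(*) as count
from companies, business_types, business_types_to_companies
where
business_types.id = business_types_to_companies.term_id
and companies.id = business_types_to_companies.company_id
and raw_score > 25
and diversity = 1
and company_id in (
select parent_id
from duns_match_sealed_air_072815
group by duns_number
)
group by term
order by count desc;
```

In the EXPLAIN output, a DEPENDENT SUBQUERY entry in the select_type column would suggest that the subquery is being re-evaluated for each outer row.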