Question

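I have an IMDB database and I am trying to calculate the average number of actors per production for each year. My question is about the speed difference when selecting from a subquery.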

My query is:

SELECT AVG(sub.num) 
FROM 
   (SELECT 
        COUNT(production_cast.person_id) AS num, 
        production.production_year AS pyear 
    FROM production_cast 
    INNER JOIN production ON production.id = production_cast.production_id
    GROUP BY production.id) sub
GROUP BY(sub.pyear)

However, to simplify, these are the two queries the question is about:

With a subquery:

SELECT sub.num 
FROM 
    (SELECT 
         COUNT(production_cast.person_id) AS num, 
         production.production_year AS pyear  
     FROM production_cast 
     INNER JOIN production ON production.id = production_cast.production_id
     GROUP BY production.id) sub

Without a subquery:

SELECT 
    COUNT(production_cast.person_id) AS num, 
    production.production_year AS pyear  
FROM production_cast 
INNER JOIN production ON production.id = production_cast.production_id
GROUP BY production.id

The second one takes less than a second; the first one never finishes (I gave up after more than 5 minutes).

EXPLAIN with the subquery:

+-------------+------------------+-------+-----------------------------------+-------------+
| select_type | table            | type  | key                               | Extra       |
+-------------+------------------+-------+-----------------------------------+-------------+
| PRIMARY     |  <derived2>      | ALL   | NULL                              | NULL        |
| DERIVED     |  production      | index | idx_Production_id_production_year | Using index |
| DERIVED     |  production_cast | ref   |  production_id                    | NULL        |
+-------------+------------------+-------+-----------------------------------+-------------+

EXPLAIN without the subquery:

+-------------+-----------------+-----------------------------------+-------------+
| select_type | table           | key                               | Extra       |
+-------------+-----------------+-----------------------------------+-------------+
| SIMPLE      | production      | idx_Production_id_production_year | Using index |
| SIMPLE      | production_cast | production_id                     | NULL        |
+-------------+-----------------+-----------------------------------+-------------+

What is the reason behind this performance difference, and what can be done to prevent it?

Answer 1

DERIVED: the result is dumped into a temporary table, with no possibility of using indexes on it.
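
One possible workaround, sketched under the assumption that this is MySQL and that the missing index on the derived table is what hurts: materialize the inner query yourself into an explicit temporary table that does carry an index, then aggregate from it. The names `cast_counts` and `idx_pyear` are hypothetical and not part of the original schema.

```sql
-- Hypothetical temporary table holding one row per production.
-- Unlike an implicit derived table, it can be given an index up front.
CREATE TEMPORARY TABLE cast_counts (
    INDEX idx_pyear (pyear)
)
SELECT
    production.id AS production_id,
    production.production_year AS pyear,
    COUNT(production_cast.person_id) AS num
FROM production_cast
INNER JOIN production ON production.id = production_cast.production_id
GROUP BY production.id, production.production_year;

-- The outer aggregation now reads from an indexed table.
SELECT pyear, AVG(num) AS avg_cast
FROM cast_counts
GROUP BY pyear;

DROP TEMPORARY TABLE cast_counts;
```

Whether this is actually faster depends on the server version and the data, so it is worth comparing the timings of both forms.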

Subqueries: avoid them like the plague.
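
For the original goal (average cast size per production, per year), the subquery can probably be avoided altogether. The sketch below assumes that `production.id` is unique and relies on the fact that the average of the per-production counts equals the total number of cast rows divided by the number of distinct productions in that year. The alias `avg_cast` is made up for illustration.

```sql
-- Single-level aggregation: no derived table is needed.
-- COUNT(person_id) counts all cast rows in the year;
-- COUNT(DISTINCT production.id) counts the productions they belong to.
SELECT
    production.production_year AS pyear,
    COUNT(production_cast.person_id) / COUNT(DISTINCT production.id) AS avg_cast
FROM production_cast
INNER JOIN production ON production.id = production_cast.production_id
GROUP BY production.production_year;
```

As in the original query, productions without any cast rows are left out by the inner join, so they do not affect the average in either version.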

From https://www.percona.com/blog/2006/08/31/derived-tables-and-views-performance/

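“One thing to be aware of is the fact that derived tables are materialized even to execute an EXPLAIN statement. So if you have made a mistake in the select in the FROM clause, e.g. forgotten a join condition, you might have EXPLAIN running forever.”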
