Implicit JOIN shows fewer rows in EXPLAIN but runs slower than explicit JOIN

Asked: 2017-04-04 06:24:08

Tags: mysql sql database

I am trying to optimize the query below.

SELECT `publisher_id`, `publisher_name`, SUM(`views`) AS `total_views`, SUM(`channels`) AS `total_channels`
FROM (
    SELECT DISTINCT `name` AS `publisher_name`, `id` AS `publisher_id` 
    FROM `publishers` 
    WHERE TRIM(`name`) <> ''
     ) AS `publisher_names`
    INNER JOIN
    (
    SELECT `twitch_name`, `publishers` 
    FROM `game_profiles` 
    WHERE `twitch_name` IS NOT NULL 
        AND `publishers` IS NOT NULL 
        AND TRIM(`publishers`) <> ''
    ) AS `game_list`
    ON `game_list`.`publishers` LIKE CONCAT('%', `publisher_names`.`publisher_name`, '%')
    INNER JOIN
    (
    SELECT `games`.`id` AS `id`, `games`.`name`, `games`.`simple_name`, `games`.`box`, SUM(`channels`) AS `channels`, SUM(`viewers`) AS `views`
    FROM `games`
    WHERE `log_date` >= SUBDATE(NOW(), INTERVAL 1 WEEK) 
        AND `log_date` <= SUBDATE(NOW(), INTERVAL 0 WEEK)
    GROUP BY `games`.`id`
    ) AS `view_list`
    ON `game_list`.`twitch_name` = `view_list`.`name`
GROUP BY `publisher_id` ORDER BY `total_views` DESC LIMIT 10;

Checking the query with the EXPLAIN command gives me the following result.

EXPLAIN result for explicit JOIN

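Basically, the games table holds the hourly viewer and channel counts, the game_profiles table maps each game to its publishers, and the publishers table holds a more detailed row for each existing publisher. What I want to achieve is to show the top ten publishers ranked by the total views of their games over the past week.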

Out of ideas, I tried using an implicit JOIN instead. The query is written below.

SELECT `publishers`.`id` AS `publisher_id`, `publishers`.`name` AS `publisher_name`, 
SUM(`games`.`viewers`) AS `total_views`, SUM(`games`.`channels`) AS `total_channels` 
FROM `game_profiles`, `publishers`, `games`
WHERE `game_profiles`.`twitch_name` IS NOT NULL 
    AND `game_profiles`.`publishers` IS NOT NULL AND TRIM(`game_profiles`.`publishers`) <> ''
    AND `game_profiles`.`publishers` LIKE CONCAT('%', `publishers`.`name`, '%') 
    AND `game_profiles`.`twitch_name` = `games`.`name`
    AND `games`.`log_date` >= SUBDATE(NOW(), INTERVAL 1 WEEK) 
    AND `games`.`log_date` <= SUBDATE(NOW(), INTERVAL 0 WEEK)
GROUP BY `publisher_id` ORDER BY `total_views` DESC LIMIT 10;

This gives the following result for the EXPLAIN command.

EXPLAIN result for implicit JOIN

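As I understand it, this should return the same result, but the query runs so slowly in MySQL Workbench that I could not wait for it to finish, so I cannot verify that it actually returns the same rows. Judging by the EXPLAIN output alone, though, I would expect the second query to run faster. Why is it slower instead? Thank you very much.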

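P.S. My database design is far from optimal. It is more of a prototype database, built without any normalization. I just want to better understand what is happening in my queries. Thanks.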

1 answer:

Answer 0: (score: 2)

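In the second query you are performing an implicit CROSS JOIN, which is inadvisable and is why your query runs forever. Conceptually, you first select every row from all three tables and only filter the combined result set afterwards.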

As for the first query:

Your database design is not very good.

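The clause game_list.publishers LIKE CONCAT('%', publisher_names.publisher_name, '%') is far from optimal, because a pattern with a leading wildcard cannot use an index. There should be a linking table instead.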
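A linking table along these lines might look like the sketch below. The table name game_publishers, its column names, and the index name are hypothetical, and the sketch assumes that game_profiles has an integer primary key id, which the question does not show.

```sql
-- Hypothetical linking table: one row per (game profile, publisher) pair.
-- Assumes game_profiles.id and publishers.id are integer primary keys.
CREATE TABLE game_publishers (
    game_profile_id INT NOT NULL,
    publisher_id    INT NOT NULL,
    PRIMARY KEY (game_profile_id, publisher_id),
    KEY idx_game_publishers_publisher (publisher_id)
);

-- One-off backfill from the existing free-text column.
-- The expensive LIKE match runs once here instead of on every report.
INSERT INTO game_publishers (game_profile_id, publisher_id)
SELECT DISTINCT gp.id, p.id
FROM game_profiles AS gp
INNER JOIN publishers AS p
        ON gp.publishers LIKE CONCAT('%', p.name, '%')
WHERE p.name <> '';

-- Afterwards the join uses equality on indexed keys.
SELECT p.id, p.name, gp.twitch_name
FROM publishers AS p
INNER JOIN game_publishers AS gl ON gl.publisher_id = p.id
INNER JOIN game_profiles   AS gp ON gp.id = gl.game_profile_id;
```

The backfill inherits any false matches of the substring search (a short publisher name contained in a longer one), so the resulting rows would be worth reviewing by hand.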

Indexing is poor as well, so check for missing indexes, especially on the games table, column log_date:

WHERE log_date >= SUBDATE(NOW(), INTERVAL 1 WEEK) 
  AND log_date <= SUBDATE(NOW(), INTERVAL 0 WEEK)

By the way, this can be rewritten with BETWEEN for better readability:

WHERE log_date BETWEEN SUBDATE(NOW(), INTERVAL 1 WEEK) 
                   AND SUBDATE(NOW(), INTERVAL 0 WEEK)
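As a rough sketch of the indexing advice for this log_date filter, either of the following could be tried. The index names are illustrative, and the composite variant assumes games.name is a VARCHAR short enough to index in full (otherwise a prefix length would be needed).

```sql
-- Single-column index supporting the range filter on log_date.
CREATE INDEX idx_games_log_date ON games (log_date);

-- Alternative composite index: the equality column (name, used in the join)
-- comes first, followed by the range column (log_date).
CREATE INDEX idx_games_name_log_date ON games (name, log_date);
```

Running EXPLAIN again after creating an index shows whether the optimizer actually picks it up in the key column.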

TRIM(publishers) <> '' is not sargable, because wrapping the column in a function prevents index use; try to avoid that. publishers <> '' is enough.

The grouping of the games table in the last INNER JOIN is probably not optimal either. For questions like this, it is best to provide sample data in SQL Fiddle.

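But you make the same mistake in every subquery. You use INNER JOIN (SELECT x WHERE y) AS z ON z.something = a.something. This hurts index usage, because the join then runs against the derived table rather than the indexed base table.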

The optimized query would look something like this (untested):

SELECT 
    publisher_names.id AS publisher_id
    ,publisher_names.name AS publisher_name
    ,SUM(view_list.viewers) AS total_views   -- games stores the count in `viewers`
    ,SUM(view_list.channels) AS total_channels
FROM publishers AS publisher_names
INNER JOIN game_profiles AS game_list ON 
    game_list.twitch_name IS NOT NULL
    AND game_list.publishers IS NOT NULL
    AND game_list.publishers <> ''
    -- the publishers table exposes the column as `name`
    AND game_list.publishers LIKE CONCAT('%', publisher_names.name, '%')
INNER JOIN games AS view_list 
         ON view_list.log_date BETWEEN SUBDATE(NOW(), INTERVAL 1 WEEK) 
                              AND SUBDATE(NOW(), INTERVAL 0 WEEK)
            AND game_list.twitch_name = view_list.name
WHERE publisher_names.name <> ''
GROUP BY publisher_id
ORDER BY total_views DESC
LIMIT 10;