Question

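I'm developing a PHP web service that needs to run a query against a table with 23 million records. The query I've written takes more than 30 seconds to complete, and I can tell that the ORDER BY part of the query is causing the problem, because without it the query responds quickly.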

Here is the query:

SELECT artist_feeds.*, artists.name, artists.picture AS profile_picture
FROM artist_feeds
INNER JOIN user_artists ON user_artists.artist_id = artist_feeds.artist_id
INNER JOIN artists ON artists.id = artist_feeds.artist_id
WHERE artist_feeds.feed_date >= '2015-10-01'
    AND user_artists.user_id = 486
    AND NOT EXISTS (
        SELECT id FROM user_artist_disabled_networks AS uadn
        WHERE uadn.user_id = 486
            AND uadn.artist_id = artist_feeds.artist_id
            AND uadn.socialnetwork_id = artist_feeds.socialnetwork_id
        LIMIT 1
        )
ORDER BY artist_feeds.feed_date DESC
LIMIT 0, 20

The EXPLAIN output for the query is as follows:

Can anyone offer any pointers?

As requested, the SHOW CREATE TABLE output:

CREATE TABLE `artist_feeds` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `feed_id` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `feed_date` datetime DEFAULT NULL,
  `message` text COLLATE utf8mb4_unicode_ci,
  `hash` varchar(32) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `type` varchar(20) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `source` mediumtext COLLATE utf8mb4_unicode_ci,
  `picture` mediumtext COLLATE utf8mb4_unicode_ci,
  `link` mediumtext COLLATE utf8mb4_unicode_ci,
  `artist_id` int(11) DEFAULT '0',
  `socialnetwork_id` int(11) DEFAULT '0',
  `direct_link` mediumtext COLLATE utf8mb4_unicode_ci,
  `is_master_feed` tinyint(4) DEFAULT '0',
  `active` tinyint(4) DEFAULT '0',
  `created_at` datetime DEFAULT NULL,
  `updated_at` datetime DEFAULT NULL,
  `rss_feed_id` int(11) DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `artist_id` (`artist_id`),
  KEY `socialnetwork_id` (`socialnetwork_id`),
  KEY `feedidnetwork` (`feed_id`(191),`socialnetwork_id`),
  KEY `feeddatenetworkid` (`feed_date`,`socialnetwork_id`),
  KEY `feeddatenetworkidartistid` (`artist_id`,`socialnetwork_id`,`feed_date`),
  KEY `type` (`type`),
  KEY `feed_date` (`feed_date`)
) ENGINE=InnoDB AUTO_INCREMENT=26991713 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

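SOLVED: Thanks to the pointers from Bill, I looked into changing the order in which the tables are accessed so that artist_feeds is the first table read. That removes the need for a filesort on the data, which gives the speed boost.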

I ended up using STRAIGHT_JOIN instead of INNER JOIN. My working query is:

SELECT af.*, a.name, a.picture AS profile_picture
FROM artist_feeds AS af
STRAIGHT_JOIN user_artists AS ua ON ua.artist_id = af.artist_id
STRAIGHT_JOIN artists AS a ON a.id = af.artist_id
LEFT OUTER JOIN user_artist_disabled_networks AS uadn
  ON uadn.user_id = ua.user_id AND uadn.artist_id = af.artist_id
  AND uadn.socialnetwork_id = af.socialnetwork_id
WHERE af.feed_date >= '2015-10-01'
    AND uadn.user_id IS NULL
    AND ua.user_id = 498
ORDER BY af.feed_date DESC
LIMIT 0, 20

The EXPLAIN now looks like this:

Answer 1

I would write the query with an exclusion join instead of a NOT EXISTS subquery:

SELECT af.*, a.name, a.picture AS profile_picture
FROM artist_feeds AS af
INNER JOIN user_artists AS ua ON ua.artist_id = af.artist_id
INNER JOIN artists AS a ON a.id = af.artist_id
LEFT OUTER JOIN user_artist_disabled_networks AS uadn
  ON uadn.user_id = ua.user_id AND uadn.artist_id = af.artist_id
  AND uadn.socialnetwork_id = af.socialnetwork_id
WHERE af.feed_date >= '2015-10-01'
  AND ua.user_id = 486
  AND uadn.user_id IS NULL
ORDER BY af.feed_date DESC
LIMIT 0, 20
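A small sketch to check the rewrite, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption: the optimizers differ, so only the result sets are compared here). The schema is a hypothetical cut-down version of the tables in the question. For the exclusion join to return exactly the same rows as the original NOT EXISTS, its ON clause has to compare the same three columns (user_id, artist_id, socialnetwork_id); the third query leaves artist_id out to show how the result changes.

```python
import sqlite3

# Hypothetical miniature schema: only the columns the join conditions use.
SCHEMA = """
CREATE TABLE artist_feeds (id INTEGER PRIMARY KEY, artist_id INT,
                           socialnetwork_id INT, feed_date TEXT);
CREATE TABLE user_artists (user_id INT, artist_id INT);
CREATE TABLE user_artist_disabled_networks (id INTEGER PRIMARY KEY,
    user_id INT, artist_id INT, socialnetwork_id INT);
"""

NOT_EXISTS_SQL = """
SELECT af.id FROM artist_feeds AS af
JOIN user_artists AS ua ON ua.artist_id = af.artist_id
WHERE ua.user_id = ? AND af.feed_date >= '2015-10-01'
  AND NOT EXISTS (SELECT 1 FROM user_artist_disabled_networks AS uadn
                  WHERE uadn.user_id = ua.user_id
                    AND uadn.artist_id = af.artist_id
                    AND uadn.socialnetwork_id = af.socialnetwork_id)
ORDER BY af.feed_date DESC
"""

# Exclusion join: keep only rows for which the LEFT JOIN found no uadn match.
EXCLUSION_JOIN_SQL = """
SELECT af.id FROM artist_feeds AS af
JOIN user_artists AS ua ON ua.artist_id = af.artist_id
LEFT JOIN user_artist_disabled_networks AS uadn
  ON uadn.user_id = ua.user_id AND uadn.artist_id = af.artist_id
  AND uadn.socialnetwork_id = af.socialnetwork_id
WHERE ua.user_id = ? AND af.feed_date >= '2015-10-01'
  AND uadn.user_id IS NULL
ORDER BY af.feed_date DESC
"""

# Same exclusion join, but without the artist_id comparison.
EXCLUSION_JOIN_NO_ARTIST_SQL = EXCLUSION_JOIN_SQL.replace(
    " AND uadn.artist_id = af.artist_id", "")


def run(sql, user_id=486):
    """Build a fresh in-memory database with sample rows and run `sql`."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO artist_feeds VALUES (?, ?, ?, ?)", [
        (1, 10, 1, "2015-10-02"), (2, 10, 2, "2015-10-03"),
        (3, 11, 1, "2015-10-04"), (4, 12, 1, "2015-10-05"),  # 12 not followed
        (5, 10, 1, "2015-09-30"),                            # too old
        (6, 11, 2, "2015-10-06"),
    ])
    conn.executemany("INSERT INTO user_artists VALUES (?, ?)",
                     [(486, 10), (486, 11)])
    # User 486 disabled network 2 for artist 10 only.
    conn.execute("INSERT INTO user_artist_disabled_networks "
                 "(user_id, artist_id, socialnetwork_id) VALUES (486, 10, 2)")
    return [row[0] for row in conn.execute(sql, (user_id,))]


print(run(NOT_EXISTS_SQL))                # [6, 3, 1]
print(run(EXCLUSION_JOIN_SQL))            # [6, 3, 1]
print(run(EXCLUSION_JOIN_NO_ARTIST_SQL))  # [3, 1]: feed 6 wrongly dropped
```

Without the artist_id comparison, disabling a network for one artist also hides that network's feeds for every other artist the user follows.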

According to the EXPLAIN, the table access order is:

ua: lookup by user_id
a: lookup by PRIMARY KEY
af: lookup by artist_id, with a range condition on feed_date
uadn: lookup by user_id and socialnetwork_id

So you should have these indexes:

user_artists (user_id, artist_id)
artists only needs its PRIMARY KEY
artist_feeds (artist_id, feed_date)
user_artist_disabled_networks (user_id, socialnetwork_id)
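A sketch of those indexes, again with sqlite3 standing in for MySQL. The index names are hypothetical, and SQLite's EXPLAIN QUERY PLAN wording differs from MySQL's EXPLAIN, so this only illustrates that each lookup in the access order listed above can be served by an index (or the primary key) rather than a full table scan.

```python
import sqlite3

# Hypothetical cut-down schema plus the suggested composite indexes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artists (id INTEGER PRIMARY KEY, name TEXT, picture TEXT);
CREATE TABLE artist_feeds (id INTEGER PRIMARY KEY, artist_id INT,
                           socialnetwork_id INT, feed_date TEXT, message TEXT);
CREATE TABLE user_artists (user_id INT, artist_id INT);
CREATE TABLE user_artist_disabled_networks (id INTEGER PRIMARY KEY,
    user_id INT, artist_id INT, socialnetwork_id INT);
CREATE INDEX ua_user_artist ON user_artists (user_id, artist_id);
CREATE INDEX af_artist_date ON artist_feeds (artist_id, feed_date);
CREATE INDEX uadn_user_network
    ON user_artist_disabled_networks (user_id, socialnetwork_id);
""")


def plan(sql, params):
    """Join the 'detail' column (the last one) of EXPLAIN QUERY PLAN rows."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


# One single-table lookup per step of the access order: ua, a, af, uadn.
ua_plan = plan("SELECT artist_id FROM user_artists WHERE user_id = ?", (486,))
a_plan = plan("SELECT name, picture FROM artists WHERE id = ?", (10,))
af_plan = plan("SELECT * FROM artist_feeds "
               "WHERE artist_id = ? AND feed_date >= ?", (10, "2015-10-01"))
uadn_plan = plan("SELECT 1 FROM user_artist_disabled_networks "
                 "WHERE user_id = ? AND socialnetwork_id = ?", (486, 2))
for p in (ua_plan, a_plan, af_plan, uadn_plan):
    print(p)  # each line should name an index or the primary key
```

Putting artist_id before feed_date in the artist_feeds index matters: the equality column comes first, so the range condition on feed_date can narrow the scan within each artist.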

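A large part of your query's performance problem is undoubtedly the "Using temporary; Using filesort" step. That is unavoidable as long as your query does not access the artist_feeds table first.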
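To see why reading artist_feeds first removes the sort, here is a simplified in-memory model (not MySQL's executor; the data and function names are made up). When rows arrive newest-first, as an index on feed_date can deliver them, the first 20 matches are already the answer and the scan can stop. Otherwise every matching row has to be collected and sorted before LIMIT 20 applies.

```python
def sort_then_limit(feeds, followed, limit):
    """Filesort-style plan: gather every matching row, sort them all, cut.

    Returns (top rows, number of rows that had to be sorted).
    """
    matches = [f for f in feeds if f[1] in followed]
    matches.sort(key=lambda f: f[0], reverse=True)  # newest feed_date first
    return matches[:limit], len(matches)


def scan_in_date_order(feeds_newest_first, followed, limit):
    """Index-order plan: rows arrive already sorted, so stop at `limit` hits.

    Returns (top rows, number of rows examined before stopping).
    """
    out, examined = [], 0
    for feed in feeds_newest_first:
        examined += 1
        if feed[1] in followed:
            out.append(feed)
            if len(out) == limit:
                break
    return out, examined


# Synthetic data: (feed_date as an integer, artist_id); 50 artists, 10 followed.
feeds = [(i, i % 50) for i in range(10_000)]
followed = set(range(10))
top_sorted, rows_sorted = sort_then_limit(feeds, followed, 20)
top_scanned, rows_examined = scan_in_date_order(
    sorted(feeds, reverse=True), followed, 20)
print(top_sorted == top_scanned, rows_sorted, rows_examined)  # True 2000 100
```

Both plans return the same 20 rows; the difference is that the first sorts all 2,000 matches while the second stops after reading 100 rows.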

Re your update to the question:

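Overriding the optimizer's choice of table access order is usually not a good idea. You can see that forcing it to read the af table first means it now has to examine 11.19 million rows in that table. At least it can avoid sorting the result manually, because it can rely on the order in which it reads the af table. But in this case I'm not sure that is a good trade-off.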
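How good the trade-off is depends on how many artists the user follows. This rough model, with synthetic data and hypothetical function names, counts the rows each plan reads: the artist-first plan reads every matching row (and would still need a filesort afterwards), while the forced date-first scan stops after 20 matches but may read many non-matching rows before it gets there.

```python
from collections import defaultdict


def artist_first_rows(feeds, followed, since):
    """Rows read when driving from user_artists into artist_feeds.

    Models an (artist_id, feed_date) index: every matching row is read.
    """
    dates_by_artist = defaultdict(list)
    for feed_date, artist_id in feeds:
        dates_by_artist[artist_id].append(feed_date)
    return sum(1 for a in followed for d in dates_by_artist[a] if d >= since)


def date_first_rows(feeds, followed, since, limit):
    """Rows read when STRAIGHT_JOIN forces a newest-first scan of artist_feeds."""
    examined = found = 0
    for feed_date, artist_id in sorted(feeds, reverse=True):
        if feed_date < since:
            break  # every remaining row is older still
        examined += 1
        if artist_id in followed:
            found += 1
            if found == limit:
                break
    return examined


# Synthetic data: 100,000 feeds spread evenly over 1,000 artists.
feeds = [(i, i % 1000) for i in range(100_000)]
for label, followed in (("follows 500 artists", set(range(500))),
                        ("follows 1 artist", {0})):
    print(label, artist_first_rows(feeds, followed, 0),
          date_first_rows(feeds, followed, 0, 20))
# follows 500 artists 50000 520
# follows 1 artist 100 20000
```

In this model the forced scan wins clearly for a user who follows many artists and loses clearly for one who follows few, so timings on the real data distribution are what decide it.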

MySQL query on a table with 23m rows is very slow
