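I am developing a PHP web service that needs to run a query against a table containing 23 million records. The query I wrote takes more than 30 seconds to complete, and I can tell that the ORDER BY part is what causes the problem, because without it the query responds quickly.
Here is the query: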
SELECT artist_feeds.*, artists.name, artists.picture AS profile_picture
FROM artist_feeds
INNER JOIN user_artists ON user_artists.artist_id = artist_feeds.artist_id
INNER JOIN artists ON artists.id = artist_feeds.artist_id
WHERE artist_feeds.feed_date >= '2015-10-01'
AND user_artists.user_id = 486
AND NOT EXISTS (
SELECT id FROM user_artist_disabled_networks AS uadn
WHERE uadn.user_id = 486
AND uadn.artist_id = artist_feeds.artist_id
AND uadn.socialnetwork_id = artist_feeds.socialnetwork_id
LIMIT 1
)
ORDER BY artist_feeds.feed_date DESC
LIMIT 0, 20
The EXPLAIN for the query looks like this:
Can anyone offer any pointers?
As requested, the SHOW CREATE TABLE output:
CREATE TABLE `artist_feeds` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`feed_id` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`feed_date` datetime DEFAULT NULL,
`message` text COLLATE utf8mb4_unicode_ci,
`hash` varchar(32) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`type` varchar(20) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`source` mediumtext COLLATE utf8mb4_unicode_ci,
`picture` mediumtext COLLATE utf8mb4_unicode_ci,
`link` mediumtext COLLATE utf8mb4_unicode_ci,
`artist_id` int(11) DEFAULT '0',
`socialnetwork_id` int(11) DEFAULT '0',
`direct_link` mediumtext COLLATE utf8mb4_unicode_ci,
`is_master_feed` tinyint(4) DEFAULT '0',
`active` tinyint(4) DEFAULT '0',
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`rss_feed_id` int(11) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `artist_id` (`artist_id`),
KEY `socialnetwork_id` (`socialnetwork_id`),
KEY `feedidnetwork` (`feed_id`(191),`socialnetwork_id`),
KEY `feeddatenetworkid` (`feed_date`,`socialnetwork_id`),
KEY `feeddatenetworkidartistid` (`artist_id`,`socialnetwork_id`,`feed_date`),
KEY `type` (`type`),
KEY `feed_date` (`feed_date`)
) ENGINE=InnoDB AUTO_INCREMENT=26991713 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
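SOLVED: Thanks to the pointers from Bill, I looked into changing the table access order so that artist_feeds is the first table accessed. That in turn removes the need for a filesort of the data, which is where the speedup comes from.
I ended up using STRAIGHT_JOIN instead of INNER JOIN. My working query is: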
SELECT af.*, a.name, a.picture AS profile_picture
FROM artist_feeds AS af
STRAIGHT_JOIN user_artists AS ua ON ua.artist_id = af.artist_id
STRAIGHT_JOIN artists AS a ON a.id = af.artist_id
LEFT OUTER JOIN user_artist_disabled_networks AS uadn
ON uadn.user_id = ua.user_id AND uadn.socialnetwork_id = af.socialnetwork_id
WHERE af.feed_date >= '2015-10-01'
AND uadn.user_id IS NULL
AND ua.user_id = 498
ORDER BY af.feed_date DESC
LIMIT 0, 20
The EXPLAIN now looks like this:
Answer 0 (score: 2)
I would write the query with an exclusion join instead of a NOT EXISTS subquery:
SELECT af.*, a.name, a.picture AS profile_picture
FROM artist_feeds AS af
INNER JOIN user_artists AS ua ON ua.artist_id = af.artist_id
INNER JOIN artists AS a ON a.id = af.artist_id
LEFT OUTER JOIN user_artist_disabled_networks AS uadn
ON uadn.user_id = ua.user_id AND uadn.socialnetwork_id = af.socialnetwork_id
WHERE af.feed_date >= '2015-10-01'
AND ua.user_id = 486
AND uadn.user_id IS NULL
ORDER BY af.feed_date DESC
LIMIT 0, 20
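One caveat, which rests on an assumption about intent: the NOT EXISTS subquery in the question also matches on uadn.artist_id, and the join condition above leaves that out. If networks are disabled per artist, the exclusion join would need that column as well, roughly like this:

```sql
SELECT af.*, a.name, a.picture AS profile_picture
FROM artist_feeds AS af
INNER JOIN user_artists AS ua ON ua.artist_id = af.artist_id
INNER JOIN artists AS a ON a.id = af.artist_id
LEFT OUTER JOIN user_artist_disabled_networks AS uadn
  ON uadn.user_id = ua.user_id
  AND uadn.artist_id = af.artist_id            -- carried over from the NOT EXISTS version
  AND uadn.socialnetwork_id = af.socialnetwork_id
WHERE af.feed_date >= '2015-10-01'
  AND ua.user_id = 486
  AND uadn.user_id IS NULL                     -- keep only rows with no disabled-network match
ORDER BY af.feed_date DESC
LIMIT 0, 20
```

If a network is disabled for the user across all artists, the shorter join condition is the right one and this variant is not needed.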
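According to the EXPLAIN, the table access order is:
ua: lookup by user_id
a: lookup by PRIMARY KEY
af: lookup by artist_id, range condition on feed_date
uadn: lookup by user_id and socialnetwork_id
So you should have indexes: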
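As a rough sketch only, indexes matching those lookups could be declared as below. The index names are made up for illustration, and some of these indexes may already exist (only the artist_feeds definition is shown in the question), so check SHOW CREATE TABLE for the other tables before adding anything.

```sql
-- Hypothetical index names; the column choices follow the access order listed above.

-- ua: lookup by user_id, then join to af on artist_id
ALTER TABLE user_artists
  ADD INDEX idx_ua_user_artist (user_id, artist_id);

-- af: lookup by artist_id, range condition on feed_date
ALTER TABLE artist_feeds
  ADD INDEX idx_af_artist_date (artist_id, feed_date);

-- uadn: lookup by user_id and socialnetwork_id
ALTER TABLE user_artist_disabled_networks
  ADD INDEX idx_uadn_user_network (user_id, socialnetwork_id);
```

The artists table is reached through its PRIMARY KEY, so it needs nothing extra. The existing feeddatenetworkidartistid index on artist_feeds starts with artist_id, but socialnetwork_id sits between artist_id and feed_date, so it cannot narrow the feed_date range for this query the way an (artist_id, feed_date) index can.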
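A large part of your query's performance problem is undoubtedly the temporary table and filesort ("Using temporary; Using filesort" in the EXPLAIN). This is unavoidable because your query does not access the artist_feeds table first.
Re the update in your question:
Overriding the optimizer's table access order is not a good idea. You can see that forcing it to read the af table first means it now has to examine 11.19 million entries in that table. At least it is able to avoid sorting the result manually, since it can rely on the natural order of the af table. But in this case I'm not sure that is a good tradeoff.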