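I have a table that currently holds about 1 million rows. The following query takes about 5 seconds to complete. How would you suggest speeding it up?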
# Thread_id: 14 Schema: defrop_defrop QC_hit: No
# Query_time: 5.573048 Lock_time: 0.591625 Rows_sent: 0 Rows_examined: 1006391
# Rows_affected: 1
UPDATE `backlinks` as a
INNER JOIN(SELECT b.`id` as bid
FROM `backlinks` b
WHERE b.`googlebot_id` IS NULL AND b.`used_time` IS NULL AND
b.`campaign_id` IN (SELECT `id` FROM `campaigns` WHERE `status`=true) GROUP BY b.`campaign_id` ORDER BY RAND() limit 1
) as c
ON (a.id = c.bid)
SET a.`crawler_id` = '10.0.0.13', a.`used_time`=NOW();
campaign_id and googlebot_id are the leading key columns (indexed); used_time and crawler_id are indexed as well.
[Screenshot of the table structure in phpMyAdmin]
Answer
Here is the query formatted so I can read it better:
UPDATE backlinks bl JOIN
       (SELECT bl2.id as bid
        FROM backlinks bl2
        WHERE bl2.googlebot_id IS NULL AND
              bl2.used_time IS NULL AND
              bl2.campaign_id IN (SELECT c.id FROM campaigns c WHERE c.status = true)
        GROUP BY bl2.campaign_id
        ORDER BY RAND()
        LIMIT 1
       ) bl2
       ON bl.id = bl2.bid
    SET bl.crawler_id = '10.0.0.13',
        bl.used_time = NOW();
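First, the GROUP BY in the subquery isn't needed. (Dropping it does change which row gets picked: with it, every campaign that has an eligible row is equally likely to be chosen and the id returned for that campaign is arbitrary; without it, campaigns with more eligible rows are chosen more often.) Then I would replace the IN with EXISTS: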
UPDATE backlinks bl JOIN
       (SELECT bl2.id as bid
        FROM backlinks bl2
        WHERE bl2.googlebot_id IS NULL AND
              bl2.used_time IS NULL AND
              EXISTS (SELECT 1 FROM campaigns c WHERE bl2.campaign_id = c.id AND c.status = true)
        ORDER BY RAND()
        LIMIT 1
       ) bl2
       ON bl.id = bl2.bid
    SET bl.crawler_id = '10.0.0.13',
        bl.used_time = NOW();
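To check that the EXISTS rewrite doesn't change which rows qualify, here is a small sketch that runs only the candidate-selection part of both versions (without ORDER BY RAND() / LIMIT, so the result is deterministic) against SQLite through Python's sqlite3 module. The schema and rows are hypothetical, trimmed to the columns these queries touch, and `status = true` is written as `status = 1`.

```python
import sqlite3

# Hypothetical, simplified schema: only the columns the queries above use.
SCHEMA = """
CREATE TABLE campaigns (id INTEGER PRIMARY KEY, status INTEGER NOT NULL);
CREATE TABLE backlinks (
    id INTEGER PRIMARY KEY,
    campaign_id INTEGER,
    googlebot_id TEXT,
    used_time TEXT,
    crawler_id TEXT
);
"""

IN_QUERY = """
SELECT bl2.id FROM backlinks bl2
WHERE bl2.googlebot_id IS NULL AND bl2.used_time IS NULL
  AND bl2.campaign_id IN (SELECT c.id FROM campaigns c WHERE c.status = 1)
ORDER BY bl2.id
"""

EXISTS_QUERY = """
SELECT bl2.id FROM backlinks bl2
WHERE bl2.googlebot_id IS NULL AND bl2.used_time IS NULL
  AND EXISTS (SELECT 1 FROM campaigns c
              WHERE c.id = bl2.campaign_id AND c.status = 1)
ORDER BY bl2.id
"""

def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO campaigns VALUES (?, ?)", [(1, 1), (2, 0), (3, 1)])
    conn.executemany(
        "INSERT INTO backlinks (id, campaign_id, googlebot_id, used_time) VALUES (?, ?, ?, ?)",
        [
            (1, 1, None, None),          # eligible
            (2, 1, "bot", None),         # already crawled
            (3, 2, None, None),          # campaign is inactive
            (4, 3, None, None),          # eligible
            (5, 3, None, "2020-01-01"),  # already used
            (6, 4, None, None),          # campaign 4 does not exist
        ],
    )
    return conn

def candidate_ids(conn: sqlite3.Connection, query: str) -> list:
    return [row[0] for row in conn.execute(query)]

if __name__ == "__main__":
    conn = make_demo_db()
    print(candidate_ids(conn, IN_QUERY))      # [1, 4]
    print(candidate_ids(conn, EXISTS_QUERY))  # [1, 4]
```

Both forms return the same ids because campaigns.id is a non-NULL primary key; for a plain membership test like this one, IN and EXISTS are interchangeable.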
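This will help, but probably not by much. My guess is that the performance problem is the size of the outer sort (or, equivalently, the size of the data the GROUP BY has to process in your query).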
You can also get rid of the subquery entirely:
UPDATE backlinks bl
   SET bl.crawler_id = '10.0.0.13',
       bl.used_time = NOW()
 WHERE bl.googlebot_id IS NULL AND
       bl.used_time IS NULL AND
       EXISTS (SELECT 1 FROM campaigns c WHERE bl.campaign_id = c.id AND c.status = true)
 ORDER BY RAND()
 LIMIT 1;
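The single-table form can be tried without a MySQL server. The sketch below is a stand-in under stated assumptions: it uses Python's sqlite3 module and a hypothetical, trimmed-down schema. SQLite only accepts UPDATE ... ORDER BY ... LIMIT when built with SQLITE_ENABLE_UPDATE_DELETE_LIMIT, so the random pick moves into a scalar subquery on the primary key; SQLite's RANDOM() stands in for RAND() and CURRENT_TIMESTAMP for NOW(). The helper name `claim_one` is hypothetical.

```python
import sqlite3

def make_demo_db() -> sqlite3.Connection:
    """Hypothetical miniature copy of the two tables (only the columns used here)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE campaigns (id INTEGER PRIMARY KEY, status INTEGER NOT NULL);
        CREATE TABLE backlinks (id INTEGER PRIMARY KEY, campaign_id INTEGER,
                                googlebot_id TEXT, used_time TEXT, crawler_id TEXT);
        INSERT INTO campaigns VALUES (1, 1), (2, 0);
        INSERT INTO backlinks (id, campaign_id, googlebot_id, used_time) VALUES
            (1, 1, NULL, NULL),   -- eligible
            (2, 1, NULL, NULL),   -- eligible
            (3, 1, 'bot', NULL),  -- already crawled
            (4, 2, NULL, NULL);   -- campaign is inactive
    """)
    return conn

# Same conditions as the single-table UPDATE; the ORDER BY ... LIMIT 1 pick
# happens in an uncorrelated scalar subquery that yields at most one id.
CLAIM_ONE_SQL = """
UPDATE backlinks
SET crawler_id = ?, used_time = CURRENT_TIMESTAMP
WHERE id = (
    SELECT bl.id FROM backlinks bl
    WHERE bl.googlebot_id IS NULL AND bl.used_time IS NULL
      AND EXISTS (SELECT 1 FROM campaigns c
                  WHERE c.id = bl.campaign_id AND c.status = 1)
    ORDER BY RANDOM()
    LIMIT 1
)
"""

def claim_one(conn: sqlite3.Connection, crawler_id: str) -> int:
    """Mark one random eligible backlink as used; return how many rows changed (0 or 1)."""
    cur = conn.execute(CLAIM_ONE_SQL, (crawler_id,))
    conn.commit()
    return cur.rowcount

if __name__ == "__main__":
    conn = make_demo_db()
    print([claim_one(conn, "10.0.0.13") for _ in range(3)])  # [1, 1, 0]
```

When no eligible row is left, the subquery returns NULL, `id = NULL` matches nothing, and the call reports 0 changed rows.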
This has minimal impact on performance, but it cleans up the logic a bit.
My guess is that the WHERE conditions are not very selective, so optimizing them won't help much.
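Rather than guessing, one could measure how selective the WHERE conditions are. A minimal sketch, again with Python's sqlite3 and a hypothetical miniature copy of the two tables: it averages the combined condition, which works because IS NULL, AND and EXISTS evaluate to 1 or 0 (the same holds in MySQL). A result close to 1 means nearly every row qualifies, so an index on these columns is unlikely to shrink the work much. The function name `where_selectivity` is hypothetical.

```python
import sqlite3

def make_demo_db() -> sqlite3.Connection:
    """Hypothetical miniature copy of the two tables (only the columns used here)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE campaigns (id INTEGER PRIMARY KEY, status INTEGER NOT NULL);
        CREATE TABLE backlinks (id INTEGER PRIMARY KEY, campaign_id INTEGER,
                                googlebot_id TEXT, used_time TEXT, crawler_id TEXT);
        INSERT INTO campaigns VALUES (1, 1), (2, 0);
        INSERT INTO backlinks (id, campaign_id, googlebot_id, used_time) VALUES
            (1, 1, NULL, NULL),   -- eligible
            (2, 1, NULL, NULL),   -- eligible
            (3, 1, 'bot', NULL),  -- already crawled
            (4, 2, NULL, NULL);   -- campaign is inactive
    """)
    return conn

# Fraction of rows that pass all three WHERE conditions. The conjunction is
# 1 (true) or 0 (false) per row, so its AVG() is the selectivity.
SELECTIVITY_SQL = """
SELECT AVG(
    bl.googlebot_id IS NULL AND bl.used_time IS NULL
    AND EXISTS (SELECT 1 FROM campaigns c
                WHERE c.id = bl.campaign_id AND c.status = 1)
)
FROM backlinks bl
"""

def where_selectivity(conn: sqlite3.Connection) -> float:
    value = conn.execute(SELECTIVITY_SQL).fetchone()[0]
    return 0.0 if value is None else float(value)  # AVG over an empty table is NULL

if __name__ == "__main__":
    print(where_selectivity(make_demo_db()))  # 0.5
```

On the real table this is itself a full scan, so it is something to run once, not on every request.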
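At this point, the real problem is the ORDER BY RAND(). If you know roughly how many rows the subquery returns, you can use RAND() to pre-filter. For example, let me assume that at least 1,000 rows will be returned. Then: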
UPDATE backlinks bl
   SET bl.crawler_id = '10.0.0.13',
       bl.used_time = NOW()
 WHERE bl.googlebot_id IS NULL AND
       bl.used_time IS NULL AND
       EXISTS (SELECT 1 FROM campaigns c WHERE bl.campaign_id = c.id AND c.status = true) AND
       RAND() < 0.01  -- keep about 1/100
 ORDER BY RAND()
 LIMIT 1;
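This speeds up the sort considerably, because only about 1/100th of the data reaches it (the rows are still read, but most are discarded before sorting). However, if not enough rows meet the conditions, it can filter out all of them, and then nothing is updated.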
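Two numbers make the trade-off concrete: with n eligible rows and a keep-rate p, about n·p rows reach the sort, and the chance that the pre-filter keeps none of them is (1 - p)^n. For the assumed 1,000 rows and p = 0.01, that is about 10 rows sorted and roughly a 4.3 × 10^-5 chance of an empty result. The sketch below, again Python's sqlite3 with a hypothetical schema, computes that probability and shows one possible way to handle the empty case (an addition, not part of the answer above): run the pre-filtered UPDATE first and, only if it changed no rows, rerun it without the `RAND() < p` condition. MySQL's RAND() is emulated by registering `random.random` under that name; `prob_all_filtered` and `claim_one` are hypothetical helper names.

```python
import random
import sqlite3

def prob_all_filtered(n_rows: int, p: float) -> float:
    """Chance that a per-row 'RAND() < p' filter keeps none of n_rows eligible rows."""
    return (1.0 - p) ** n_rows

def make_demo_db(n_eligible: int) -> sqlite3.Connection:
    """Hypothetical table with n_eligible rows that all pass the WHERE conditions."""
    conn = sqlite3.connect(":memory:")
    # Emulate MySQL's RAND(): random.random is registered as a non-deterministic
    # function (the default), so SQLite calls it again for every row.
    conn.create_function("RAND", 0, random.random)
    conn.executescript("""
        CREATE TABLE campaigns (id INTEGER PRIMARY KEY, status INTEGER NOT NULL);
        CREATE TABLE backlinks (id INTEGER PRIMARY KEY, campaign_id INTEGER,
                                googlebot_id TEXT, used_time TEXT, crawler_id TEXT);
        INSERT INTO campaigns VALUES (1, 1);
    """)
    conn.executemany("INSERT INTO backlinks (campaign_id) VALUES (?)", [(1,)] * n_eligible)
    conn.commit()
    return conn

# Random pick in a scalar subquery, because SQLite's default build has no
# UPDATE ... ORDER BY ... LIMIT; {prefilter} is either empty or 'AND RAND() < ?'.
CLAIM_SQL = """
UPDATE backlinks
SET crawler_id = ?, used_time = CURRENT_TIMESTAMP
WHERE id = (
    SELECT bl.id FROM backlinks bl
    WHERE bl.googlebot_id IS NULL AND bl.used_time IS NULL
      AND EXISTS (SELECT 1 FROM campaigns c
                  WHERE c.id = bl.campaign_id AND c.status = 1)
      {prefilter}
    ORDER BY RAND()
    LIMIT 1
)
"""

def claim_one(conn: sqlite3.Connection, crawler_id: str, p: float = 0.01) -> int:
    """Pre-filtered pick first; full ORDER BY RAND() only if the filter kept nothing."""
    attempts = [
        ("AND RAND() < ?", (crawler_id, p)),  # sorts only ~p of the eligible rows
        ("", (crawler_id,)),                  # fallback: sorts all eligible rows
    ]
    for prefilter, params in attempts:
        cur = conn.execute(CLAIM_SQL.format(prefilter=prefilter), params)
        if cur.rowcount:
            conn.commit()
            return cur.rowcount
    conn.commit()
    return 0  # no eligible rows at all

if __name__ == "__main__":
    print(prob_all_filtered(1000, 0.01))  # about 4.3e-05
    conn = make_demo_db(1000)
    print(claim_one(conn, "10.0.0.13"))   # 1
```

On MySQL the same idea would apply to the single-table UPDATE with `RAND() < 0.01`: check the affected-row count and rerun without that condition when it is 0. The fallback is rare when the row-count assumption holds, so the common case keeps the small sort.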