How to optimize this inner join query to reduce query time

Time: 2019-04-13 13:00:55

Tags: mysql sql mariadb

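I have a table that currently holds about 1 million rows. The following query takes about 5 seconds to complete. How would you suggest speeding it up?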

# Thread_id: 14  Schema: defrop_defrop  QC_hit: No
# Query_time: 5.573048  Lock_time: 0.591625  Rows_sent: 0  Rows_examined: 1006391
# Rows_affected: 1
UPDATE `backlinks` as a
INNER JOIN(SELECT b.`id` as bid
           FROM `backlinks` b
           WHERE b.`googlebot_id` IS NULL AND b.`used_time` IS NULL AND
                 b.`campaign_id` IN (SELECT `id` FROM `campaigns` WHERE `status`=true)
           GROUP BY b.`campaign_id`
           ORDER BY RAND() LIMIT 1
           ) as c
 ON (a.id = c.bid)
SET a.`crawler_id` = '10.0.0.13', a.`used_time`=NOW();

campaign_id and googlebot_id are foreign keys and are indexed. used_time and crawler_id are indexed as well.

[Screenshot: phpMyAdmin view of the backlinks table]

1 answer:

Answer 0 (score: 2)

Here is the query formatted so that I can read it better:

UPDATE backlinks bl JOIN
       (SELECT bl2.id as bid
        FROM backlinks bl2
        WHERE bl2.googlebot_id IS NULL AND
              bl2.used_time IS NULL AND 
              bl2.campaign_id IN (SELECT c.id FROM campaigns c WHERE status = true)
        GROUP BY bl2.campaign_id
       ORDER BY RAND() 
       LIMIT 1
     ) bl2
     ON bl.id = bl2.bid
    SET bl.crawler_id = '10.0.0.13',
        bl.used_time = NOW();

First, the GROUP BY in the subquery is not needed. Then I would replace the IN with EXISTS:

UPDATE backlinks bl JOIN
       (SELECT bl2.id as bid
        FROM backlinks bl2
        WHERE bl2.googlebot_id IS NULL AND
              bl2.used_time IS NULL AND 
              EXISTS (SELECT 1 FROM campaigns c WHERE bl2.campaign_id = c.id AND c.status = true)
        ORDER BY RAND() 
        LIMIT 1
      ) bl2
      ON bl.id = bl2.bid
    SET bl.crawler_id = '10.0.0.13',
        bl.used_time = NOW();

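This will help a bit, but probably not much. My guess is that the performance problem is the size of the outer sort (or, equivalently, the size of the data needed for the GROUP BY in your query).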
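One way to check that guess is to run EXPLAIN on the SELECT part of the statement. This is only a sketch: the exact output depends on the MySQL or MariaDB version and on the indexes actually present. With ORDER BY RAND(), the Extra column typically reports "Using temporary; Using filesort", and the rows column gives an estimate of how many rows feed that sort.

```sql
-- Inspect the plan for the row-picking part of the UPDATE.
-- Look at the "rows" estimate and the "Extra" column of the output.
EXPLAIN
SELECT bl2.id AS bid
FROM backlinks bl2
WHERE bl2.googlebot_id IS NULL AND
      bl2.used_time IS NULL AND
      EXISTS (SELECT 1 FROM campaigns c WHERE bl2.campaign_id = c.id AND c.status = true)
ORDER BY RAND()
LIMIT 1;
```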

You can also get rid of the subquery entirely:

UPDATE backlinks bl
    SET bl.crawler_id = '10.0.0.13',
        bl.used_time = NOW()
WHERE bl.googlebot_id IS NULL AND
      bl.used_time IS NULL AND 
      EXISTS (SELECT 1 FROM campaigns c WHERE bl.campaign_id = c.id AND c.status = true)
ORDER BY RAND()
LIMIT 1;

This has minimal impact on performance, but it does clean up the logic a bit.

My guess is that the WHERE conditions are not very selective, so optimizing them will not help much.
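That guess can be checked by counting how many rows each condition lets through. The queries below are a rough sketch, and the column aliases are made up for illustration. They rely on MySQL evaluating a boolean expression such as `x IS NULL` to 1 or 0, so SUM counts the rows where it holds.

```sql
-- How many rows pass the NULL checks, compared with the whole table?
SELECT COUNT(*)                                               AS total_rows,
       SUM(bl.googlebot_id IS NULL)                           AS null_googlebot,
       SUM(bl.used_time IS NULL)                              AS null_used_time,
       SUM(bl.googlebot_id IS NULL AND bl.used_time IS NULL)  AS both_null
FROM backlinks bl;

-- How many rows pass all conditions, including the active-campaign check?
SELECT COUNT(*) AS candidate_rows
FROM backlinks bl
WHERE bl.googlebot_id IS NULL AND
      bl.used_time IS NULL AND
      EXISTS (SELECT 1 FROM campaigns c WHERE bl.campaign_id = c.id AND c.status = true);
```

If candidate_rows is a large share of total_rows, the conditions are not selective, and most of the time goes into sorting the candidates rather than finding them.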

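So the problem is the ORDER BY RAND(). If you know how many rows the subquery returns, you can pre-filter using RAND(). For example, let me assume that at least 1000 rows are always returned. Then: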

UPDATE backlinks bl
    SET bl.crawler_id = '10.0.0.13',
        bl.used_time = NOW()
WHERE bl.googlebot_id IS NULL AND
      bl.used_time IS NULL AND 
      EXISTS (SELECT 1 FROM campaigns c WHERE bl.campaign_id = c.id AND c.status = true) AND
      RAND() < 0.01  -- keep about 1/100
ORDER BY RAND()
LIMIT 1;

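This greatly speeds up the sort, because it runs on only about 1/100th of the data. However, it could filter out all the rows if not enough rows match the conditions.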
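If the number of candidate rows is not known in advance, one possible variation is to count them first and derive the threshold from that count. This is a sketch under assumptions, not part of the original suggestion: the user variables @n and @p and the target of about 100 surviving rows are illustrative choices. The extra COUNT(*) still scans the candidates, so it only pays off if the sort, not the scan, dominates the run time.

```sql
-- Step 1: count the candidate rows (hypothetical user variable @n).
SELECT COUNT(*) INTO @n
FROM backlinks bl
WHERE bl.googlebot_id IS NULL AND
      bl.used_time IS NULL AND
      EXISTS (SELECT 1 FROM campaigns c WHERE bl.campaign_id = c.id AND c.status = true);

-- Step 2: aim for about 100 surviving rows on average.
-- With 100 or fewer candidates, @p is 1 and nothing is filtered out,
-- because RAND() always returns a value below 1.
SET @p = LEAST(1, 100 / GREATEST(@n, 1));

-- Step 3: the same pre-filtered UPDATE, using the computed threshold.
UPDATE backlinks bl
    SET bl.crawler_id = '10.0.0.13',
        bl.used_time = NOW()
WHERE bl.googlebot_id IS NULL AND
      bl.used_time IS NULL AND
      EXISTS (SELECT 1 FROM campaigns c WHERE bl.campaign_id = c.id AND c.status = true) AND
      RAND() < @p
ORDER BY RAND()
LIMIT 1;
```

For comparison, with the fixed threshold of 0.01 and exactly 1000 candidates, the chance that every row is filtered out is 0.99^1000, which is on the order of 1 in 20,000. With fewer candidates that chance rises quickly, which is the case the adaptive threshold is meant to cover.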