Query with LEFT JOIN on a large table is really slow

Asked: 2016-03-20 13:48:34

Tags: mysql performance left-join large-data

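The following query takes about 12 seconds to execute. I have tried to optimize it, but without success. The table being joined is very large (> 8,000,000 records).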

SELECT 
    p0_.id AS id_0, 
    p0_.ean AS ean_1, 
    p0_.brand AS brand_2, 
    p0_.type AS type_3, 
    p0_.retail_price AS retail_price_4, 
    p0_.target_price AS target_price_5, 
    min(NULLIF(c1_.delivery_price, 0)) AS sclr_6, 
    COALESCE(((p0_.target_price - min(NULLIF(c1_.delivery_price, 0))) / p0_.target_price * -100), 0) AS sclr_7 
FROM product p0_ 
LEFT JOIN crawl c1_ ON (
    c1_.product_ean = p0_.ean AND (
        c1_.crawl_date = p0_.last_crawl_date OR 
        p0_.last_crawl_date IS NULL
    ) 
    AND c1_.source_id IN (
        SELECT o2_.source_id AS sclr_8 
        FROM organisation_source o2_ 
        WHERE o2_.organisation_id = 5
    )
) 
WHERE p0_.organisation_id = 5 GROUP BY p0_.ean

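I have already tried writing the query in many different ways, but unfortunately none of them gave me any performance gain. Removing the subquery in the last AND condition does not help either.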

See the output of the EXPLAIN statement below:

+------+--------------+-------+------+---------------------------------------------------+------------------+---------+------------------------+--------+-------------+
| id   | select_type  | table | type | possible_keys                                     | key              | key_len | ref                    | rows   | Extra       |
+------+--------------+-------+------+---------------------------------------------------+------------------+---------+------------------------+--------+-------------+
|    1 | PRIMARY      | p0_   | ref  | uniqueConstraint,IDX_D34A04AD9E6B1585             | uniqueConstraint | 5       | const                  |     69 | Using where |
|    1 | PRIMARY      | c1_   | ref  | IDX_product_ean,IDX_crawl_date                    | IDX_product_ean  | 62      | admin_pricev-p.p0_.ean | 468459 | Using where |
|    2 | MATERIALIZED | o2_   | ref  | PRIMARY,IDX_DD91A56E9E6B1585,IDX_DD91A56E953C1C61 | PRIMARY          | 4       | const                  |      1 | Using index |
+------+--------------+-------+------+---------------------------------------------------+------------------+---------+------------------------+--------+-------------+

See the CREATE TABLE statements for the product and crawl tables below:

CREATE TABLE `product` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `organisation_id` int(11) DEFAULT NULL,
  `ean` varchar(20) COLLATE utf8_unicode_ci NOT NULL,
  `brand` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
  `type` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
  `retail_price` decimal(10,2) NOT NULL,
  `target_price` decimal(10,2) NOT NULL,
  `last_crawl_date` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `uniqueConstraint` (`organisation_id`,`ean`),
  KEY `IDX_D34A04AD9E6B1585` (`organisation_id`),
  KEY `IDX_target_price` (`target_price`),
  KEY `IDX_ean` (`ean`),
  KEY `IDX_type` (`type`),
  KEY `IDX_last_crawl_date` (`last_crawl_date`),
  CONSTRAINT `FK_D34A04AD9E6B1585` FOREIGN KEY (`organisation_id`) REFERENCES `organisation` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=927 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

CREATE TABLE `crawl` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `source_id` int(11) DEFAULT NULL,
  `store_id` int(11) DEFAULT NULL,
  `product_ean` varchar(20) COLLATE utf8_unicode_ci NOT NULL,
  `crawl_date` datetime NOT NULL,
  `takeaway_price` decimal(10,2) DEFAULT NULL,
  `delivery_price` decimal(10,2) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `IDX_B4E9F1C2953C1C61` (`source_id`),
  KEY `IDX_B4E9F1C2B092A811` (`store_id`),
  KEY `IDX_product_ean` (`product_ean`),
  KEY `IDX_takeaway_price` (`takeaway_price`),
  KEY `IDX_crawl_date` (`crawl_date`),
  CONSTRAINT `FK_B4E9F1C2953C1C61` FOREIGN KEY (`source_id`) REFERENCES `source` (`id`),
  CONSTRAINT `FK_B4E9F1C2B092A811` FOREIGN KEY (`store_id`) REFERENCES `store` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=8606874 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

Does anyone know how to improve the performance of this query? Many thanks! If more information is needed, please let me know.

2 answers:

Answer 0 (score: 2)

You can simplify the query to:

SELECT . . .
FROM product p0_  LEFT JOIN
     crawl c1_
     ON c1_.product_ean = p0_.ean AND 
        c1_.crawl_date = p0_.last_crawl_date AND
        EXISTS (SELECT 1
                FROM organisation_source o2_ 
                WHERE o2_.organisation_id = 5 AND c1_.source_id = o2_.source_id 
               )
WHERE p0_.organisation_id = 5
GROUP BY p0_.ean;

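The comparison p0_.last_crawl_date IS NULL is probably unnecessary. The LEFT JOIN keeps all rows from the first table even when the comparison involves a NULL. With that condition, your logic matches a product that has no last_crawl_date to all rows in the second table (that meet the other conditions). That might be what you want, but I'm guessing not.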
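To make the difference concrete, here is a minimal sketch on toy data. The demo_product and demo_crawl tables and their rows are hypothetical and mirror only the columns that matter for the join; they are not the real schema.

```sql
-- Hypothetical toy tables (names and data invented for illustration).
CREATE TEMPORARY TABLE demo_product (
  ean             VARCHAR(20) NOT NULL,
  last_crawl_date DATETIME    DEFAULT NULL
);
CREATE TEMPORARY TABLE demo_crawl (
  product_ean    VARCHAR(20)   NOT NULL,
  crawl_date     DATETIME      NOT NULL,
  delivery_price DECIMAL(10,2) DEFAULT NULL
);

-- Product A has a last crawl date, product B has none.
INSERT INTO demo_product VALUES
  ('A', '2016-03-01 00:00:00'),
  ('B', NULL);
INSERT INTO demo_crawl VALUES
  ('A', '2016-02-01 00:00:00', 10.00),
  ('A', '2016-03-01 00:00:00', 12.00),
  ('B', '2016-02-01 00:00:00', 20.00),
  ('B', '2016-03-01 00:00:00', 22.00);

-- Condition from the question: B joins to every crawl row with its EAN.
SELECT p.ean, COUNT(c.product_ean) AS matched_rows
FROM demo_product p
LEFT JOIN demo_crawl c
       ON c.product_ean = p.ean
      AND (c.crawl_date = p.last_crawl_date OR p.last_crawl_date IS NULL)
GROUP BY p.ean;

-- Simplified condition: B is still returned, but with no matching crawl rows.
SELECT p.ean, COUNT(c.product_ean) AS matched_rows
FROM demo_product p
LEFT JOIN demo_crawl c
       ON c.product_ean = p.ean
      AND c.crawl_date = p.last_crawl_date
GROUP BY p.ean;
```

In both queries product B appears in the result because of the LEFT JOIN. Only the number of crawl rows attached to it changes, and on the real data that is what decides whether historic crawl prices feed into the MIN() for products that were never crawled for the current date.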

In MySQL, EXISTS is sometimes faster than IN, which is why I rewrote that part.

For this query, you can speed things up with indexes: product(organisation_id, ean, last_crawl_date), crawl(product_ean, crawl_date, source_id), and organisation_source(source_id, organisation_id).
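One way to create those three indexes is sketched below. The column lists are the ones named in this answer; the index names are made up for the example, and the organisation_source columns are assumed from how the query uses them, since its table definition is not shown in the question.

```sql
-- Index names (idx_...) are hypothetical; choose names that fit your schema.
ALTER TABLE product
  ADD INDEX idx_product_org_ean_crawl (organisation_id, ean, last_crawl_date);

ALTER TABLE crawl
  ADD INDEX idx_crawl_ean_date_source (product_ean, crawl_date, source_id);

-- Assumes organisation_source has source_id and organisation_id columns,
-- as the subquery in the question implies.
ALTER TABLE organisation_source
  ADD INDEX idx_orgsource_source_org (source_id, organisation_id);
```

Building the index on crawl will likely take a while given the table size, so it is probably worth trying on a copy of the data first.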

Answer 1 (score: 1)

Try using composite indexes on the columns used in the LEFT JOIN:

CREATE TABLE `product` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `organisation_id` int(11) DEFAULT NULL,
  `ean` varchar(20) COLLATE utf8_unicode_ci NOT NULL,
  `brand` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
  `type` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
  `retail_price` decimal(10,2) NOT NULL,
  `target_price` decimal(10,2) NOT NULL,
  `last_crawl_date` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `uniqueConstraint` (`organisation_id`,`ean`),
  KEY `IDX_D34A04AD9E6B1585` (`organisation_id`),
  KEY `IDX_target_price` (`target_price`),
  KEY `IDX_ean` (`ean`),
  KEY `IDX_type` (`type`),
  KEY `IDX_last_crawl_date` (`last_crawl_date`),
  INDEX  `IDX_testing1` (`ean`,`last_crawl_date`),
  CONSTRAINT `FK_D34A04AD9E6B1585` FOREIGN KEY (`organisation_id`) REFERENCES `organisation` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=927 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

CREATE TABLE `crawl` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `source_id` int(11) DEFAULT NULL,
  `store_id` int(11) DEFAULT NULL,
  `product_ean` varchar(20) COLLATE utf8_unicode_ci NOT NULL,
  `crawl_date` datetime NOT NULL,
  `takeaway_price` decimal(10,2) DEFAULT NULL,
  `delivery_price` decimal(10,2) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `IDX_B4E9F1C2953C1C61` (`source_id`),
  KEY `IDX_B4E9F1C2B092A811` (`store_id`),
  KEY `IDX_product_ean` (`product_ean`),
  KEY `IDX_takeaway_price` (`takeaway_price`),
  KEY `IDX_crawl_date` (`crawl_date`),
  INDEX  `IDX_testing2` ( `source_id`,`product_ean`,`crawl_date`),
  CONSTRAINT `FK_B4E9F1C2953C1C61` FOREIGN KEY (`source_id`) REFERENCES `source` (`id`),
  CONSTRAINT `FK_B4E9F1C2B092A811` FOREIGN KEY (`store_id`) REFERENCES `store` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=8606874 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
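
Since both tables already exist, the same two indexes could presumably be added in place instead of recreating the tables. The following is a sketch under that assumption: it reuses the IDX_testing1 and IDX_testing2 names from the definitions above and then re-runs EXPLAIN on a trimmed version of the question's query to see which key the optimizer picks for crawl.

```sql
-- Add the two composite indexes to the existing tables.
ALTER TABLE product
  ADD INDEX IDX_testing1 (ean, last_crawl_date);

ALTER TABLE crawl
  ADD INDEX IDX_testing2 (source_id, product_ean, crawl_date);

-- Trimmed version of the original query (fewer selected columns,
-- same join and filter conditions) to check the chosen keys.
EXPLAIN
SELECT
    p0_.ean,
    MIN(NULLIF(c1_.delivery_price, 0)) AS min_delivery_price
FROM product p0_
LEFT JOIN crawl c1_ ON (
    c1_.product_ean = p0_.ean AND (
        c1_.crawl_date = p0_.last_crawl_date OR
        p0_.last_crawl_date IS NULL
    )
    AND c1_.source_id IN (
        SELECT o2_.source_id
        FROM organisation_source o2_
        WHERE o2_.organisation_id = 5
    )
)
WHERE p0_.organisation_id = 5
GROUP BY p0_.ean;
```

If the key column for c1_ still shows IDX_product_ean with a large rows estimate, the new index is not being used for the join, and a different column order may be worth testing.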