Slow INNER JOIN with subqueries in MySQL

Time: 2015-01-21 14:28:08

Tags: mysql performance join

I have a problem with a slow MySQL query (MySQL 5+). Consider these three tables:

customers:
- id_customer : int (PRIMARY)
- name        : varchar(255)

customers_addresses:
- id_customers_addresses : int (PRIMARY)
- id_customer : int (INDEX)
- street : varchar(255)
- zipcode : varchar(255)
- city : varchar(255)

customers_contacts:
- id_customers_contacts : int (PRIMARY)
- id_customer : int (INDEX)
- type : varchar(255)
- value : varchar(255)

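Now, my goal is to gather all address and contact information in a single query, with one row per customer. My first attempt used LEFT JOINs, because some customers have no address and/or contact information: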

SELECT customers.id_customer,
       customers.name,
       X.contact AS contact,
       Y.street,
       Y.zipcode,
       Y.city
FROM customers
LEFT JOIN
(
  SELECT
    GROUP_CONCAT( CONCAT( type, ': ', value ) SEPARATOR ', ' ) AS contact,
    id_customer
  FROM customers_contacts
  GROUP BY id_customer
) AS X
ON X.id_customer = customers.id_customer

LEFT JOIN
(
  SELECT
    GROUP_CONCAT(street SEPARATOR '<br>' ) AS street,
    GROUP_CONCAT(zipcode SEPARATOR '<br>' ) AS zipcode,
    GROUP_CONCAT(city SEPARATOR '<br>' ) AS city,
    id_customer
  FROM customers_addresses
  GROUP BY id_customer
) AS Y
ON Y.id_customer = customers.id_customer
WHERE Y.street LIKE '%Avenue%'
ORDER BY customers.name DESC
LIMIT 0, 20

This query takes 130 seconds to complete (with roughly 7,000 entries in each table), which is far too slow.

Prepending EXPLAIN EXTENDED gives:

id  select_type table               type    possible_keys   key            key_len     ref     rows    filtered    Extra
1   PRIMARY     customers           ref     name            name           3           const   4334    100.00      Using where; Using temporary; Using filesort
1   PRIMARY     <derived2>          ALL     NULL            NULL           NULL        NULL    7793    100.00
1   PRIMARY     <derived3>          ALL     NULL            NULL           NULL        NULL    8580    100.00      Using where
3   DERIVED     customers_addresses index   NULL            id_customer    5           NULL    8651    100.00
2   DERIVED     customers_contacts  index   NULL            id_customer    4           NULL    9314    100.00

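I read some Stack Overflow posts and the MySQL documentation. Both say that INNER JOIN is much faster. So I tried to reproduce the LEFT JOIN behavior with INNER JOINs by using UNION ALL: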

SELECT customers.id_customer,
       customers.name,
       X.contact AS contact,
       Y.street,
       Y.zipcode,
       Y.city
FROM customers
INNER JOIN
(
  SELECT
    GROUP_CONCAT( CONCAT( type, ': ', value ) SEPARATOR ', ' ) AS contact,
    id_customer
  FROM customers_contacts
  GROUP BY id_customer
  UNION ALL
  SELECT
    '' AS contact,
    id_customer
  FROM customers
  WHERE id_customer NOT IN (SELECT DISTINCT id_customer FROM customers_contacts)
) AS X
ON X.id_customer = customers.id_customer

INNER JOIN
(
  SELECT
    GROUP_CONCAT(street SEPARATOR '<br>' ) AS street,
    GROUP_CONCAT(zipcode SEPARATOR '<br>' ) AS zipcode,
    GROUP_CONCAT(city SEPARATOR '<br>' ) AS city,
    id_customer
  FROM customers_addresses
  GROUP BY id_customer
  UNION ALL
  SELECT
    '' AS street,
    '' AS zipcode,
    '' AS city,
    id_customer
  FROM customers
  WHERE id_customer NOT IN (SELECT DISTINCT id_customer FROM customers_addresses)
) AS Y
ON Y.id_customer = customers.id_customer
WHERE Y.street LIKE '%Avenue%'
ORDER BY customers.name DESC
LIMIT 0, 20

This query cut the run time by 20 seconds. But 110 seconds is still unacceptable.

Prepending EXPLAIN EXTENDED gives:

id   select_type         table               type           possible_keys      key      key_len ref         rows    filtered    Extra
1    PRIMARY             <derived2>          ALL            NULL               NULL     NULL    NULL        8596    100.00      Using temporary; Using filesort
1    PRIMARY             <derived5>          ALL            NULL               NULL     NULL    NULL        8604    100.00      Using join buffer
1    PRIMARY             customers           eq_ref         PRIMARY,name,name3 PRIMARY  4       Y.id_kunde  1       100.00      Using where
5    DERIVED             customers_addresses index          NULL               id_kunde 5       NULL        8651    100.00
6    UNION               customers           index          NULL               name2    767     NULL        8677    100.00      Using where; Using index
7    DEPENDENT SUBQUERY  customers_addresses index_subquery id_kunde           id_kunde 5       func        2       100.00      Using index
NULL UNION RESULT        <union5,6>          ALL            NULL               NULL     NULL    NULL        NULL    NULL
2    DERIVED             customers_contacts  index          NULL               id_kunde 4       NULL        10411   100.00
3    UNION               customers           index          NULL               name2    767     NULL        8677    100.00      Using where; Using index
4    DEPENDENT SUBQUERY  customers_contacts  index_subquery id_kunde           id_kunde 4       func        1       100.00      Using index
NULL UNION RESULT        <union2,3>          ALL            NULL               NULL     NULL    NULL        NULL    NULL

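So here is my question: how can I improve one of these queries and/or the database tables to get a much faster response? I am interested not only in a solution, but also in strategies for avoiding this kind of performance killer in the future.

Best regards.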

1 answer:

Answer 0 (score: 3)

As a general rule, which applies here, you can say the following:

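Whenever you join against the result of a query (a subquery), MySQL must first run that subquery and then create a table from its result. You do this twice, which means MySQL first creates two tables, only to drop them once the final result is complete. With proper MySQL memory management, this can be done in memory. But these tables are created without indexes, because MySQL cannot magically determine which index would suit these derived tables best (the `ALL` scans on `<derived2>` and `<derived3>` in your EXPLAIN output show this). Since they are usually created in memory, queries against them are still fairly fast, though not as fast as a SELECT that uses a key.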

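Then, once both derived tables are complete, MySQL must join the original table to the two of them and create a third table on the fly, which then has to be filtered and sorted according to your criteria.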

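This is a performance killer. One of your requirements is that each customer produces exactly one row. That is not how the database stores the information, so you pay for the data conversion at runtime (your GROUP_CONCAT statements). I am not 100% sure what current MySQL database engines do with UNION statements, so I would rather not comment on them.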

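Use a simple INNER JOIN on the available keys and accept multiple rows per customer when a customer has more than one address, and you will see performance jump quickly. If you don't want to split the combined result of all customers and all their addresses back into customers at that layer, you can just as easily iterate over the customers in your programming language layer and request the addresses for one customer at a time.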
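A minimal sketch of the first suggestion, written in Python with the standard-library `sqlite3` module as a stand-in for a MySQL connection. That stand-in is an assumption made only so the example runs on its own. The table and column names follow the question; the sample rows, the index name `idx_addresses_customer`, and the helper names `build_demo_db` and `customers_with_addresses` are hypothetical. The query is a plain join on the indexed key that returns one row per address, and the per-customer concatenation that GROUP_CONCAT did in SQL happens in application code instead:

```python
import sqlite3
from itertools import groupby


def build_demo_db():
    """Create a tiny in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (
            id_customer INTEGER PRIMARY KEY,
            name        TEXT
        );
        CREATE TABLE customers_addresses (
            id_customers_addresses INTEGER PRIMARY KEY,
            id_customer INTEGER,
            street TEXT, zipcode TEXT, city TEXT
        );
        CREATE INDEX idx_addresses_customer
            ON customers_addresses (id_customer);

        INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
        INSERT INTO customers_addresses VALUES
            (1, 1, '5 Park Avenue',  '10001', 'New York'),
            (2, 1, '9 Main Street',  '10002', 'New York'),
            (3, 2, '1 Ocean Avenue', '90001', 'Los Angeles');
        """
    )
    return conn


def customers_with_addresses(conn, street_pattern):
    """Plain join on the indexed key, then fold rows per customer in Python."""
    rows = conn.execute(
        """
        SELECT c.id_customer, c.name, a.street, a.zipcode, a.city
        FROM customers AS c
        INNER JOIN customers_addresses AS a
                ON a.id_customer = c.id_customer
        WHERE a.street LIKE ?
        ORDER BY c.name DESC, c.id_customer, a.id_customers_addresses
        """,
        (street_pattern,),
    ).fetchall()

    result = []
    # groupby only merges adjacent rows, so the ORDER BY above must keep
    # all rows of one customer next to each other.
    for (id_customer, name), group in groupby(rows, key=lambda r: (r[0], r[1])):
        addresses = list(group)
        result.append({
            "id_customer": id_customer,
            "name": name,
            "street": "<br>".join(r[2] for r in addresses),
            "zipcode": "<br>".join(r[3] for r in addresses),
            "city": "<br>".join(r[4] for r in addresses),
        })
    return result


if __name__ == "__main__":
    for row in customers_with_addresses(build_demo_db(), "%Avenue%"):
        print(row)
```

Two differences from the original query are worth keeping in mind. The LIKE filter here applies to individual address rows, so a customer's non-matching addresses are dropped instead of being concatenated along with the matching one. And a LIMIT in this SQL would count address rows, not customers, so paging per customer would have to be handled separately.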

TL;DR: Drop your requirement or live with the overhead.