为什么MySQL不使用JOIN上的主键加上ORDER?

时间:2012-11-01 14:35:17

标签: mysql indexing sql-order-by left-join

这是一个适合你的(MySQL显然):

# Setting things up
DROP DATABASE IF EXISTS index_test_gutza;
CREATE DATABASE index_test_gutza;
USE index_test_gutza;

CREATE TABLE customer_order (
    id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
    invoice MEDIUMINT UNSIGNED NOT NULL DEFAULT 0,
    PRIMARY KEY(id)
);
INSERT INTO customer_order
    (id, invoice)
    VALUES
    (1, 1),
    (2, 2),
    (3, 3),
    (4, 4),
    (5, 5);

CREATE TABLE customer_invoice (
    id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
    invoice_no MEDIUMINT UNSIGNED DEFAULT NULL,
    invoice_pdf LONGBLOB,
    PRIMARY KEY(id)
);
INSERT INTO customer_invoice
    (id, invoice_no)
    VALUES
    (1, 1),
    (2, 2),
    (3, 3),
    (4, 4),
    (5, 5);

# Ok, here's the beef
EXPLAIN
    SELECT co.id
    FROM customer_order AS co;

EXPLAIN
    SELECT co.id
    FROM customer_order AS co
    ORDER BY co.id;

EXPLAIN
    SELECT co.id, ci.invoice_no
    FROM customer_order AS co
    LEFT JOIN customer_invoice AS ci ON ci.id=co.invoice;

EXPLAIN
    SELECT co.id, ci.invoice_no
    FROM customer_order AS co
    LEFT JOIN customer_invoice AS ci ON ci.id=co.invoice
    ORDER BY co.id;

底部有四个EXPLAIN语句。前两个产生了你期望的结果:

+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | co    | index | NULL          | PRIMARY | 3       | NULL |    5 | Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+

第三个已经很有趣了 - 请注意customer_order中的主键是如何不再使用的:

+----+-------------+-------+--------+---------------+---------+---------+-----------------------------+------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                         | rows | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+-----------------------------+------+-------------+
|  1 | SIMPLE      | co    | ALL    | NULL          | NULL    | NULL    | NULL                        |    5 |             |
|  1 | SIMPLE      | ci    | eq_ref | PRIMARY       | PRIMARY | 3       | index_test_gutza.co.invoice |    1 | Using index |
+----+-------------+-------+--------+---------------+---------+---------+-----------------------------+------+-------------+

然而,第四个是zinger - 只需在主键上添加ORDER BY 就会导致customer_order上的filesort(考虑到它已经在上面被迷惑了,这是预期的) :

+----+-------------+-------+--------+---------------+---------+---------+-----------------------------+------+----------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                         | rows | Extra          |
+----+-------------+-------+--------+---------------+---------+---------+-----------------------------+------+----------------+
|  1 | SIMPLE      | co    | ALL    | NULL          | NULL    | NULL    | NULL                        |    5 | Using filesort |
|  1 | SIMPLE      | ci    | eq_ref | PRIMARY       | PRIMARY | 3       | index_test_gutza.co.invoice |    1 | Using index    |
+----+-------------+-------+--------+---------------+---------+---------+-----------------------------+------+----------------+

文件排序!而且我从不使用除customer_order表中的主键以外的任何内容进行排序,以及在customer_invoice表中使用JOIN的主键。那么,在所有好的和正确的名义中,为什么它会突然切换到filesort?!更重要的是,我该如何避免这种情况?为了记录,我很乐意接受一个记录在案的答案,解释为什么不能可以避免(如果是这样的话。)

正如您现在可能怀疑的那样,这实际上是在生产中发生的,虽然表格绝不是很大(只有数百条记录),但发票表上的文件排序(包含PDF文件)正在查杀服务器当我运行类似于上面的查询时(我需要知道哪些订单已经发出发票,哪些不是)。

在你问之前,我设计了数据库,我认为我可以安全地将PDF文件存储在该表中,因为我永远不会需要任何搜索查询 - 我总是有它的主键在手边!

更新(评论摘要)

以下是以下评论中建议的概要,因此您无需阅读所有内容:

  • *你应该在customer_order.invoice上添加一个密钥* - 我实际上是在生产中尝试过,它没有区别(因为它不应该)
  • 您应该使用USE INDEX - 尝试过,不起作用。我也试过FORCE INDEX - 没有结果(没有任何改变)
  • 您过度简化了用例,我们需要实际的生产查询 - 我可能在第一次迭代中将它剥离了太多,所以我更新了它(我刚刚添加了{{1}在最后几个查询的, ci.invoice_no中。为了记录,如果有人真的很好奇,这里是生产查询,完全一样(这将检索订单的最后一页):
SELECT
    corder.id,
    corder.public_id,
    CONCAT(buyer.fname," ",buyer.lname) AS buyer_name,
    corder.status,
    corder.payment,
    corder.reserved AS R,
    corder.tracking_id!="" as A,
    corder.payment_received as pay_date,
    invoice.invoice_no AS inv,
    invoice.receipt_no AS rec,
    invoice.public AS pub_inv,
    proforma.proforma_no AS prof,
    proforma.public AS pub_pf,
    corder.rating,
    corder.rating_comments!="" AS got_comment
FROM
    corder
LEFT JOIN user as buyer ON buyer.id=corder.buyer
LEFT JOIN invoice as invoice ON invoice.id=corder.invoice
LEFT JOIN invoice as proforma ON proforma.id=corder.proforma
ORDER BY
    id DESC 
LIMIT 400, 20;

上面的查询(这也是我在生产中运行的那个)大约需要14秒才能运行。这是在生产中执行的简化查询,如上面的用例所示:

SELECT
    corder.id,
    invoice.invoice_no
FROM
    corder
LEFT JOIN invoice ON invoice.id=corder.invoice
ORDER BY
    corder.id DESC 
LIMIT 400, 20;

这个需要13秒才能运行。请注意,只要我们谈论结果的最后一页(我们是),LIMIT就没有任何区别。也就是说,在涉及filesort时,检索最后12个结果或所有412个结果之间绝对没有显着差异。

结论

ypercube的答案不仅正确,而且不幸的是它似乎是唯一合法的答案。我试图进一步将条件与字段分开,因为SELECT子查询最终可能涉及大量数据,如果corder本身包含LONGBLOB(并且在子查询中复制主查询中的字段是不优雅的),但不幸的是它似乎不起作用:

SELECT
    corder.id,
    corder.public_id,
    CONCAT(buyer.fname," ",buyer.lname) AS buyer_name,
    corder.status,
    corder.payment,
    corder.reserved AS R,
    corder.tracking_id != "" AS A,
    corder.payment_received AS pay_date,
    invoice.invoice_no AS inv,
    invoice.receipt_no AS rec,
    invoice.public AS pub_inv,
    proforma.proforma_no AS prof,
    proforma.public AS pub_pf,
    corder.rating,
    corder.rating_comments!="" AS got_comment
FROM
    corder
LEFT JOIN user as buyer ON buyer.id = corder.buyer
LEFT JOIN invoice AS invoice ON invoice.id = corder.invoice
LEFT JOIN invoice AS proforma ON proforma.id = corder.proforma
WHERE corder.id IN (
    SELECT id
    FROM corder
    ORDER BY id DESC
    LIMIT 400,20
)
ORDER BY
    corder.id DESC;

此操作失败,并显示以下错误消息:

ERROR 1235 (42000): This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'

我正在使用MySQL 5.1.61,这在5.1家族中是合理的近期(显然5.5.x中也不支持)。

1 个答案:

答案 0 :(得分:4)

你能试试这个版本吗(它基本上先获得corder表的420行,保留其中的20行,然后进行3个外连接:

SELECT
    corder.id,
    corder.public_id,
    CONCAT(buyer.fname," ",buyer.lname) AS buyer_name,
    corder.status,
    corder.payment,
    corder.reserved AS R,
    corder.tracking_id != "" AS A,
    corder.payment_received AS pay_date,
    invoice.invoice_no AS inv,
    invoice.receipt_no AS rec,
    invoice.public AS pub_inv,
    proforma.proforma_no AS prof,
    proforma.public AS pub_pf,
    corder.rating,
    corder.rating_comments!="" AS got_comment
FROM
    ( SELECT * 
      FROM corder
      ORDER BY
        id DESC 
      LIMIT 400, 20
    )
    AS corder
LEFT JOIN user as buyer ON buyer.id = corder.buyer
LEFT JOIN invoice AS invoice ON invoice.id = corder.invoice
LEFT JOIN invoice AS proforma ON proforma.id = corder.proforma
ORDER BY
    corder.id DESC ;