GROUP BY and GROUP_CONCAT: optimizing a MySQL query without using the main PK

Time: 2017-10-01 07:51:58

Tags: mysql database query-optimization

My example runs on MySQL version 5.6.34-log.

Problem summary: the query below takes 40 seconds. Total row counts per table:

  • order_item: 758423 records

  • payment: 177272 records

  • submission_entry: 2165698 records

Details below:

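  • I have this query; see [1].

  • I added SQL_NO_CACHE so that repeated test runs are not answered from the query cache.

  • I have optimized the indexes, see [2], but there was no significant improvement.

  • The table structures are in [3].

  • The EXPLAIN plan is in [4].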

[1]

     SELECT SQL_NO_CACHE
          `payment`.`id`                                                                                    AS id,
          `order_item`.`order_id`                                                                           AS order_id,


          GROUP_CONCAT(DISTINCT (CASE WHEN submission_entry.text = '' OR submission_entry.text IS NULL
            THEN ' '
                                 ELSE submission_entry.text END) ORDER BY question.var DESC SEPARATOR 0x1D) AS buyer,


          event.name                                                                                        AS event,
          COUNT(DISTINCT CASE WHEN (`order_item`.status > 0 OR (
            `order_item`.status != -1 AND `order_item`.status >= -2 AND `payment`.payment_type_id != 8 AND
            payment.make_order_free = 1))
            THEN `order_item`.id
                         ELSE NULL END)                                                                     AS qty,
          payment.currency                                                                                  AS `currency`,
          (SELECT SUM(order_item.sub_total)
           FROM order_item
           WHERE payment_id =
                 payment.id)                                                                                AS sub_total,
          CASE WHEN payment.make_order_free = 1
            THEN ROUND(payment.total + COALESCE(refunds_total, 0), 2)
          ELSE ROUND(payment.total, 2) END                                                                  AS 'total',
          `payment_type`.`name`                                                                             AS payment_type,
          payment_status.name                                                                               AS status,
          `payment_status`.`id`                                                                             AS status_id,
          DATE_FORMAT(CONVERT_TZ(order_item.`created`, '+0:00', '-8:00'),
                      '%Y-%m-%d %H:%i')                                                                     AS 'created',
          `user`.`name`                                                                                     AS 'agent',
          event.id                                                                                          AS event_id,
          payment.checked,
          DATE_FORMAT(CONVERT_TZ(payment.checked_date, '+0:00', '-8:00'),
                      '%Y-%m-%d %H:%i')                                                                     AS checked_date,
          DATE_FORMAT(CONVERT_TZ(`payment`.`complete_date`, '+0:00', '-8:00'),
                      '%Y-%m-%d %H:%i')                                                                     AS `complete date`,
          `payment`.`delivery_status`                                                                       AS `delivered`
        FROM `order_item`
          INNER JOIN `payment`
            ON payment.id = `order_item`.`payment_id` AND (payment.status > 0.0 OR payment.status = -3.0)
          LEFT JOIN (SELECT
                       sum(`payment_refund`.total) AS `refunds_total`,
                       payment_refunds.payment_id  AS `payment_id`
                     FROM payment
                       INNER JOIN `payment_refunds` ON payment_refunds.payment_id = payment.id
                       INNER JOIN `payment` AS `payment_refund`
                         ON `payment_refund`.id = `payment_refunds`.payment_id_refund
                     GROUP BY `payment_refunds`.payment_id) AS `refunds` ON `refunds`.payment_id = payment.id
#           INNER JOIN event_date_product ON event_date_product.id = order_item.event_date_product_id
#           INNER JOIN event_date ON event_date.id = event_date_product.event_date_id
          INNER JOIN event ON event.id = order_item.event_id
          INNER JOIN payment_status ON payment_status.id = payment.status
          INNER JOIN payment_type ON payment_type.id = payment.payment_type_id
          LEFT JOIN user ON user.id = payment.completed_by
          LEFT JOIN submission_entry ON submission_entry.form_submission_id = `payment`.`form_submission_id`
          LEFT JOIN question ON question.id = submission_entry.question_id AND question.var IN ('name', 'email')
        WHERE 1 = '1' AND (order_item.status > 0.0 OR order_item.status = -2.0)
        GROUP BY `order_item`.`order_id`
        HAVING 1 = '1'
        ORDER BY `order_item`.`order_id` DESC
        LIMIT 10

[2]

 CREATE INDEX order_id
      ON order_item (order_id);

    CREATE INDEX payment_id
      ON order_item (payment_id);

    CREATE INDEX status
      ON order_item (status);

SECOND TABLE

CREATE INDEX payment_type_id
  ON payment (payment_type_id);

CREATE INDEX status
  ON payment (status);

[3]

CREATE TABLE order_item
(
  id                         INT AUTO_INCREMENT
    PRIMARY KEY,
  order_id                   INT                                 NOT NULL,
  form_submission_id         INT                                 NULL,
  status                     DOUBLE DEFAULT '0'                  NULL,
  payment_id                 INT DEFAULT '0'                     NULL
);

SECOND TABLE

CREATE TABLE payment
(
  id                 INT AUTO_INCREMENT,
  payment_type_id    INT                                 NOT NULL,
  status             DOUBLE                              NOT NULL,
  form_submission_id INT                                 NOT NULL,
  PRIMARY KEY (id, payment_type_id)
);

[4] EXPLAIN output as an HTML table (run the snippet to render it)

<!DOCTYPE html>
<html>
<head>
  <title></title>
</head>
<body>
<table border="1" style="border-collapse:collapse">
<tr><th>id</th><th>select_type</th><th>table</th><th>type</th><th>possible_keys</th><th>key</th><th>key_len</th><th>ref</th><th>rows</th><th>Extra</th></tr>
<tr><td>1</td><td>PRIMARY</td><td>payment_status</td><td>range</td><td>PRIMARY</td><td>PRIMARY</td><td>8</td><td>NULL</td><td>4</td><td>Using where; Using temporary; Using filesort</td></tr>
<tr><td>1</td><td>PRIMARY</td><td>payment</td><td>ref</td><td>PRIMARY,payment_type_id,status</td><td>status</td><td>8</td><td>exp_live_18092017.payment_status.id</td><td>17357</td><td></td></tr>
<tr><td>1</td><td>PRIMARY</td><td>payment_type</td><td>eq_ref</td><td>PRIMARY</td><td>PRIMARY</td><td>4</td><td>exp_live_18092017.payment.payment_type_id</td><td>1</td><td></td></tr>
<tr><td>1</td><td>PRIMARY</td><td>user</td><td>eq_ref</td><td>PRIMARY</td><td>PRIMARY</td><td>4</td><td>exp_live_18092017.payment.completed_by</td><td>1</td><td></td></tr>
<tr><td>1</td><td>PRIMARY</td><td>submission_entry</td><td>ref</td><td>form_submission_id,idx_submission_entry_1</td><td>form_submission_id</td><td>4</td><td>exp_live_18092017.payment.form_submission_id</td><td>2</td><td></td></tr>
<tr><td>1</td><td>PRIMARY</td><td>question</td><td>eq_ref</td><td>PRIMARY,var</td><td>PRIMARY</td><td>4</td><td>exp_live_18092017.submission_entry.question_id</td><td>1</td><td>Using where</td></tr>
<tr><td>1</td><td>PRIMARY</td><td>order_item</td><td>ref</td><td>status,payment_id</td><td>payment_id</td><td>5</td><td>exp_live_18092017.payment.id</td><td>3</td><td>Using where</td></tr>
<tr><td>1</td><td>PRIMARY</td><td>event</td><td>eq_ref</td><td>PRIMARY</td><td>PRIMARY</td><td>4</td><td>exp_live_18092017.order_item.event_id</td><td>1</td><td></td></tr>
<tr><td>1</td><td>PRIMARY</td><td>&lt;derived3&gt;</td><td>ref</td><td>key0</td><td>key0</td><td>5</td><td>exp_live_18092017.payment.id</td><td>10</td><td>Using where</td></tr>
<tr><td>3</td><td>DERIVED</td><td>payment_refunds</td><td>index</td><td>payment_id,payment_id_refund</td><td>payment_id</td><td>4</td><td>NULL</td><td>1110</td><td></td></tr>
<tr><td>3</td><td>DERIVED</td><td>payment</td><td>ref</td><td>PRIMARY</td><td>PRIMARY</td><td>4</td><td>exp_live_18092017.payment_refunds.payment_id</td><td>1</td><td>Using index</td></tr>
<tr><td>3</td><td>DERIVED</td><td>payment_refund</td><td>ref</td><td>PRIMARY</td><td>PRIMARY</td><td>4</td><td>exp_live_18092017.payment_refunds.payment_id_refund</td><td>1</td><td></td></tr>
<tr><td>2</td><td>DEPENDENT SUBQUERY</td><td>order_item</td><td>ref</td><td>payment_id</td><td>payment_id</td><td>5</td><td>func</td><td>3</td><td></td></tr></table>
</body>
</html>

Expected result

The query should take 5 seconds, not 40.

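Important updates

1) Reply to comment 1: there are no foreign keys at all on these two tables.

UPDATE-1: Locally, the original query takes 40 seconds. If I remove the following, it takes 25 seconds (saves 15 seconds):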

GROUP_CONCAT(DISTINCT (CASE WHEN submission_entry.text = '' OR submission_entry.text IS NULL
    THEN ' '
                         ELSE submission_entry.text END) ORDER BY question.var DESC SEPARATOR 0x1D) AS buyer

If I remove only the following, it still takes 40 seconds (no saving):

COUNT(DISTINCT CASE WHEN (`order_item`.status > 0 OR (
    `order_item`.status != -1 AND `order_item`.status >= -2 AND `payment`.payment_type_id != 8 AND
    payment.make_order_free = 1))
    THEN `order_item`.id
                 ELSE NULL END)                                                                     AS qty,

If I remove the following, it takes 36 seconds (saves 4 seconds):

(SELECT SUM(order_item.sub_total)
   FROM order_item
   WHERE payment_id =
         payment.id)                                                                                AS sub_total,
  CASE WHEN payment.make_order_free = 1
    THEN ROUND(payment.total + COALESCE(refunds_total, 0), 2)
  ELSE ROUND(payment.total, 2) END                                                                  AS 'total',

4 answers:

Answer 0 (score: 2)

Remove HAVING 1=1; the optimizer may not be smart enough to ignore it. Please provide EXPLAIN SELECT (not in HTML) so we can see what the optimizer is doing.

The composite PK looks wrong in this case: PRIMARY KEY (id, payment_type_id). Please justify it.
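If it turns out that nothing relies on the composite key, one possible way to switch to a single-column key is sketched below. This is an assumption about the fix, not something the asker confirmed; it also assumes the existing payment_type_id index from [2] stays in place for lookups by type.

```sql
-- Sketch only: replace the composite primary key with id alone.
-- Both changes are made in one statement so that the AUTO_INCREMENT
-- column id is never left without a key.
ALTER TABLE payment
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (id);
```

On a table of this size the statement rebuilds the table, so it is worth trying on a copy first.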

Please explain what status means, or why it needs to be DOUBLE: status DOUBLE

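It will take some effort to figure out why the query is so slow. Let's start by throwing out the normalization parts, such as the dates, the event name, and the currency. That whittles the query down to just enough to find the desired rows, without the per-row details. If it is still slow, let's debug that. If it is "fast", add the other pieces back one at a time to find out which one causes the performance problem.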
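A minimal sketch of such a stripped-down query follows. It assumes that only order_item and payment are needed to decide which ten orders are returned, and that the other INNER JOINs in the original (event, payment_status, payment_type) do not filter out any rows.

```sql
-- Stripped-down version: only the tables and conditions that decide
-- which 10 orders are returned. Prefix with EXPLAIN to see the plan.
SELECT SQL_NO_CACHE
    order_item.order_id
FROM order_item
    INNER JOIN payment
        ON payment.id = order_item.payment_id
       AND (payment.status > 0.0 OR payment.status = -3.0)
WHERE order_item.status > 0.0 OR order_item.status = -2.0
GROUP BY order_item.order_id
ORDER BY order_item.order_id DESC
LIMIT 10;
```

If this version is already slow, the problem lies in finding the rows; if it is fast, the expensive part is one of the pieces that were removed.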

Does every table have just id as its PRIMARY KEY? Or are there more exceptions (such as payment)?

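It seems "wrong" to require specific values for question.var and then use LEFT to imply that the join is optional. Unless I am mistaken on this point, please change all the LEFT JOINs to INNER JOINs.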

Are any of the tables (perhaps submission_entry or event_date_product) "many-to-many" mapping tables? If so, follow the tips here to get some performance gain.

When you come back, please provide SHOW CREATE TABLE for each table.
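For reference, the requested output can be produced with statements like the ones below; the table list is taken from the query in [1].

```sql
-- One statement per table referenced by the query in [1].
SHOW CREATE TABLE order_item;
SHOW CREATE TABLE payment;
SHOW CREATE TABLE payment_refunds;
SHOW CREATE TABLE submission_entry;
SHOW CREATE TABLE question;
SHOW CREATE TABLE event;
SHOW CREATE TABLE payment_status;
SHOW CREATE TABLE payment_type;
SHOW CREATE TABLE `user`;
```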

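Answer 1 (score: 2)

I followed this strategy:

  • pre-evaluate the aggregations into temporary tables
  • put payment at the top, since it seems to be the most determinative table
  • group the joins, to enforce the table relationships on the query optimizer

Here is a revised version of your query: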

-- -----------------------------------------------------------------------------
-- Summarization of order_item
-- -----------------------------------------------------------------------------

drop temporary table if exists _ord_itm_sub_tot;

create temporary table _ord_itm_sub_tot(
    primary key (payment_id)
)
SELECT
    payment_id,
    --
    COUNT(
        DISTINCT
            CASE
                WHEN(
                        `order_item`.status > 0 OR
                        (
                                `order_item`.status       != -1 AND
                                `order_item`.status       >= -2 AND
                                `payment`.payment_type_id != 8  AND
                                payment.make_order_free = 1
                            )
                    ) THEN `order_item`.id
                      ELSE NULL
            END
    ) AS qty,
    --
    SUM(order_item.sub_total) sub_total
FROM
    order_item
        inner join payment
        on payment.id = order_item.payment_id    
where order_item.status > 0.0 OR order_item.status = -2.0
group by payment_id;

-- -----------------------------------------------------------------------------
-- Summarization of payment_refunds
-- -----------------------------------------------------------------------------

drop temporary table if exists _pay_ref_tot;

create temporary table _pay_ref_tot(
    primary key(payment_id)
)
SELECT
    payment_refunds.payment_id  AS `payment_id`,
    sum(`payment_refund`.total) AS `refunds_total`
FROM
    `payment_refunds`
        INNER JOIN `payment` AS `payment_refund`
        ON `payment_refund`.id = `payment_refunds`.payment_id_refund
GROUP BY `payment_refunds`.payment_id;

-- -----------------------------------------------------------------------------
-- Summarization of submission_entry
-- -----------------------------------------------------------------------------

drop temporary table if exists _sub_ent;

create temporary table _sub_ent(
    primary key(form_submission_id)
)
select 
    submission_entry.form_submission_id,
    GROUP_CONCAT(
        DISTINCT (
            CASE WHEN coalesce(submission_entry.text, '') = '' THEN ' '
                                                          ELSE submission_entry.text
            END
        )
        ORDER BY question.var
        DESC SEPARATOR 0x1D
    ) AS buyer
from 
    submission_entry
        LEFT JOIN question
        ON(
                question.id = submission_entry.question_id
            AND question.var IN ('name', 'email')
        )
group by submission_entry.form_submission_id;

-- -----------------------------------------------------------------------------
-- The result
-- -----------------------------------------------------------------------------

SELECT SQL_NO_CACHE
    `payment`.`id`          AS id,
    `order_item`.`order_id` AS order_id,
    --
    _sub_ent.buyer,
    --
    event.name AS event,
    --
    _ord_itm_sub_tot.qty,
    --
    payment.currency AS `currency`,
    --
    _ord_itm_sub_tot.sub_total,
    --
    CASE
        WHEN payment.make_order_free = 1 THEN ROUND(payment.total + COALESCE(refunds_total, 0), 2)
                                         ELSE ROUND(payment.total, 2)
    END AS 'total',
    --
    `payment_type`.`name`   AS payment_type,
    `payment_status`.`name` AS status,
    `payment_status`.`id`   AS status_id,
    --
    DATE_FORMAT(
        CONVERT_TZ(order_item.`created`, '+0:00', '-8:00'),
        '%Y-%m-%d %H:%i'
    ) AS 'created',
    --
    `user`.`name` AS 'agent',
    event.id      AS event_id,
    payment.checked,
    --
    DATE_FORMAT(CONVERT_TZ(payment.checked_date,  '+0:00', '-8:00'), '%Y-%m-%d %H:%i') AS checked_date,
    DATE_FORMAT(CONVERT_TZ(payment.complete_date, '+0:00', '-8:00'), '%Y-%m-%d %H:%i') AS `complete date`,
    --
    `payment`.`delivery_status` AS `delivered`
FROM
    `payment`
        INNER JOIN(
            `order_item`
                INNER JOIN event
                ON event.id = order_item.event_id
        )
        ON `order_item`.`payment_id` = payment.id
        --
        inner join _ord_itm_sub_tot
        on _ord_itm_sub_tot.payment_id = payment.id
        --
        LEFT JOIN _pay_ref_tot
        on _pay_ref_tot.payment_id = `payment`.id
        --
        INNER JOIN payment_status ON payment_status.id = payment.status
        INNER JOIN payment_type   ON payment_type.id   = payment.payment_type_id
        LEFT  JOIN user           ON user.id           = payment.completed_by
        --
        LEFT JOIN _sub_ent
        on _sub_ent.form_submission_id = `payment`.`form_submission_id`
WHERE
    1 = 1
AND (payment.status > 0.0 OR payment.status = -3.0)
AND (order_item.status > 0.0 OR order_item.status = -2.0)
ORDER BY `order_item`.`order_id` DESC
LIMIT 10

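The query in your question uses aggregate functions without explicit grouping, which is very awkward. In my solution I tried to devise "meaningful" aggregations.

Please run this version and tell us your findings.

Please pay attention not only to the run-time statistics but also to the aggregated results.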
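One way to check the aggregated results is sketched below; it is an illustration, not part of the original answer. It compares sub_total from the temporary table _ord_itm_sub_tot with the correlated subquery used in the original query. The two can differ, because the temporary table only sums order_item rows that pass the status filter, while the original subquery sums every item of the payment.

```sql
-- Run in the same session that created the temporary tables.
-- Compares the pre-aggregated sub_total with the original
-- correlated subquery for the ten highest payment ids.
SELECT
    t.payment_id,
    t.qty,
    t.sub_total AS temp_sub_total,
    (SELECT SUM(oi.sub_total)
     FROM order_item AS oi
     WHERE oi.payment_id = t.payment_id) AS original_sub_total
FROM _ord_itm_sub_tot AS t
ORDER BY t.payment_id DESC
LIMIT 10;
```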

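Answer 2 (score: 0)

(The tables and query are too complex for me to do the transformation for you. But here are the steps.)

  1. Reformulate the query without any mention of refunds. That is, remove the derived table and the reference to it in the complex CASE.
  2. Debug and time the resulting query. Keep the GROUP BY order_item ORDER BY order_item DESC LIMIT 10 and apply any of the other optimizations already suggested. In particular, get rid of HAVING 1=1, since it may be getting in the way of a likely optimization.
  3. Make the query from step #2 a "derived table"...
  4. Something like: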

    SELECT lots of stuff
        FROM ( query from step 2 ) AS step2
        LEFT JOIN ( ... ) AS refunds  ON step2... = refunds...
        ORDER BY step2.order_item DESC
    

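    The ORDER BY is repeated, but neither the GROUP BY nor the LIMIT needs to be repeated.

    Why? The principle here is...

    Currently, the query goes off to the refunds correlated subquery thousands of times, only to throw away all but 10 of the results. The reformulation cuts that down to only 10 times.

    (Caveat: I may have missed a subtlety that prevents this reformulation from working as I presented it. If it does not work, see whether the "principle" can still help you.)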

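Answer 3 (score: 0)

Here is the minimum you should do whenever you see a query with lots of joins and pagination: first select those 10 (LIMIT 10) ids from the first table (order_item) with as few joins as possible, then join the ids back to the first table and do all the other joins. That way you do not drag through temporary tables the thousands of columns and rows that never need to be displayed.

  1. Look at the inner joins and WHERE conditions, GROUP BY and ORDER BY, to see whether you need any other tables to filter, group, or order the ids from the first table. In your case you do not seem to need any joins except the one to payment.

  2. Now write the query to select those ids: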

    SELECT o.order_id, o.payment_id
    FROM order_item o
    JOIN payment p
        ON p.id = o.payment_id AND (p.status > 0.0 OR p.status = -3.0)
    WHERE o.status > 0.0 OR o.status = -2.0
    ORDER BY order_id DESC
    LIMIT 10
    

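    If a single order can have several payments, you should use GROUP BY order_id DESC instead of ORDER BY. To make the query run faster, you need a BTREE index on the status column of order_item, or even a composite index on (status, payment_id).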

  3. Now, once you are sure the ids are the ones you expect, you can do all the other joins:

    SELECT order_item.order_id,
      `payment`.`id`,
      GROUP_CONCAT ... -- and so on from the original query
    FROM (
      SELECT o.order_id, o.payment_id
      FROM order_item o
      JOIN payment p
        ON p.id = o.payment_id AND (p.status > 0.0 OR p.status = -3.0)
      WHERE o.status > 0.0 OR o.status = -2.0
      ORDER BY order_id DESC
      LIMIT 10
    ) as ids
    JOIN order_item ON ids.order_id = order_item.order_id
    JOIN payment ON ids.payment_id = payment.id
    LEFT JOIN ( ... -- and so on
    
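  4. The idea is that you drastically shrink the temporary table you need to process. Now every row selected by the joins will be used in the result set.

    UPD1: Another thing is that you should simplify the aggregation in the LEFT JOIN: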

    SELECT
      sum(payment.total) AS `refunds_total`,
      refs.payment_id  AS `payment_id`
    FROM payment_refunds refs
    JOIN payment ON payment.id = refs.payment_id_refund
    GROUP BY refs.payment_id
    

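    Or even replace the LEFT JOIN with a correlated subquery, since the correlation will then be evaluated only for those 10 rows. (Make sure you use the whole query with these three columns as a subquery; otherwise the correlation will be computed for every row of the joined result before the GROUP BY.)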

    SELECT
          ids.order_id,
          ids.payment_id,
          (SELECT SUM(p.total) 
           FROM payment_refunds refs 
           JOIN payment p 
             ON refs.payment_id_refund = p.id
           WHERE refs.payment_id = ids.payment_id
           ) as refunds_total
        FROM (
          SELECT o.order_id, o.payment_id
          FROM order_item o
          JOIN payment p
            ON p.id = o.payment_id AND (p.status > 0.0 OR p.status = -3.0)
          WHERE o.status > 0.0 OR o.status = -2.0
          ORDER BY order_id DESC
          LIMIT 10
        ) as ids
    

    You will also need an index on (payment_id, payment_id_refund) on payment_refunds, and you can even try a covering index (payment_id, total) on payment.
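As a hedged sketch, the indexes mentioned in this answer could be created as shown below. The index names are made up for illustration. Because the payment table in [3] has an id column rather than payment_id, the last statement assumes that (id, total) is what the covering index refers to.

```sql
-- Hypothetical index names; adjust to your naming convention.

-- Composite index for the status filter on order_item (step 2).
CREATE INDEX idx_order_item_status_payment
  ON order_item (status, payment_id);

-- Lets the refund lookup by payment_id read payment_id_refund
-- straight from the index.
CREATE INDEX idx_payment_refunds_pair
  ON payment_refunds (payment_id, payment_id_refund);

-- Assumed reading of the covering index on payment: (id, total).
CREATE INDEX idx_payment_id_total
  ON payment (id, total);
```

After adding them, rerunning EXPLAIN on the id-selecting subquery shows whether the optimizer actually picks the new indexes.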