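Efficient way to self-join and filter by revised rows

Time: 2017-08-01 15:52:41

Tags: mysql sql

I am trying to select all rows in this table, with the constraint that a revision is selected in place of its original row. So if a row has a revision, select the revision instead of that row, and if there are several revisions, prefer the one with the highest revision number.

I think an example table, output, and query will explain this better:

Table: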

+----+-------+-------------+-----------------+-------------+
| id | value | original_id | revision_number | is_revision |
+----+-------+-------------+-----------------+-------------+
|  1 | abcd  | null        | null            |           0 |
|  2 | zxcv  | null        | null            |           0 |
|  3 | qwert | null        | null            |           0 |
|  4 | abd   | 1           | 1               |           1 |
|  5 | abcde | 1           | 2               |           1 |
|  6 | zxcvb | 2           | 1               |           1 |
|  7 | poiu  | null        | null            |           0 |
+----+-------+-------------+-----------------+-------------+

Desired output:

+----+-------+-------------+-----------------+
| id | value | original_id | revision_number |
+----+-------+-------------+-----------------+
|  3 | qwert | null        | null            |
|  5 | abcde | 1           | 2               |
|  6 | zxcvb | 2           | 1               |
|  7 | poiu  | null        | null            |
+----+-------+-------------+-----------------+

A view called revisions_max:

SELECT 
    responses.original_id AS original_id,
    MAX(responses.revision_number) AS revision
FROM
    responses
WHERE
    original_id IS NOT NULL
GROUP BY responses.original_id

My current query:

SELECT
    responses.*
FROM
    responses
WHERE
    id NOT IN (
        SELECT
            original_id
        FROM
            revisions_max
    )
AND
    is_revision = 0

UNION

SELECT
    responses.*
FROM
    responses
INNER JOIN revisions_max ON revisions_max.original_id = responses.original_id
    AND revisions_max.revision = responses.revision_number

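This query works, but it takes 0.06 seconds to run on a table with only 2,000 rows. The table will quickly grow to tens or hundreds of thousands of rows. The query below the UNION takes most of the time.

What can I do to improve the performance of this query?

2 answers:

Answer 0: (score: 1)

The approach I would use with any other DBMS is NOT EXISTS: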

SELECT  r1.*
FROM    Responses AS r1
WHERE   NOT EXISTS
        (   SELECT  1
            FROM    Responses AS r2
            WHERE   r2.original_id = COALESCE(r1.original_id, r1.id)
            AND     r2.revision_number > COALESCE(r1.revision_number, 0)
        );

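This removes any row for which a higher revision number exists for the same id (or for the same original_id, if that is populated). However, in MySQL, LEFT JOIN/IS NULL will perform better than NOT EXISTS (see note 1). So I would rewrite the above as: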

SELECT  r1.*
FROM    Responses AS r1
        LEFT JOIN Responses AS r2
            ON r2.original_id = COALESCE(r1.original_id, r1.id)
            AND r2.revision_number > COALESCE(r1.revision_number, 0)
WHERE   r2.id IS NULL;

Example on DBFiddle

I realise you said you did not want to use LEFT JOIN and check for nulls, but I am not aware of a better solution.
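To check that the two queries above agree on the sample data from the question, here is a minimal sketch. It assumes SQLite through Python's built-in sqlite3 module rather than MySQL, so it only confirms the result set and says nothing about MySQL performance. The helper names (make_db, run) are made up for this illustration, and an ORDER BY is added so the output is deterministic.

```python
import sqlite3

# Sample rows from the question:
# (id, value, original_id, revision_number, is_revision)
ROWS = [
    (1, "abcd", None, None, 0),
    (2, "zxcv", None, None, 0),
    (3, "qwert", None, None, 0),
    (4, "abd", 1, 1, 1),
    (5, "abcde", 1, 2, 1),
    (6, "zxcvb", 2, 1, 1),
    (7, "poiu", None, None, 0),
]

# NOT EXISTS form of the answer's query.
NOT_EXISTS_SQL = """
SELECT r1.id, r1.value, r1.original_id, r1.revision_number
FROM   responses AS r1
WHERE  NOT EXISTS (
           SELECT 1
           FROM   responses AS r2
           WHERE  r2.original_id = COALESCE(r1.original_id, r1.id)
           AND    r2.revision_number > COALESCE(r1.revision_number, 0)
       )
ORDER BY r1.id
"""

# LEFT JOIN / IS NULL (anti-join) form of the same query.
LEFT_JOIN_SQL = """
SELECT r1.id, r1.value, r1.original_id, r1.revision_number
FROM   responses AS r1
       LEFT JOIN responses AS r2
           ON  r2.original_id = COALESCE(r1.original_id, r1.id)
           AND r2.revision_number > COALESCE(r1.revision_number, 0)
WHERE  r2.id IS NULL
ORDER BY r1.id
"""


def make_db():
    """Create an in-memory database holding the question's sample table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE responses (
               id INTEGER PRIMARY KEY,
               value TEXT,
               original_id INTEGER,
               revision_number INTEGER,
               is_revision INTEGER NOT NULL)"""
    )
    conn.executemany("INSERT INTO responses VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def run(sql):
    """Run one query against a fresh copy of the sample data."""
    conn = make_db()
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


not_exists_rows = run(NOT_EXISTS_SQL)
left_join_rows = run(LEFT_JOIN_SQL)
print(left_join_rows)
print(not_exists_rows == left_join_rows)
```

On this data both forms should return ids 3, 5, 6 and 7, matching the desired output in the question.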

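1. At least historically this was the case. I don't actively use MySQL, so I am not up to date with developments in its optimiser.

Answer 1: (score: 1)

How about using COALESCE()?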

SELECT COALESCE(y.id, x.id)                           AS id,
       COALESCE(y.value, x.value)                     AS value,
       COALESCE(y.original_id, x.original_id)         AS original_id,
       COALESCE(y.revision_number, x.revision_number) AS revision_number
FROM   responses x
       LEFT JOIN (SELECT r1.*
                  FROM   responses r1
                         INNER JOIN (SELECT responses.original_id          AS original_id,
                                            Max(responses.revision_number) AS revision
                                     FROM   responses
                                     WHERE  original_id IS NOT NULL
                                     GROUP  BY responses.original_id) rev
                                 ON r1.original_id = rev.original_id
                                    AND r1.revision_number = rev.revision) y
              ON x.id = y.original_id
WHERE  y.id IS NOT NULL
        OR x.original_id IS NULL;
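The query above can be tried against the sample rows from the question with a small sketch like the following. It assumes SQLite via Python's sqlite3 module instead of MySQL, so it is only a check of the returned rows, not of speed. The variable names are invented for this example, and ORDER BY 1 is added purely to get a stable row order.

```python
import sqlite3

# Sample rows from the question:
# (id, value, original_id, revision_number, is_revision)
ROWS = [
    (1, "abcd", None, None, 0),
    (2, "zxcv", None, None, 0),
    (3, "qwert", None, None, 0),
    (4, "abd", 1, 1, 1),
    (5, "abcde", 1, 2, 1),
    (6, "zxcvb", 2, 1, 1),
    (7, "poiu", None, None, 0),
]

# The COALESCE query from this answer. Derived table "rev" finds the highest
# revision per original, "y" is the row holding that revision, and COALESCE
# prefers the revision's columns over the original's.
COALESCE_SQL = """
SELECT COALESCE(y.id, x.id)                           AS id,
       COALESCE(y.value, x.value)                     AS value,
       COALESCE(y.original_id, x.original_id)         AS original_id,
       COALESCE(y.revision_number, x.revision_number) AS revision_number
FROM   responses x
       LEFT JOIN (SELECT r1.*
                  FROM   responses r1
                         INNER JOIN (SELECT original_id,
                                            MAX(revision_number) AS revision
                                     FROM   responses
                                     WHERE  original_id IS NOT NULL
                                     GROUP  BY original_id) rev
                                 ON  r1.original_id = rev.original_id
                                 AND r1.revision_number = rev.revision) y
              ON x.id = y.original_id
WHERE  y.id IS NOT NULL
        OR x.original_id IS NULL
ORDER  BY 1
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE responses (
           id INTEGER PRIMARY KEY,
           value TEXT,
           original_id INTEGER,
           revision_number INTEGER,
           is_revision INTEGER NOT NULL)"""
)
conn.executemany("INSERT INTO responses VALUES (?, ?, ?, ?, ?)", ROWS)

coalesce_rows = conn.execute(COALESCE_SQL).fetchall()
conn.close()

for row in coalesce_rows:
    print(row)
```

Rows 1 and 2 are replaced by their latest revisions (ids 5 and 6), row 4 is dropped because it is a superseded revision, and rows 3 and 7 pass through unchanged because they have no revisions.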