Indexes

Question

I have the following query:

SELECT table_1.id

FROM
table_1
LEFT JOIN table_2 ON (table_1.id = table_2.id)

WHERE
table_1.col_condition_1 = 0
AND table_1.col_condition_2 NOT IN (3, 4)
AND (table_2.id is NULL OR table_1.date_col > table_2.date_col)

LIMIT 5000;

I have the following keys and indexes:

table_1.id is the primary key.
An index on table_1.col_condition_1
An index on table_1.col_condition_2
A composite index on table_1.col_condition_1 and table_1.col_condition_2

The correct indexes are being picked up. Query explain:

+----+-------------+---------+--------+---------------------------------------------------------------------+-----------------------+---------+------------+----------+-----------------------+
| id | select_type |  table  |  type  |                            possible_keys                            |          key          | key_len |    ref     |   rows   |         Extra         |
+----+-------------+---------+--------+---------------------------------------------------------------------+-----------------------+---------+------------+----------+-----------------------+
|  1 | SIMPLE      | table_1 | range  | "the composite index", col_condition_1 index ,col_condition_2 index | "the composite index" |       7 |            | 11819433 | Using index condition |
|  1 | SIMPLE      | table_2 | eq_ref | PRIMARY,id_UNIQUE                                                   | PRIMARY               |       8 | table_1.id |        1 | Using where           |
+----+-------------+---------+--------+---------------------------------------------------------------------+-----------------------+---------+------------+----------+-----------------------+

table_1 has about 60 million records, and table_2 has about 4 million records.

The query takes 60 seconds to return the result.

What's interesting is that:

SELECT table_1.id

FROM
table_1
LEFT JOIN table_2 ON (table_1.id = table_2.id)

WHERE
table_1.col_condition_1 = 0
AND table_1.col_condition_2 NOT IN (3, 4)

LIMIT 5000;

takes 145 ms to return the result and picks the same indexes as the first query.

SELECT table_1.id

FROM
table_1
LEFT JOIN table_2 ON (table_1.id = table_2.id)

WHERE
table_1.col_condition_1 = 0
AND (table_2.id is NULL OR table_1.date_col > table_2.date_col)

LIMIT 5000;

takes 174 ms to return the result.

Query explain:

+----+-------------+---------+--------+---------------------------------------------------------------------+-----------------+---------+------------+----------+-------------+
| id | select_type |  table  |  type  |                            possible_keys                            |       key       | key_len |    ref     |   rows   |    Extra    |
+----+-------------+---------+--------+---------------------------------------------------------------------+-----------------+---------+------------+----------+-------------+
|  1 | SIMPLE      | table_1 | ref    | "the composite index", col_condition_1 index ,col_condition_2 index | col_condition_1 |       2 | const      | 30381842 | NULL        |
|  1 | SIMPLE      | table_2 | eq_ref | PRIMARY,id_UNIQUE                                                   | PRIMARY         |       8 | table_1.id |        1 | Using where |
+----+-------------+---------+--------+---------------------------------------------------------------------+-----------------+---------+------------+----------+-------------+

And

SELECT table_1.id

FROM
table_1
LEFT JOIN table_2 ON (table_1.id = table_2.id)

WHERE
table_1.col_condition_2 NOT IN (3, 4)
AND (table_2.id is NULL OR table_1.date_col > table_2.date_col)

LIMIT 5000;

takes about 1 second to return the result.

Query explain:

+----+-------------+---------+--------+---------------------------------------------------------------------+-----------------+---------+------------+----------+-----------------------+
| id | select_type |  table  |  type  |                            possible_keys                            |       key       | key_len |    ref     |   rows   |         Extra         |
+----+-------------+---------+--------+---------------------------------------------------------------------+-----------------+---------+------------+----------+-----------------------+
|  1 | SIMPLE      | table_1 | range  | "the composite index", col_condition_1 index ,col_condition_2 index | col_condition_2 |       5 |            | 36254294 | Using index condition |
|  1 | SIMPLE      | table_2 | eq_ref | PRIMARY,id_UNIQUE                                                   | PRIMARY         |       8 | table_1.id |        1 | Using where           |
+----+-------------+---------+--------+---------------------------------------------------------------------+-----------------+---------+------------+----------+-----------------------+

Also, when I use each WHERE condition separately, the query returns the result in about 100 ms.

My question is: why does the query take so long (60 seconds) to return the result when all three WHERE conditions are used together, even though the correct indexes are being used and the query with any two of the three conditions returns the result in far less time?

Also, is there a way to optimize this query?

Thank you.

EDIT:

Create table:

table_1:

table_2:

Answer 1

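Try splitting the existing SQL into two parts and see how long each one takes to execute. Hopefully that will show you which part is responsible for the slowness:

Part 1: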

SELECT table_1.id
  FROM table_1
  LEFT JOIN table_2
    ON (table_1.id = table_2.id)
 WHERE table_1.col_condition_1 = 0
   AND table_1.col_condition_2 NOT IN (3, 4)
   AND table_2.id is NULL

And part 2 (note the inner join here):

SELECT table_1.id
  FROM table_1
  JOIN table_2
    ON (table_1.id = table_2.id)
 WHERE table_1.col_condition_1 = 0
   AND table_1.col_condition_2 NOT IN (3, 4)
   AND table_1.date_col > table_2.date_col

I would expect part 2 to take longer. In that case, I think an index on date_col on both table_1 and table_2 would help.
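Before timing the two parts, it may be worth confirming that together they return exactly the rows of the original query. The following is a minimal sketch that runs the thread's SQL (without LIMIT) against an in-memory SQLite database rather than MySQL; the column types and the six toy rows are assumptions made up for illustration, not the real schema or data.

```python
import sqlite3

# Toy stand-in for the real tables; column types and rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (
        id BIGINT PRIMARY KEY,
        col_condition_1 INTEGER NOT NULL,
        col_condition_2 INTEGER NOT NULL,
        date_col TEXT NOT NULL
    );
    CREATE TABLE table_2 (id BIGINT PRIMARY KEY, date_col TEXT NOT NULL);

    INSERT INTO table_1 VALUES
        (1, 0, 1, '2020-01-05'),  -- no table_2 row
        (2, 0, 2, '2020-01-05'),  -- table_2 row is earlier
        (3, 0, 1, '2020-01-05'),  -- table_2 row is later
        (4, 0, 3, '2020-01-05'),  -- rejected by NOT IN (3, 4)
        (5, 1, 1, '2020-01-05'),  -- rejected by col_condition_1 = 0
        (6, 0, 2, '2020-01-05');  -- table_2 row has the same date
    INSERT INTO table_2 VALUES
        (2, '2020-01-01'), (3, '2020-01-09'), (6, '2020-01-05');
""")

FILTERS = ("table_1.col_condition_1 = 0 "
           "AND table_1.col_condition_2 NOT IN (3, 4)")


def ids(sql):
    """Run a query and return the selected ids as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))


original = ids(f"""
    SELECT table_1.id FROM table_1
    LEFT JOIN table_2 ON (table_1.id = table_2.id)
    WHERE {FILTERS}
      AND (table_2.id IS NULL OR table_1.date_col > table_2.date_col)
""")
part_1 = ids(f"""
    SELECT table_1.id FROM table_1
    LEFT JOIN table_2 ON (table_1.id = table_2.id)
    WHERE {FILTERS} AND table_2.id IS NULL
""")
part_2 = ids(f"""
    SELECT table_1.id FROM table_1
    JOIN table_2 ON (table_1.id = table_2.id)
    WHERE {FILTERS} AND table_1.date_col > table_2.date_col
""")

print("original:", original)
print("part 1:  ", part_1)
print("part 2:  ", part_2)
```

On this toy data the two parts do not overlap, which is expected: part 1 only keeps rows without a table_2 match, and part 2 only keeps rows with one.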

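I don't think the composite index helps your SELECT at all.

That said, it is hard to diagnose why the three conditions together hurt performance so badly. It seems to be related to your data distribution. I'm not sure about MySQL, but in Oracle, gathering statistics on those tables would make a difference.

Hope that helps.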

Answer 2

Questions like this often come down to trying things and testing to see how they perform.

So, start with this:

SELECT
table_1.id
FROM
table_1
LEFT JOIN table_2
ON table_1.id = table_2.id
AND table_1.date_col <= table_2.date_col
WHERE
table_1.col_condition_1 = 0
AND table_1.col_condition_2 NOT IN (3, 4)
AND table_2.id is NULL

LIMIT 5000;

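The logical reasoning for why this is equivalent to your query: the WHERE clause of your original query, (table_2.id is NULL OR table_1.date_col > table_2.date_col), can be summarized as "only include table_1 records that either have no table_2 record, or whose table_2 record is earlier than the table_1 record".

My version of the query uses an anti-join to exclude every table_1 record for which a table_2 record exists that is later than (or the same as) the table_1 record.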
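The equivalence argument can be checked on a small example. This is only a sketch: it uses an in-memory SQLite database instead of MySQL, the column types and toy rows are invented for illustration, and it assumes date_col is never NULL in either table.

```python
import sqlite3

# Toy stand-in for the real tables; column types and rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (
        id BIGINT PRIMARY KEY,
        col_condition_1 INTEGER NOT NULL,
        col_condition_2 INTEGER NOT NULL,
        date_col TEXT NOT NULL
    );
    CREATE TABLE table_2 (id BIGINT PRIMARY KEY, date_col TEXT NOT NULL);

    INSERT INTO table_1 VALUES
        (1, 0, 1, '2020-01-05'),  -- no table_2 row: kept
        (2, 0, 2, '2020-01-05'),  -- table_2 row is earlier: kept
        (3, 0, 1, '2020-01-05'),  -- table_2 row is later: dropped
        (4, 0, 3, '2020-01-05'),  -- rejected by NOT IN (3, 4)
        (5, 1, 1, '2020-01-05'),  -- rejected by col_condition_1 = 0
        (6, 0, 2, '2020-01-05');  -- table_2 row has the same date: dropped
    INSERT INTO table_2 VALUES
        (2, '2020-01-01'), (3, '2020-01-09'), (6, '2020-01-05');
""")


def ids(sql):
    """Run a query and return the selected ids as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))


# Original form: OR inside the WHERE clause.
original = ids("""
    SELECT table_1.id FROM table_1
    LEFT JOIN table_2 ON (table_1.id = table_2.id)
    WHERE table_1.col_condition_1 = 0
      AND table_1.col_condition_2 NOT IN (3, 4)
      AND (table_2.id IS NULL OR table_1.date_col > table_2.date_col)
""")

# Anti-join form: the date comparison moves into the ON clause, inverted.
anti_join = ids("""
    SELECT table_1.id FROM table_1
    LEFT JOIN table_2
           ON table_1.id = table_2.id
          AND table_1.date_col <= table_2.date_col
    WHERE table_1.col_condition_1 = 0
      AND table_1.col_condition_2 NOT IN (3, 4)
      AND table_2.id IS NULL
""")

print("original: ", original)
print("anti-join:", anti_join)
```

If date_col can be NULL, the two forms can diverge: a matching table_2 row with a NULL date is filtered out by the original WHERE clause, but it fails the ON condition in the anti-join and so the table_1 row is kept.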

Indexes

There are many possible composite indexes that could help this query. Here are a couple to start with:

For table_2: (id,date_col)

For table_1: (col_condition_1,id,date_col,col_condition_2)
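Spelled out as statements, the two suggested indexes might look like the sketch below. The index names idx_t2_id_date and idx_t1_cond1_id_date_cond2 are hypothetical, as are the column types; the CREATE INDEX statements are run against an in-memory SQLite database here only so that the column order can be checked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Column types are assumptions; only the column names come from the thread.
    CREATE TABLE table_1 (
        id BIGINT PRIMARY KEY,
        col_condition_1 INTEGER NOT NULL,
        col_condition_2 INTEGER NOT NULL,
        date_col TEXT NOT NULL
    );
    CREATE TABLE table_2 (id BIGINT PRIMARY KEY, date_col TEXT NOT NULL);

    -- Hypothetical index names; column lists as suggested in the answer.
    CREATE INDEX idx_t2_id_date
        ON table_2 (id, date_col);
    CREATE INDEX idx_t1_cond1_id_date_cond2
        ON table_1 (col_condition_1, id, date_col, col_condition_2);
""")


def index_columns(index_name):
    """Return the indexed column names, in index order."""
    rows = conn.execute(f"PRAGMA index_info({index_name})")
    return [row[2] for row in rows]


t2_cols = index_columns("idx_t2_id_date")
t1_cols = index_columns("idx_t1_cond1_id_date_cond2")
print(t2_cols)
print(t1_cols)
```

Column order matters in a composite index, which is why the check prints the columns in index order rather than just the index names.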

Please try my query and indexes and report back with the results (including the EXPLAIN plan).

Answer 3

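OR is a performance killer.
Sometimes using UNION instead of OR can speed up the query.
Perhaps in one case the 5000 rows are "near the beginning" of the joined tables, and in the other case they are not.
Using LIMIT without ORDER BY is suspect.
Since a PK is a unique key, it is redundant to also declare id_UNIQUE.
INDEX(a) is redundant when you also have INDEX(a,b).
If there are only 4 values, IN (1, 2) may be faster than NOT IN (3, 4).
It is unusual for two tables to share the same PK. Why do you have a 1:1 relationship?
If we could see the real column names, we might have further insights.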
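As one possible reading of the UNION suggestion, the OR can be split into two branches joined with UNION ALL. UNION ALL is enough here because the branches cannot overlap: the first keeps only rows without a table_2 match, the second only rows with one. The sketch runs on an in-memory SQLite database with invented column types and toy rows; whether this form is actually faster on the real MySQL tables is untested.

```python
import sqlite3

# Toy stand-in for the real tables; column types and rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (
        id BIGINT PRIMARY KEY,
        col_condition_1 INTEGER NOT NULL,
        col_condition_2 INTEGER NOT NULL,
        date_col TEXT NOT NULL
    );
    CREATE TABLE table_2 (id BIGINT PRIMARY KEY, date_col TEXT NOT NULL);

    INSERT INTO table_1 VALUES
        (1, 0, 1, '2020-01-05'), (2, 0, 2, '2020-01-05'),
        (3, 0, 1, '2020-01-05'), (4, 0, 3, '2020-01-05'),
        (5, 1, 1, '2020-01-05'), (6, 0, 2, '2020-01-05');
    INSERT INTO table_2 VALUES
        (2, '2020-01-01'), (3, '2020-01-09'), (6, '2020-01-05');
""")

FILTERS = ("table_1.col_condition_1 = 0 "
           "AND table_1.col_condition_2 NOT IN (3, 4)")

or_sql = f"""
    SELECT table_1.id FROM table_1
    LEFT JOIN table_2 ON (table_1.id = table_2.id)
    WHERE {FILTERS}
      AND (table_2.id IS NULL OR table_1.date_col > table_2.date_col)
    LIMIT 5000
"""

union_sql = f"""
    SELECT table_1.id FROM table_1
    LEFT JOIN table_2 ON (table_1.id = table_2.id)
    WHERE {FILTERS} AND table_2.id IS NULL
    UNION ALL
    SELECT table_1.id FROM table_1
    JOIN table_2 ON (table_1.id = table_2.id)
    WHERE {FILTERS} AND table_1.date_col > table_2.date_col
    LIMIT 5000  -- in SQLite this LIMIT caps the whole compound query
"""

or_ids = sorted(row[0] for row in conn.execute(or_sql))
union_ids = sorted(row[0] for row in conn.execute(union_sql))
print("OR form:       ", or_ids)
print("UNION ALL form:", union_ids)
```

Without an ORDER BY, the two forms are free to return a different 5000 rows once the real result set exceeds the limit, so comparing them on production data only makes sense with the LIMIT removed or an explicit ordering added.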

Query with three WHERE conditions is slow, but the same query with any two of the three conditions is fast
