Question

I have a query like this:

Select 
sum(r.impressions) as impressions from keyword_report r 
where r.org_id = 1
and r.report_date between '2019-09-01' and '2019-09-10'
group by r.country, r.keyword_id;

I have two indexes on keyword_report:

index1: (org_id, report_date)
index2: (country, keyword_id)

Output of EXPLAIN FORMAT=JSON:

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "138210.60"
    },
    "grouping_operation": {
      "using_temporary_table": true,
      "using_filesort": false,
      "table": {
        "table_name": "r",
        "access_type": "ref",
        "possible_keys": [
          "index1",
          "index2"
        ],
        "key": "index1",
        "used_key_parts": [
          "org_id",
          "report_date"
        ],
        "key_length": "11",
        "ref": [
          "const",
          "const"
        ],
        "rows_examined_per_scan": 125646,
        "rows_produced_per_join": 125646,
        "filtered": "100.00",
        "index_condition": "(`r`.`report_date` between '2019-09-01' and '2019-09-10')",
        "cost_info": {
          "read_cost": "125646.00",
          "eval_cost": "12564.60",
          "prefix_cost": "138210.60",
          "data_read_per_join": "162M"
        },
        "used_columns": [
          "org_id",
          "keyword_id",
          "impressions",
          "report_date",
          "country"
        ]
      }
    }
  }
}

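The table contains roughly:

1,000 distinct org_id values,
500 distinct report dates,
30 distinct countries,
10 million distinct keyword_id values.

There are two things I can't understand here.

Why is a temporary table used?
Why aren't both indexes used?

And how can I improve this?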

Answer 1

The following part of the JSON output indicates that MySQL is using index1:

"used_key_parts": [
  "org_id",
  "report_date"
]

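MySQL can use this index for the WHERE clause to filter out non-matching records. After that, it still has to perform the GROUP BY aggregation, which involves a sum over the impressions column. Note that an index does not really help much with this aggregation because, by definition, the database has to touch every record in each group to compute the sum. Most of the time a database will not even choose to use two different indexes on the same table (although it is possible), and in this case the second index, index2, would not help much anyway because of the nature of your aggregation.

As an example of a query where a single index can cover every step, consider the following: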

SELECT
    r.country,
    MAX(r.impressions) AS max_impressions
FROM keyword_report r 
WHERE
    r.org_id = 1 AND
    r.report_date BETWEEN '2019-09-10' AND '2019-09-10'
GROUP BY
    r.country;

Now, if you define the following index:

(org_id, report_date, country, impressions)

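then MySQL may choose to use it. This works because, once the WHERE clause has filtered the records, the maximum impressions value for each country is easy to find from the index.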
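
The index described above could be created and checked along these lines. This is only a sketch: the index name idx_org_date_country_impr is made up for illustration, and whether the optimizer actually picks the index depends on the data and the MySQL version.

```sql
-- Hypothetical index name; the column order follows the index proposed above.
ALTER TABLE keyword_report
  ADD INDEX idx_org_date_country_impr (org_id, report_date, country, impressions);

-- Re-run EXPLAIN on the example query. If the index is chosen and covers
-- the query, the Extra column should include "Using index".
EXPLAIN
SELECT
    r.country,
    MAX(r.impressions) AS max_impressions
FROM keyword_report r
WHERE
    r.org_id = 1 AND
    r.report_date BETWEEN '2019-09-10' AND '2019-09-10'
GROUP BY
    r.country;
```

Comparing the EXPLAIN output before and after adding the index is the simplest way to see whether it made a difference.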

Answer 2

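Why aren't multiple indexes used?

MySQL rarely uses more than one index per table at a time, except in cases such as index_merge, which can apply when conditions are combined with OR. Here the first priority is the WHERE condition, which is why it uses index1: the index lets MySQL go straight to the specific rows it needs and so reduces the amount of data it has to look up. Also, logically, WHERE is evaluated before GROUP BY, which reduces the amount of data to aggregate (rather than going through the full table).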

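Also, no other index suggestion would work here because, unfortunately, MySQL stops using further columns of a composite index once it hits the range condition on report_date.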
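
For contrast, here is a sketch of the kind of query where index_merge can come into play. The query is hypothetical, not taken from the question, and it assumes country holds string codes such as 'US'. The optimizer may still prefer a full table scan, depending on the statistics.

```sql
-- Two OR-ed conditions, each of which can be resolved through a different index.
EXPLAIN
SELECT COUNT(*)
FROM keyword_report r
WHERE r.org_id = 1        -- leading column of index1 (org_id, report_date)
   OR r.country = 'US';   -- leading column of index2 (country, keyword_id); 'US' is an assumed value

-- If index_merge is chosen, EXPLAIN shows type = index_merge and lists
-- both indexes in the key column.
```

The query in the question has no such OR: its WHERE clause is fully served by index1, so there is nothing for a second index to contribute.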

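Why "Using temporary"?

Because you are using GROUP BY: the query first fetches all the matching data into a temporary table (again, your index is not covering), and once that is done it performs the aggregation.

This is also explained in the MySQL documentation:

Using temporary (JSON property: using_temporary_table)

To resolve the query, MySQL needs to create a temporary table to hold the result. This typically happens if the query contains GROUP BY and ORDER BY clauses that list columns differently.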

Answer 3

For this query:

select sum(r.impressions) as impressions, r.country, r.keyword_id
from keyword_report r 
where r.org_id = 1 and
      r.report_date between '2019-09-01' and '2019-09-10'
group by r.country, r.keyword_id;

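only one index will be used. You could try an index on keyword_report(org_id, report_date, country, keyword_id, impressions). This is a covering index for the query, meaning that every column the query needs can be read from the index. However, a sort is still needed.

In the original version of the query, the two operands of BETWEEN had the same value. I don't think MySQL is smart enough to recognize that the operands are the same and that the condition is therefore equivalent to =. In that case, you should phrase the query as: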
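
A sketch of how the covering index suggested above could be created and checked. The index name idx_report_covering is invented for illustration, and the exact EXPLAIN output depends on the MySQL version and the data.

```sql
-- Hypothetical index name; the columns are the ones suggested above.
CREATE INDEX idx_report_covering
    ON keyword_report (org_id, report_date, country, keyword_id, impressions);

-- If the index is chosen, the JSON output should report "using_index": true
-- for the table. Because report_date is a range, a temporary table or sort
-- may still appear for the GROUP BY step.
EXPLAIN FORMAT=JSON
SELECT SUM(r.impressions) AS impressions, r.country, r.keyword_id
FROM keyword_report r
WHERE r.org_id = 1
  AND r.report_date BETWEEN '2019-09-01' AND '2019-09-10'
GROUP BY r.country, r.keyword_id;
```

Even when the temporary table remains, reading only from the index avoids the row lookups into the table itself, which is usually where most of the cost lies.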

select sum(r.impressions) as impressions, r.country, r.keyword_id
from keyword_report r 
where r.org_id = 1 and
      r.report_date = '2019-09-10' 
group by r.country, r.keyword_id;

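Then MySQL might use the index for the GROUP BY; MySQL can be a bit finicky about using indexes for GROUP BY.

I would be more confident about index usage with this version: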

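select ck.*,
       -- correlated subquery: sum the impressions for each (country, keyword_id) pair
       (select sum(r2.impressions)
        from keyword_report r2
        where r2.org_id = 1 and
              r2.report_date = '2019-09-10' and
              r2.country = ck.country and
              r2.keyword_id = ck.keyword_id
       ) as total_impressions
-- derived table: the distinct (country, keyword_id) pairs for the org and date
from (select distinct r.country, r.keyword_id
      from keyword_report r 
      where r.org_id = 1 and
            r.report_date = '2019-09-10' 
     ) ck;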

This will use the same index.

However, you cannot rewrite the query this way for an actual range.

MySQL: multiple indexes not used for a single query (GROUP BY + range WHERE condition)
