MySQL / memSQL not using an index on a BETWEEN join condition

Time: 2017-04-03 09:41:26

Tags: mysql query-optimization sqlperformance memsql

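We have two tables:

  • A dates table containing one date for every day of the past 10 years and the next 10 years.
  • A states table with the following columns: start_date, end_date, state

The query we run looks like this: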

SELECT dates.date, COUNT(*)
FROM dates
JOIN states
ON dates.date BETWEEN states.start_date AND states.end_date
WHERE dates.date BETWEEN '2017-01-01' AND '2017-01-31'
GROUP BY dates.date
ORDER BY dates.date;

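According to the query plan, memSQL does not use an index for the JOIN condition, which makes the query slow. Is there a way to get an index used for the JOIN condition?

We tried memSQL skiplist indexes on dates.date, states.start_date, states.end_date, and (states.start_date, states.end_date).

Tables & EXPLAIN: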

CREATE TABLE `dates` (
  `date` date DEFAULT NULL,
  KEY `date_index` (`date`)
)

CREATE TABLE `states` (
  `start_date` datetime DEFAULT NULL,
  `end_date` datetime DEFAULT NULL,
  `state` varchar(256) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
  KEY `start_date` (`start_date`),
  KEY `end_date` (`end_date`),
  KEY `start_date_end_date` (`start_date`,`end_date`)
)

+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                             |
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| GatherMerge [remote_0.date] partitions:all est_rows:96 alias:remote_0                                                                               |
| Project [r2.date, CAST(COALESCE($0,0) AS SIGNED) AS `COUNT(*)`] est_rows:96                                                                         |
| Sort [r2.date]                                                                                                                                      |
| HashGroupBy [SUM(r2.`COUNT(*)`) AS $0] groups:[r2.date]                                                                                             |
| TableScan r2 storage:list stream:no                                                                                                                 |
| Repartition [r1.date, `COUNT(*)`] AS r2 shard_key:[date] est_rows:96 est_select_cost:26764032                                                       |
| HashGroupBy [COUNT(*) AS `COUNT(*)`] groups:[r1.date]                                                                                               |
| Filter [r1.date <= states.end_date]                                                                                                                 |
| NestedLoopJoin                                                                                                                                      |
| |---IndexRangeScan drstates_test.states, KEY start_date (start_date) scan:[start_date <= r1.date] est_table_rows:123904 est_filtered:123904         |
| TableScan r1 storage:list stream:no                                                                                                                 |
| Broadcast [dates.date] AS r1 distribution:tree est_rows:96                                                                                          |
| IndexRangeScan drstates_test.dates, KEY date_index (date) scan:[date >= '2017-01-01' AND date <= '2017-01-31'] est_table_rows:18628 est_filtered:96 |
+-----------------------------------------------------------------------------------------------------------------------------------------------------+

1 answer:

Answer 0 (score: 0)

ON dates.date BETWEEN states.start_date
                  AND states.end_date

is essentially un-optimizable. The only practical way to evaluate this condition is to tediously test every row.

If you are using MySQL and do not need the dates table, consider starting with

SELECT  *
    FROM  states
    WHERE  start_date >= '2017-01-01'
      AND  end_date    < '2017-01-01' + INTERVAL 1 MONTH 

Note that this works for any combination of DATE and DATETIME data types.
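
To illustrate why the half-open upper bound matters for DATETIME values, here is a small sketch in MySQL syntax. The literal values are made up for the example, and the standard DATE '...' and TIMESTAMP '...' literal forms are assumed to be available (they are in MySQL; I have not checked memSQL).

```sql
-- A DATETIME value in the afternoon of the last day of the month.
SELECT
  -- Closed range: the upper bound is treated as 2017-01-31 00:00:00,
  -- so an afternoon value on that day falls outside it.
  TIMESTAMP '2017-01-31 12:00:00'
    BETWEEN DATE '2017-01-01' AND DATE '2017-01-31' AS in_closed_range,
  -- Half-open range: everything before 2017-02-01 00:00:00 is included.
  TIMESTAMP '2017-01-31 12:00:00'
    < DATE '2017-01-01' + INTERVAL 1 MONTH          AS before_half_open_bound;
```

On MySQL the first column should come back as 0 and the second as 1, which is why the `< ... + INTERVAL 1 MONTH` form is safe whether the column is DATE or DATETIME.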

Since the ultimate goal is not clear to me, I am not sure what the next step should be.
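
If the per-day counts from the question are the goal, one possible direction is to keep the join but add range predicates on states that the join condition already implies. This is only a sketch: it returns the same rows as the original query, and whether the optimizer actually uses an index for the extra predicates depends on the engine and the data distribution, so check it with EXPLAIN. Note that these are overlap predicates (states that touch the month), not the containment filter shown earlier.

```sql
-- Same result as the original query. The two extra predicates are
-- redundant logically, but they give the optimizer constant ranges on
-- states, so rows that cannot match any date in January can be
-- discarded before the row-by-row BETWEEN test.
SELECT dates.date, COUNT(*)
FROM dates
JOIN states
  ON dates.date BETWEEN states.start_date AND states.end_date
WHERE dates.date BETWEEN '2017-01-01' AND '2017-01-31'
  -- A matching state must start on or before the last requested date ...
  AND states.start_date <= '2017-01-31'
  -- ... and end on or after the first requested date.
  AND states.end_date   >= '2017-01-01'
GROUP BY dates.date
ORDER BY dates.date;
```

This helps most when only a small fraction of states overlaps the requested month; if nearly all states are long-lived and overlap it, the per-row test still dominates.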