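We have two tables:
A dates table, which contains one date for every day of the past 10 years and the next 10 years.
A states table with the columns start_date, end_date, and state.
The query we run looks like this: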
SELECT dates.date, COUNT(*)
FROM dates
JOIN states
ON dates.date BETWEEN states.start_date AND states.end_date
WHERE dates.date BETWEEN '2017-01-01' AND '2017-01-31'
GROUP BY dates.date
ORDER BY dates.date;
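According to the query plan, MemSQL does not use an index for the JOIN condition, which makes the query slow. Is there a way to get the JOIN condition to use an index?
We have tried MemSQL skiplist indexes on dates.date, states.start_date, states.end_date, and (states.start_date, states.end_date).
Tables and EXPLAIN output: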
CREATE TABLE `dates` (
`date` date DEFAULT NULL,
KEY `date_index` (`date`)
)
CREATE TABLE `states` (
`start_date` datetime DEFAULT NULL,
`end_date` datetime DEFAULT NULL,
`state` varchar(256) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
KEY `start_date` (`start_date`),
KEY `end_date` (`end_date`),
KEY `start_date_end_date` (`start_date`,`end_date`)
)
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| GatherMerge [remote_0.date] partitions:all est_rows:96 alias:remote_0 |
| Project [r2.date, CAST(COALESCE($0,0) AS SIGNED) AS `COUNT(*)`] est_rows:96 |
| Sort [r2.date] |
| HashGroupBy [SUM(r2.`COUNT(*)`) AS $0] groups:[r2.date] |
| TableScan r2 storage:list stream:no |
| Repartition [r1.date, `COUNT(*)`] AS r2 shard_key:[date] est_rows:96 est_select_cost:26764032 |
| HashGroupBy [COUNT(*) AS `COUNT(*)`] groups:[r1.date] |
| Filter [r1.date <= states.end_date] |
| NestedLoopJoin |
| |---IndexRangeScan drstates_test.states, KEY start_date (start_date) scan:[start_date <= r1.date] est_table_rows:123904 est_filtered:123904 |
| TableScan r1 storage:list stream:no |
| Broadcast [dates.date] AS r1 distribution:tree est_rows:96 |
| IndexRangeScan drstates_test.dates, KEY date_index (date) scan:[date >= '2017-01-01' AND date <= '2017-01-31'] est_table_rows:18628 est_filtered:96 |
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
Answer 0 (score: 0)
ON dates.date BETWEEN states.start_date
AND states.end_date
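is essentially not optimizable. At best an index can bound one side of the range, as the IndexRangeScan on start_date in the plan above does; the other side (the Filter on end_date) still has to be tested laboriously for every row.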
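One thing that may be worth trying (a sketch, not something verified on MemSQL): because dates.date is limited to January 2017, any states row that can match must satisfy start_date <= '2017-01-31' and end_date >= '2017-01-01'. Stating those bounds explicitly does not change the result, and it gives the planner constant ranges that it can potentially apply to the existing indexes. The script below uses Python's sqlite3 module purely as a stand-in engine, with invented sample rows and TEXT columns, to check that both forms return the same rows. Whether MemSQL actually chooses a better plan is an assumption you would need to confirm with EXPLAIN.

```python
import sqlite3
from datetime import date, timedelta

# SQLite is only a stand-in engine here; the data below is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dates (date TEXT);
    CREATE TABLE states (start_date TEXT, end_date TEXT, state TEXT);
""")

# One row per day from 2016-12-01 through 2017-03-01.
day = date(2016, 12, 1)
while day <= date(2017, 3, 1):
    conn.execute("INSERT INTO dates VALUES (?)", (day.isoformat(),))
    day += timedelta(days=1)

conn.executemany("INSERT INTO states VALUES (?, ?, ?)", [
    ("2016-12-20", "2017-01-05", "a"),  # overlaps the start of January
    ("2017-01-10", "2017-01-12", "b"),  # entirely inside January
    ("2017-01-30", "2017-02-15", "c"),  # overlaps the end of January
    ("2017-02-10", "2017-02-20", "d"),  # entirely outside January
])

# The query from the question.
ORIGINAL = """
    SELECT dates.date, COUNT(*)
    FROM dates
    JOIN states
      ON dates.date BETWEEN states.start_date AND states.end_date
    WHERE dates.date BETWEEN '2017-01-01' AND '2017-01-31'
    GROUP BY dates.date
    ORDER BY dates.date
"""

# Same query plus redundant constant bounds on states,
# implied by the join condition and the date filter together.
BOUNDED = """
    SELECT dates.date, COUNT(*)
    FROM dates
    JOIN states
      ON dates.date BETWEEN states.start_date AND states.end_date
    WHERE dates.date BETWEEN '2017-01-01' AND '2017-01-31'
      AND states.start_date <= '2017-01-31'
      AND states.end_date >= '2017-01-01'
    GROUP BY dates.date
    ORDER BY dates.date
"""

original_rows = conn.execute(ORIGINAL).fetchall()
bounded_rows = conn.execute(BOUNDED).fetchall()
print(original_rows == bounded_rows)
```

The extra predicates are derived rather than guessed: if start_date <= date <= end_date and '2017-01-01' <= date <= '2017-01-31', then start_date <= '2017-01-31' and end_date >= '2017-01-01' follow directly.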
If you are using MySQL and do not need the dates table, consider starting from
SELECT *
FROM states
WHERE start_date >= '2017-01-01'
AND end_date < '2017-01-01' + INTERVAL 1 MONTH
Note that this works for any combination of the DATE and DATETIME data types.
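To illustrate what this filter selects: it keeps states that lie entirely inside January, whereas the join in the question also counts states that only overlap the month. If overlapping rows are wanted (an assumption about the goal), the comparison would be start_date < '2017-02-01' AND end_date >= '2017-01-01'. The sketch below uses Python's sqlite3 module with invented rows; the literal '2017-02-01' stands in for '2017-01-01' + INTERVAL 1 MONTH because SQLite has no INTERVAL syntax.

```python
import sqlite3

# SQLite is only a stand-in engine here; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE states (start_date TEXT, end_date TEXT, state TEXT)")
conn.executemany("INSERT INTO states VALUES (?, ?, ?)", [
    ("2016-12-20", "2017-01-05", "a"),  # overlaps the start of January
    ("2017-01-10", "2017-01-12", "b"),  # entirely inside January
    ("2017-01-30", "2017-02-15", "c"),  # overlaps the end of January
    ("2017-02-10", "2017-02-20", "d"),  # entirely outside January
])

# States that start and end within January (the filter shown above).
CONTAINED = """
    SELECT state FROM states
    WHERE start_date >= '2017-01-01'
      AND end_date < '2017-02-01'
    ORDER BY state
"""

# States that overlap January at all (what the original join counts).
OVERLAPPING = """
    SELECT state FROM states
    WHERE start_date < '2017-02-01'
      AND end_date >= '2017-01-01'
    ORDER BY state
"""

contained = [row[0] for row in conn.execute(CONTAINED)]
overlapping = [row[0] for row in conn.execute(OVERLAPPING)]
print(contained)    # ['b']
print(overlapping)  # ['a', 'b', 'c']
```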
Since the end goal is not clear to me, I am not sure what to suggest next.