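I don't know whether it's possible, but I'd like to optimize a query I wrote that returns all groups between two dates, filling in 0 values for the dates missing from the interval.
I'm running MySQL 5.7.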
I have a calendar table that contains every hour of the year (8760 rows):
CREATE TABLE calendar (
date datetime PRIMARY KEY
);
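For anyone reproducing this, here is one possible way to fill calendar with the 8760 hours of 2018 on MySQL 5.7, which has no recursive CTEs. This is a sketch, not how the table was actually built. The `digits` helper table is hypothetical, and it is a regular table because MySQL cannot refer to the same TEMPORARY table more than once in a single query.

```sql
-- Hypothetical helper table holding the digits 0-9.
CREATE TABLE digits (d TINYINT UNSIGNED PRIMARY KEY);
INSERT INTO digits (d) VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

-- Cross-join four copies of digits to get the numbers 0..9999, keep the
-- first 8760 (365 days * 24 hours; 2018 is not a leap year) and turn each
-- number into an hour offset from the start of the year.
INSERT INTO calendar (date)
SELECT DATE_ADD('2018-01-01 00:00:00', INTERVAL n HOUR)
FROM (
  SELECT ones.d + 10 * tens.d + 100 * hundreds.d + 1000 * thousands.d AS n
  FROM digits AS ones
  CROSS JOIN digits AS tens
  CROSS JOIN digits AS hundreds
  CROSS JOIN digits AS thousands
) AS numbers
WHERE n < 8760;

-- The helper is no longer needed once calendar is filled.
DROP TABLE digits;
```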
I have a spents table containing the date, user, category and amount spent:
CREATE TABLE spents (
date datetime NOT NULL,
user varchar(24) NOT NULL,
category enum('food', 'hobbies', 'clothing', 'taxes') NOT NULL,
spent int(5) unsigned NOT NULL DEFAULT '0',
UNIQUE KEY hourly_composite (date, user, category)
);
Suppose the spents table contains the following rows:
+---------------------+------+----------+-------+
| date | user | category | spent |
+---------------------+------+----------+-------+
| 2018-10-01 10:00:00 | bob | food | 10 |
| 2018-10-01 11:00:00 | bob | hobbies | 50 |
| 2018-10-01 11:00:00 | bob | clothing | 30 |
| 2018-10-01 11:00:00 | bob | taxes | 3 |
| 2018-10-01 12:00:00 | bob | food | 30 |
| 2018-10-01 15:00:00 | bob | clothing | 25 |
| 2018-10-01 16:00:00 | bob | hobbies | 5 |
+---------------------+------+----------+-------+
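To make the example reproducible, the same sample rows can be loaded with a single INSERT. It assumes the spents table exactly as defined above and contains only the data shown in the table.

```sql
-- Sample rows for user 'bob', copied from the table above.
INSERT INTO spents (date, user, category, spent) VALUES
  ('2018-10-01 10:00:00', 'bob', 'food',     10),
  ('2018-10-01 11:00:00', 'bob', 'hobbies',  50),
  ('2018-10-01 11:00:00', 'bob', 'clothing', 30),
  ('2018-10-01 11:00:00', 'bob', 'taxes',     3),
  ('2018-10-01 12:00:00', 'bob', 'food',     30),
  ('2018-10-01 15:00:00', 'bob', 'clothing', 25),
  ('2018-10-01 16:00:00', 'bob', 'hobbies',   5);
```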
For example, I want user bob's hourly spending totals on 2018-10-01 between 10:00 and 18:00.
The final result should look like this:
+---------------------+------+------------------------+-------------+
| hour | user | categories | total_spent |
+---------------------+------+------------------------+-------------+
| 2018-10-01 10:00:00 | bob | food | 10 |
| 2018-10-01 11:00:00 | bob | clothing,hobbies,taxes | 83 |
| 2018-10-01 12:00:00 | bob | food | 30 |
| 2018-10-01 13:00:00 | bob | | 0 |
| 2018-10-01 14:00:00 | bob | | 0 |
| 2018-10-01 15:00:00 | bob | clothing | 25 |
| 2018-10-01 16:00:00 | bob | hobbies | 5 |
| 2018-10-01 17:00:00 | bob | | 0 |
| 2018-10-01 18:00:00 | bob | | 0 |
+---------------------+------+------------------------+-------------+
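For a single user, one way to get exactly this shape is to start from the calendar hours and LEFT JOIN the user's spending, so hours with no row in spents survive with NULLs that IFNULL turns into '' and 0. This is a sketch that assumes spents.date always falls exactly on the hour. Because category is an ENUM, it is cast to CHAR inside GROUP_CONCAT so the list is sorted alphabetically rather than by enum position.

```sql
SELECT c.date AS hour,
       'bob' AS user,
       -- skip categories with spent = 0, sort names alphabetically
       IFNULL(GROUP_CONCAT(IF(s.spent > 0, s.category, NULL)
                           ORDER BY CAST(s.category AS CHAR)), '') AS categories,
       -- SUM over only NULLs (no matching row) is NULL, so default to 0
       IFNULL(SUM(s.spent), 0) AS total_spent
FROM calendar AS c
LEFT JOIN spents AS s
       ON s.date = c.date      -- assumes spents.date is always on the hour
      AND s.user = 'bob'
WHERE c.date BETWEEN '2018-10-01 10:00:00' AND '2018-10-01 18:00:00'
GROUP BY c.date
ORDER BY c.date;
```

With the sample rows this yields nine rows, one per hour from 10:00 to 18:00, with 0 for 13:00, 14:00, 17:00 and 18:00.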
Here is my query:
-- get the Cartesian product of each user group and every hour
SELECT hour, user,
IFNULL(GROUP_CONCAT(DISTINCT IF(hour = DATE_FORMAT(spents.date, "%Y-%m-%d %T") AND spent > 0, category, NULL)), "") AS categories,
SUM(IF(hour = DATE_FORMAT(spents.date, "%Y-%m-%d %T"), IFNULL(spent, 0), 0)) AS total_spent
FROM spents
CROSS JOIN
(
-- get all hours in the time interval
SELECT DATE_FORMAT(date, "%Y-%m-%d %T") AS hour
FROM calendar
WHERE date BETWEEN "2018-10-01 10:00:00" AND "2018-10-01 18:59:59"
GROUP BY hour
) AS interval_units
WHERE date BETWEEN "2018-10-01 10:00:00" AND "2018-10-01 18:59:59"
GROUP BY user, hour
ORDER BY user, hour;
This query works perfectly, but I'm not sure it's the best way to do it.
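For comparison, a common alternative is to drive the query from the (user, hour) grid and LEFT JOIN spents, instead of cross-joining spents with the hours. In the query above every spents row in the range is paired with every hour (the EXPLAIN output shows 7 × 9 = 63 combinations for the sample data) and the IF() calls then discard almost all of them, so the work grows with rows × hours. The sketch below is untested against the real schema. It assumes spents.date always falls exactly on the hour (as the original equality test on DATE_FORMAT also does) and keeps the original behaviour of listing only users with at least one row in the interval. Whether it is actually faster should be checked with EXPLAIN on real data.

```sql
SELECT c.date AS hour,
       u.user,
       IFNULL(GROUP_CONCAT(IF(s.spent > 0, s.category, NULL)
                           ORDER BY CAST(s.category AS CHAR)), '') AS categories,
       IFNULL(SUM(s.spent), 0) AS total_spent
FROM calendar AS c
-- users that have at least one row in the interval
CROSS JOIN (
    SELECT DISTINCT user
    FROM spents
    WHERE date BETWEEN '2018-10-01 10:00:00' AND '2018-10-01 18:59:59'
) AS u
-- each (hour, user) pair looks up its rows through the (date, user)
-- prefix of hourly_composite; pairs without rows keep NULLs
LEFT JOIN spents AS s
       ON s.date = c.date
      AND s.user = u.user
WHERE c.date BETWEEN '2018-10-01 10:00:00' AND '2018-10-01 18:59:59'
GROUP BY u.user, c.date
ORDER BY u.user, c.date;
```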
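Of course, this is a heavily simplified version of the spents table: imagine a unique key spanning more than 8 columns, rows for every hour of the day, and millions of rows in the table.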
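The reason I use the calendar table is to get an exhaustive list of all hours between two dates.
I can also group by year, month, day, day of the week, etc.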
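As an illustration of grouping by a coarser unit with the same calendar-driven LEFT JOIN, here is a sketch of daily totals for bob over October 2018, where days without spending get 0. The month and user are only example values, and it again assumes spents.date is always on the hour. A half-open range (>= start, < end) avoids the 18:59:59-style upper bound.

```sql
-- Daily totals; replace DATE(c.date) with MONTH(c.date), YEAR(c.date),
-- DAYOFWEEK(c.date), ... to group by another calendar unit.
SELECT DATE(c.date) AS day,
       IFNULL(SUM(s.spent), 0) AS total_spent
FROM calendar AS c
LEFT JOIN spents AS s
       ON s.date = c.date
      AND s.user = 'bob'
WHERE c.date >= '2018-10-01' AND c.date < '2018-11-01'
GROUP BY day
ORDER BY day;
```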
Edit:
Here is the EXPLAIN output:
+----+-------------+------------+------------+-------+------------------+------------------+---------+------+------+----------+-----------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+-------+------------------+------------------+---------+------+------+----------+-----------------------------------------------------------+
| 1 | PRIMARY | spents | NULL | range | hourly_composite | hourly_composite | 5 | NULL | 7 | 100.00 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 9 | 100.00 | Using join buffer (Block Nested Loop) |
| 2 | DERIVED | calendar | NULL | range | PRIMARY | PRIMARY | 5 | NULL | 9 | 100.00 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+------------+------------+-------+------------------+------------------+---------+------+------+----------+-----------------------------------------------------------+