Top-20 ranking query per group - optimization

Time: 2016-07-29 21:45:50

Tags: mysql optimization

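I'm building a report structure in which I need to output aggregate statistics for the top 20 days of each unique company - region. I have accomplished this, but I feel my code is overly complicated, and I'm asking for help optimizing it.

I have 2 tables involved in this process. The first lists all possible company - region - group - subgroup combinations. The second holds hourly statistics per group - subgroup.

SQL Fiddle link: http://sqlfiddle.com/#!9/29a7b/1
Note: I'm currently getting a SELECT command denied to user '<user>'@'<ip>' for table 'table_stats' error on my SQL Fiddle; help resolving this would be much appreciated.

table_companies declaration and dummy data: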

CREATE TABLE `table_companies` (
  `pk_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `company` varchar(45) NOT NULL,
  `region` varchar(45) NOT NULL,
  `group` varchar(45) NOT NULL,
  `subgroup` varchar(45) NOT NULL,
  PRIMARY KEY (`pk_id`),
  UNIQUE KEY `pk_id_id_UNIQUE` (`pk_id`)
);

INSERT INTO table_companies
    (`pk_id`, `company`, `region`, `group`, `subgroup`)
VALUES
    (1, 'company1', 'region1', 'group1', 'subgroup1'),
    (2, 'company1', 'region1', 'group1', 'subgroup2'),
    (3, 'company1', 'region2', 'group2', 'subgroup3'),
    (4, 'company1', 'region3', 'group3', 'subgroup4'),
    (5, 'company2', 'region1', 'group4', 'subgroup5'),
    (6, 'company2', 'region3', 'group5', 'subgroup6'),
    (7, 'company2', 'region3', 'group6', 'subgroup7'),
    (8, 'company2', 'region4', 'group7', 'subgroup8'),
    (9, 'company2', 'region5', 'group8', 'subgroup9'),
    (10, 'company3', 'region6', 'group9', 'subgroup10'),
    (11, 'company3', 'region7', 'group10', 'subgroup11'),
    (12, 'company3', 'region8', 'group11', 'subgroup12'),
    (13, 'company4', 'region9', 'group12', 'subgroup13'),
    (14, 'company4', 'region10', 'group13', 'subgroup14'),
    (15, 'company5', 'region11', 'group14', 'subgroup15'),
    (16, 'company5', 'region12', 'group15', 'subgroup16')
;

table_stats declaration:
Simplified to contain only a couple of hours per day for a single group - subgroup.

CREATE TABLE `table_stats` (
  `pk_id` int(10) unsigned NOT NULL,
  `date_time` datetime NOT NULL,
  `group` varchar(45) NOT NULL,
  `subgroup` varchar(45) NOT NULL,
  `stat` int(10) unsigned NOT NULL,
  PRIMARY KEY (`pk_id`),
  UNIQUE KEY `pk_id_UNIQUE` (`pk_id`),
  UNIQUE KEY `om_unique` (`date_time`,`group`,`subgroup`)
);

INSERT INTO table_stats
    (`pk_id`, `date_time`, `group`, `subgroup`, `stat`)
VALUES
    (1, '2015-12-01 06:00:00', 'group9', 'subgroup10', 14),
    (2, '2015-12-01 12:00:00', 'group9', 'subgroup10', 14),
    (3, '2015-12-02 06:00:00', 'group9', 'subgroup10', 2),
    (4, '2015-12-02 12:00:00', 'group9', 'subgroup10', 51),
    (5, '2015-12-03 06:00:00', 'group9', 'subgroup10', 30),
    (6, '2015-12-03 12:00:00', 'group9', 'subgroup10', 6),
    (7, '2015-12-04 06:00:00', 'group9', 'subgroup10', 9),
    (8, '2015-12-04 12:00:00', 'group9', 'subgroup10', 77),
    (9, '2015-12-05 06:00:00', 'group9', 'subgroup10', 70),
    (10, '2015-12-05 12:00:00', 'group9', 'subgroup10', 7),
    (11, '2015-12-06 06:00:00', 'group9', 'subgroup10', 38),
    (12, '2015-12-06 12:00:00', 'group9', 'subgroup10', 5),
    (13, '2015-12-07 06:00:00', 'group9', 'subgroup10', 86),
    (14, '2015-12-07 12:00:00', 'group9', 'subgroup10', 73),
    (15, '2015-12-08 06:00:00', 'group9', 'subgroup10', 45),
    (16, '2015-12-08 12:00:00', 'group9', 'subgroup10', 14),
    (17, '2015-12-09 06:00:00', 'group9', 'subgroup10', 66),
    (18, '2015-12-09 12:00:00', 'group9', 'subgroup10', 38),
    (19, '2015-12-10 06:00:00', 'group9', 'subgroup10', 12),
    (20, '2015-12-10 12:00:00', 'group9', 'subgroup10', 77),
    (21, '2015-12-11 06:00:00', 'group9', 'subgroup10', 21),
    (22, '2015-12-11 12:00:00', 'group9', 'subgroup10', 18),
    (23, '2015-12-12 06:00:00', 'group9', 'subgroup10', 28),
    (24, '2015-12-12 12:00:00', 'group9', 'subgroup10', 74),
    (25, '2015-12-13 06:00:00', 'group9', 'subgroup10', 20),
    (26, '2015-12-13 12:00:00', 'group9', 'subgroup10', 37),
    (27, '2015-12-14 06:00:00', 'group9', 'subgroup10', 66),
    (28, '2015-12-14 12:00:00', 'group9', 'subgroup10', 59),
    (29, '2015-12-15 06:00:00', 'group9', 'subgroup10', 26),
    (30, '2015-12-15 12:00:00', 'group9', 'subgroup10', 0),
    (31, '2015-12-16 06:00:00', 'group9', 'subgroup10', 77),
    (32, '2015-12-16 12:00:00', 'group9', 'subgroup10', 31),
    (33, '2015-12-17 06:00:00', 'group9', 'subgroup10', 59),
    (34, '2015-12-17 12:00:00', 'group9', 'subgroup10', 71),
    (35, '2015-12-18 06:00:00', 'group9', 'subgroup10', 7),
    (36, '2015-12-18 12:00:00', 'group9', 'subgroup10', 73),
    (37, '2015-12-19 06:00:00', 'group9', 'subgroup10', 72),
    (38, '2015-12-19 12:00:00', 'group9', 'subgroup10', 28),
    (39, '2015-12-20 06:00:00', 'group9', 'subgroup10', 50),
    (40, '2015-12-20 12:00:00', 'group9', 'subgroup10', 11),
    (41, '2015-12-21 06:00:00', 'group9', 'subgroup10', 71),
    (42, '2015-12-21 12:00:00', 'group9', 'subgroup10', 4),
    (43, '2015-12-22 06:00:00', 'group9', 'subgroup10', 78),
    (44, '2015-12-22 12:00:00', 'group9', 'subgroup10', 69),
    (45, '2015-12-23 06:00:00', 'group9', 'subgroup10', 83),
    (46, '2015-12-23 12:00:00', 'group9', 'subgroup10', 55),
    (47, '2015-12-24 06:00:00', 'group9', 'subgroup10', 71),
    (48, '2015-12-24 12:00:00', 'group9', 'subgroup10', 20),
    (49, '2015-12-25 06:00:00', 'group9', 'subgroup10', 90),
    (50, '2015-12-25 12:00:00', 'group9', 'subgroup10', 26),
    (51, '2015-12-26 06:00:00', 'group9', 'subgroup10', 1),
    (52, '2015-12-26 12:00:00', 'group9', 'subgroup10', 73),
    (53, '2015-12-27 06:00:00', 'group9', 'subgroup10', 4),
    (54, '2015-12-27 12:00:00', 'group9', 'subgroup10', 18),
    (55, '2015-12-28 06:00:00', 'group9', 'subgroup10', 4),
    (56, '2015-12-28 12:00:00', 'group9', 'subgroup10', 30),
    (57, '2015-12-29 06:00:00', 'group9', 'subgroup10', 56),
    (58, '2015-12-29 12:00:00', 'group9', 'subgroup10', 53),
    (59, '2015-12-30 06:00:00', 'group9', 'subgroup10', 33),
    (60, '2015-12-31 12:00:00', 'group9', 'subgroup10', 8)
;

Query to optimize:

SELECT * FROM
    (
    SELECT t3.company,t3.region,t3.day, t3.day_stat,COUNT(*) as rank
    FROM
        (
            SELECT t2.company,t2.region,DAY(t1.date_time) as day,SUM(t1.stat) as day_stat
            FROM schema1.table_stats t1
            INNER JOIN table_companies t2
            ON t1.group=t2.group AND t1.subgroup=t2.subgroup
            WHERE
                MONTH(t1.date_time)=12 AND
                YEAR(t1.date_time)=2015
            group by t2.company,t2.region,DAY(t1.date_time)
            ORDER BY t2.company,t2.region,day_stat DESC
        ) t3
    JOIN
    (
            SELECT t2.company,t2.region,DAY(t1.date_time) as day,SUM(t1.stat) as day_stat
            FROM schema1.table_stats t1
            INNER JOIN table_companies t2
            ON t1.group=t2.group AND t1.subgroup=t2.subgroup
            WHERE
                MONTH(t1.date_time)=12 AND
                YEAR(t1.date_time)=2015
            group by t2.company,t2.region,DAY(t1.date_time)
            ORDER BY t2.company,t2.region,day_stat DESC
        ) t4
    ON
        t4.day_stat >= t3.day_stat AND
        t4.company = t3.company AND
        t4.region = t3.region
    GROUP BY t3.company,t3.region,t3.day_stat
    ORDER BY t3.company,t3.region,rank
    ) t5
WHERE t5.rank<=20
;

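Query summary: starting from the 2 deepest subqueries, it first joins the two tables, then groups by company, region and day and sums stat. This is also where it restricts the month and year. It then joins this result to a copy of itself so that a rank can be generated. The final select limits the result to the top 20 ranks for each group (company - region).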
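For readers on MySQL 8.0 or later, the self-join ranking described in the summary could probably be replaced by a window function. The following is only a sketch under that assumption (window functions were not available in MySQL when this question was asked in 2016). The alias `rnk` and the derived-table name `ranked` are my own; `rnk` is used because RANK is a reserved word in MySQL 8.0. The `schema1.` prefix is dropped here.

```sql
-- Sketch only: requires MySQL 8.0+ (no window functions in 5.x).
-- Aggregates per company/region/day, then numbers the days within each
-- company/region from the highest daily total downwards.
SELECT company, region, day, day_stat, rnk
FROM (
    SELECT
        c.company,
        c.region,
        DAY(s.date_time) AS day,
        SUM(s.stat) AS day_stat,
        ROW_NUMBER() OVER (
            PARTITION BY c.company, c.region
            ORDER BY SUM(s.stat) DESC
        ) AS rnk
    FROM table_stats s
    INNER JOIN table_companies c
        ON s.`group` = c.`group` AND s.subgroup = c.subgroup
    WHERE MONTH(s.date_time) = 12
      AND YEAR(s.date_time) = 2015
    GROUP BY c.company, c.region, DAY(s.date_time)
) ranked
WHERE rnk <= 20
ORDER BY company, region, rnk;
```

ROW_NUMBER() gives days with equal totals distinct consecutive numbers, so the output can differ from the COUNT(*) self-join when two days in the same company - region tie.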

Expected result:
Apologies for presenting it as an SQL statement.

INSERT INTO results
    (`company`, `region`, `day`, `day_stat`, `rank`)
VALUES
    ('company3', 'region6', 7, 159, 1),
    ('company3', 'region6', 22, 147, 2),
    ('company3', 'region6', 23, 138, 3),
    ('company3', 'region6', 17, 130, 4),
    ('company3', 'region6', 14, 125, 5),
    ('company3', 'region6', 25, 116, 6),
    ('company3', 'region6', 29, 109, 7),
    ('company3', 'region6', 16, 108, 8),
    ('company3', 'region6', 9, 104, 9),
    ('company3', 'region6', 12, 102, 10),
    ('company3', 'region6', 19, 100, 11),
    ('company3', 'region6', 24, 91, 12),
    ('company3', 'region6', 10, 89, 13),
    ('company3', 'region6', 4, 86, 14),
    ('company3', 'region6', 18, 80, 15),
    ('company3', 'region6', 5, 77, 16),
    ('company3', 'region6', 21, 75, 17),
    ('company3', 'region6', 26, 74, 18),
    ('company3', 'region6', 20, 61, 19),
    ('company3', 'region6', 8, 59, 20)
;

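tl;dr: Apologies for the long post. Requesting optimization of http://sqlfiddle.com/#!9/29a7b/1

2 Answers:

Answer 0: (score: 0)

Simply including an index on group and subgroup makes the join more efficient: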

CREATE TABLE table_companies (
    `pk_id` int,
    `company` varchar(8),
    `region` varchar(8),
    `group` varchar(7),
    `subgroup` varchar(10),
    PRIMARY KEY (`pk_id`),
    UNIQUE KEY `pk_id_id_UNIQUE` (`pk_id`),
    INDEX idx_group (`group`, `subgroup`)
);

Answer 1: (score: 0)

Changes I made:

  • Rewrote your query completely
  • Added a composite index on (group, subgroup) in table_companies
  • Added a composite index on (group, subgroup) in table_stats

Revised query:

SELECT 
    C.company,
    C.region,
    DAY(S.date_time) day,
    SUM(S.stat) day_stat
FROM table_companies C
INNER JOIN table_stats S
ON C.`group` = S.`group` AND C.subgroup = S.subgroup
WHERE MONTH(S.date_time) = 12 AND YEAR(S.date_time) = 2015
GROUP BY C.company, C.region, DAY(S.date_time)
ORDER BY day_stat DESC
LIMIT 20;

WORKING DEMO

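The result set has no rank column. Because the rows are sorted in descending order, you can implicitly treat a row's position in the result set as its rank. Note that LIMIT 20 applies to the whole result set rather than to each company - region, so this only matches the expected output when a single company - region has data, as in the sample. However, if you really need a rank column, here is a working demo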

Composite index (table_companies)

ALTER TABLE `table_companies` ADD INDEX `idx_table_compnaies_group_subgroup` (
    `group`,
    `subgroup`
);

Composite index (table_stats)

ALTER TABLE `table_stats` ADD INDEX `idx_table_stats_group_subgroup` (
`group`,
`subgroup`
);

EXPLAIN result:

id  select_type  table  type  possible_keys                       key                                 key_len  ref                                 rows  Extra
1   SIMPLE       S      ALL   idx_table_compnaies_group_subgroup  NULL                                NULL     NULL                                60    Using where; Using temporary; Using filesort
1   SIMPLE       C      ref   idx_table_companies_group_subgroup  idx_table_companies_group_subgroup  57       schema1.S.group,schema1.S.subgroup  1     Using index condition

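The good news is that MySQL can use these indexes (they are listed under possible_keys). It still shows ALL as the access type for table_stats (alias S), but all I can say is that this is a small data set. You can't judge performance from such a small data set.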
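A possible refinement that is not part of this answer: wrapping date_time in MONTH() and YEAR() generally prevents MySQL from using an index on that column for the filter, whereas a half-open range on the bare column lets the optimizer consider one. table_stats already has the unique key om_unique, which starts with date_time. A sketch of the same query with a range predicate, which could be compared under EXPLAIN on realistic data volumes:

```sql
-- Same aggregation as the revised query, but filtering on a half-open
-- date range instead of MONTH()/YEAR(), so the predicate can be matched
-- against an index that starts with date_time.
EXPLAIN
SELECT
    C.company,
    C.region,
    DAY(S.date_time) AS day,
    SUM(S.stat) AS day_stat
FROM table_companies C
INNER JOIN table_stats S
    ON C.`group` = S.`group` AND C.subgroup = S.subgroup
WHERE S.date_time >= '2015-12-01 00:00:00'
  AND S.date_time <  '2016-01-01 00:00:00'
GROUP BY C.company, C.region, DAY(S.date_time)
ORDER BY day_stat DESC
LIMIT 20;
```

Whether the optimizer actually picks an index depends on the data; on the 60-row sample it may still prefer a full scan.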

More:

I assume you have primary keys on those tables. If you don't, create them.
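On the primary keys: both tables as declared in the question already have PRIMARY KEY (`pk_id`), plus a UNIQUE KEY on the same column. A primary key is already a unique index, so the extra one appears to be redundant and only adds maintenance cost on writes. Assuming nothing else depends on those index names, they could be dropped along these lines:

```sql
-- Drop the unique indexes that duplicate the primary key on pk_id.
-- Index names are taken from the CREATE TABLE statements in the question.
ALTER TABLE `table_companies` DROP INDEX `pk_id_id_UNIQUE`;
ALTER TABLE `table_stats` DROP INDEX `pk_id_UNIQUE`;
```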

EDIT

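SELECT
    C.company,
    C.region,
    tt.day,
    tt.total AS day_stat,
    tt.rank
FROM table_companies C
INNER JOIN
(
    -- Number the rows within each business unit; the counter restarts at 1
    -- whenever businessUnit changes. This relies on MySQL evaluating the
    -- select-list expressions left to right, which is not guaranteed.
    SELECT
        t.*,
        IF(t.businessUnit = @sameBusinessUnit, @rn := @rn + 1, @rn := 1) AS rank,
        @sameBusinessUnit := t.businessUnit
    FROM
    (
        -- Daily totals per group/subgroup
        SELECT
            S1.`group`,
            S1.subgroup,
            CONCAT(S1.`group`, S1.subgroup) AS businessUnit,
            DAY(S1.date_time) AS day,
            SUM(S1.stat) AS total
        FROM table_stats S1
        GROUP BY S1.`group`, S1.subgroup, DAY(S1.date_time)
        -- Keep each business unit's rows together, highest total first;
        -- ordering by total alone would interleave units and break the counter
        ORDER BY businessUnit, total DESC
    ) AS t
    CROSS JOIN (SELECT @rn := 1, @sameBusinessUnit := '') var
) AS tt
ON C.`group` = tt.`group` AND C.subgroup = tt.subgroup
WHERE tt.rank <= 20
ORDER BY tt.`group`, tt.`subgroup`, tt.rank;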

WORKING DEMO(Version 2.0)