Mysql Query:返回组中所有前面行匹配条件的行

时间:2014-06-14 19:00:49

标签: mysql sql query-optimization

给定如下表格,其中包含名称,任务,任务优先级和任务状态列表:

mysql> select * from test;
+----+------+--------+----------+--------+
| id | name | task   | priority | status |
+----+------+--------+----------+--------+
|  1 | bob  | start  |        1 | done   |
|  2 | bob  | work   |        2 | NULL   |
|  3 | bob  | finish |        3 | NULL   |
|  4 | jim  | start  |        1 | done   |
|  5 | jim  | work   |        2 | done   |
|  6 | jim  | finish |        3 | NULL   |
|  7 | mike | start  |        1 | done   |
|  8 | mike | work   |        2 | failed |
|  9 | mike | finish |        3 | NULL   |
| 10 | joan | start  |        1 | NULL   |
| 11 | joan | work   |        2 | NULL   |
| 12 | joan | finish |        3 | NULL   |
+----+------+--------+----------+--------+
12 rows in set (0.00 sec)

我想构建一个查询,它只返回每个名称要运行的下一个任务。具体来说,我想返回包含每个人具有NULL状态的最低优先级的行。

但是这里有一个问题:如果所有前面的任务都处于“完成”状态,我只想返回该行。

鉴于上述表和查询逻辑,此查询的最终结果应如下所示:

+----+------+--------+----------+--------+
| id | name | task   | priority | status |
+----+------+--------+----------+--------+
|  2 | bob  | work   |        2 | NULL   |
|  6 | jim  | finish |        3 | NULL   |
+----+------+--------+----------+--------+

最初,这是由一堆乱七八糟的子查询和派生表完成的,效率极低且速度慢。通过使用几个临时表来获得我想要的结果,我已经成功地加快了速度。

在现实世界中,这将在具有大约200k记录的表上运行,并且每个服务器将每分钟多次执行此查询。我目前的解决方案大约需要2秒才能运行,这根本不会。

这是获取我的示例数据的DML / DDL:

DROP TABLE IF EXISTS `test`;
CREATE TABLE `test` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(20) DEFAULT NULL,
  `task` varchar(20) DEFAULT NULL,
  `priority` int(11) DEFAULT NULL,
  `status` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`id`)
);

INSERT INTO `test` VALUES 
(1,'bob','start',1,'done'),
(2,'bob','work',2,NULL),
(3,'bob','finish',3,NULL),
(4,'jim','start',1,'done'),
(5,'jim','work',2,'done'),
(6,'jim','finish',3,NULL),
(7,'mike','start',1,'done'),
(8,'mike','work',2,'failed'),
(9,'mike','finish',3,NULL),
(10,'joan','start',1,NULL),
(11,'joan','work',2,NULL),
(12,'joan','finish',3,NULL);

以下是我目前正在做的事情,以获得所需的结果(有效,但速度很慢):

drop table if exists tmp1;
create temporary table tmp1 as 
select 
    name, 
    min(priority) as priority 
from test t 
where status is null 
group by name;
create index idx_pri on tmp1(priority);
create index idx_name on tmp1(name);

drop table if exists tmp2;
create temporary table tmp2 as 
select tmp.* 
from test t 
join tmp1 tmp 
    on t.name = tmp.name 
    and t.priority < tmp.priority 
group by name having sum(
    case when status = 'done' 
    then 0 
    else 1 
    end
) = 0;
create index idx_pri on tmp2(priority);
create index idx_name on tmp2(name);


select 
    t.*
from test t 
join tmp2 t2
    on t.name = t2.name
    and t.priority = t2.priority;

我在SQL Fiddle中也有DDL / DML,但我不能把我的解决方案放在那里,因为从技术上讲,这些临时表的创建是DDL,它不允许在查询框中使用DDL。 http://sqlfiddle.com/#!2/2d9e2/1

请帮我提出一个更好的方法来做到这一点。我愿意修改模式或逻辑以适应开箱即用的解决方案,只要所述解决方案有效。

4 个答案:

答案 0 :(得分:1)

您可以将您的逻辑直接转换为这样的查询:

select t.*
from test t 
where t.status is null and
      not exists (select 1
                  from test t2
                  where t2.name = t.name and
                        t2.id < t.id and
                        (t2.status <> 'done' or
                         t2.status is null
                        )
                 ) and
      exists (select 1
              from test t2
              where t2.name = t.name and
                    t2.id < t.id and
                    t2.status = 'done'
             );

为了提高效果,请在test(name, id, status)上创建索引。

Here是一个SQL小提琴。

答案 1 :(得分:1)

此查询通过验证在给定任务之前未完成的任务数为0来确定是否完成给定任务之前的所有任务

SELECT t1.name, t1.id, t1.priority, t1.task
FROM test t1
JOIN test t2 
    ON t2.name = t1.name
    AND t2.priority < t1.priority
WHERE t1.status IS NULL
GROUP BY t1.name, t1.priority, t1.id, t1.task
HAVING COUNT(CASE WHEN t2.status = 'done' THEN NULL ELSE 1 END) = 0 

CREATE INDEX test_index1 ON test (name,status,priority,id,task);

http://sqlfiddle.com/#!2/c912f/7

答案 2 :(得分:1)

我无法针对一个非常大的表测试速度,但这至少会从较小的样本表中返回正确的答案。但是,它应该与其他答案竞争,因为它只执行一个没有连接的子查询:

select  *
from    test  t1
where   t1.Status is null
  and exists (
    select 1
    from   test
    where  Name = t1.Name and
           Priority < t1.Priority
    group by Name
    having count(*) = sum( case when Status = 'done' then 1 else 0 end )
);

答案 3 :(得分:0)

SELECT a.* 
  FROM test a 
  JOIN 
     ( SELECT x.name
            , MIN(x.priority) priority 
         FROM test x 
         LEFT 
         JOIN test y 
           ON y.name = x.name 
          AND y.priority < x.priority 
          AND y.status <> 'done' 
        WHERE y.id IS NULL 
          AND x.status IS NULL
        GROUP BY x.name
     ) b 
    ON b.name = a.name 
   AND b.priority = a.priority
   AND a.priority > 1;