Question

Given the following table, which contains a list of names, tasks, task priorities, and task statuses:

mysql> select * from test;
+----+------+--------+----------+--------+
| id | name | task   | priority | status |
+----+------+--------+----------+--------+
|  1 | bob  | start  |        1 | done   |
|  2 | bob  | work   |        2 | NULL   |
|  3 | bob  | finish |        3 | NULL   |
|  4 | jim  | start  |        1 | done   |
|  5 | jim  | work   |        2 | done   |
|  6 | jim  | finish |        3 | NULL   |
|  7 | mike | start  |        1 | done   |
|  8 | mike | work   |        2 | failed |
|  9 | mike | finish |        3 | NULL   |
| 10 | joan | start  |        1 | NULL   |
| 11 | joan | work   |        2 | NULL   |
| 12 | joan | finish |        3 | NULL   |
+----+------+--------+----------+--------+
12 rows in set (0.00 sec)

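I want to build a query that returns only the next task to run for each name. Specifically, for each person I want to return the row with the lowest priority that has a NULL status.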

But here's the catch: I only want to return that row if all of the preceding tasks have a status of 'done'.

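Given the table and query logic above, the final result of this query should look like this (joan does not appear, because her lowest-priority NULL task has no preceding tasks):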

+----+------+--------+----------+--------+
| id | name | task   | priority | status |
+----+------+--------+----------+--------+
|  2 | bob  | work   |        2 | NULL   |
|  6 | jim  | finish |        3 | NULL   |
+----+------+--------+----------+--------+
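To pin down the rule before looking at any SQL, here is a minimal plain-Python sketch of the selection logic as inferred from the expected output above: take each name's lowest-priority NULL task, and keep it only if at least one earlier task exists and every earlier task is 'done'. The function name `next_tasks` and the tuple layout are illustrative assumptions, not part of the question.

```python
from collections import defaultdict

# Sample rows from the question: (id, name, task, priority, status)
ROWS = [
    (1, "bob", "start", 1, "done"), (2, "bob", "work", 2, None),
    (3, "bob", "finish", 3, None), (4, "jim", "start", 1, "done"),
    (5, "jim", "work", 2, "done"), (6, "jim", "finish", 3, None),
    (7, "mike", "start", 1, "done"), (8, "mike", "work", 2, "failed"),
    (9, "mike", "finish", 3, None), (10, "joan", "start", 1, None),
    (11, "joan", "work", 2, None), (12, "joan", "finish", 3, None),
]


def next_tasks(rows):
    """Return the ids of each name's next runnable task (hypothetical helper)."""
    by_name = defaultdict(list)
    for row in rows:
        by_name[row[1]].append(row)
    result = []
    for tasks in by_name.values():
        tasks.sort(key=lambda r: r[3])  # order each person's tasks by priority
        pending = [r for r in tasks if r[4] is None]
        if not pending:
            continue
        nxt = pending[0]  # lowest-priority task with a NULL status
        earlier = [r for r in tasks if r[3] < nxt[3]]
        # Require at least one earlier task, and all of them 'done'.
        if earlier and all(r[4] == "done" for r in earlier):
            result.append(nxt[0])
    return sorted(result)


print(next_tasks(ROWS))  # [2, 6]
```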

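Originally this was done with a mess of subqueries and derived tables, which was extremely inefficient and slow. I've managed to speed it up by using a couple of temporary tables to get the result I want.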

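In the real world, this will run against a table of about 200k records, and each server will execute this query several times a minute. My current solution takes about 2 seconds to run, which simply won't do.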

Here is the DDL/DML for my sample data:

DROP TABLE IF EXISTS `test`;
CREATE TABLE `test` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(20) DEFAULT NULL,
  `task` varchar(20) DEFAULT NULL,
  `priority` int(11) DEFAULT NULL,
  `status` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`id`)
);

INSERT INTO `test` VALUES 
(1,'bob','start',1,'done'),
(2,'bob','work',2,NULL),
(3,'bob','finish',3,NULL),
(4,'jim','start',1,'done'),
(5,'jim','work',2,'done'),
(6,'jim','finish',3,NULL),
(7,'mike','start',1,'done'),
(8,'mike','work',2,'failed'),
(9,'mike','finish',3,NULL),
(10,'joan','start',1,NULL),
(11,'joan','work',2,NULL),
(12,'joan','finish',3,NULL);

Here is what I'm currently doing to get the desired result (it works, but it's slow):

drop table if exists tmp1;
create temporary table tmp1 as 
select 
    name, 
    min(priority) as priority 
from test t 
where status is null 
group by name;
create index idx_pri on tmp1(priority);
create index idx_name on tmp1(name);

drop table if exists tmp2;
create temporary table tmp2 as 
select tmp.* 
from test t 
join tmp1 tmp 
    on t.name = tmp.name 
    and t.priority < tmp.priority 
group by name having sum(
    case when status = 'done' 
    then 0 
    else 1 
    end
) = 0;
create index idx_pri on tmp2(priority);
create index idx_name on tmp2(name);


select 
    t.*
from test t 
join tmp2 t2
    on t.name = t2.name
    and t.priority = t2.priority;
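For experimenting outside MySQL, the three steps above can be reproduced with Python's built-in sqlite3 module. This is a sketch under assumptions: SQLite stands in for MySQL (TEMP instead of TEMPORARY, no AUTO_INCREMENT or int(11)), the GROUP BY in step 2 is qualified as tmp.name, tmp.priority to be explicit, the secondary indexes are left out because they don't matter on 12 rows, and the helper names `make_db` and `temp_table_approach` are hypothetical.

```python
import sqlite3

# Sample rows from the question: (id, name, task, priority, status)
ROWS = [
    (1, "bob", "start", 1, "done"), (2, "bob", "work", 2, None),
    (3, "bob", "finish", 3, None), (4, "jim", "start", 1, "done"),
    (5, "jim", "work", 2, "done"), (6, "jim", "finish", 3, None),
    (7, "mike", "start", 1, "done"), (8, "mike", "work", 2, "failed"),
    (9, "mike", "finish", 3, None), (10, "joan", "start", 1, None),
    (11, "joan", "work", 2, None), (12, "joan", "finish", 3, None),
]


def make_db():
    """In-memory SQLite copy of the `test` table (a stand-in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT, task TEXT, "
        "priority INTEGER, status TEXT)"
    )
    conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def temp_table_approach(conn):
    # Step 1: lowest NULL-status priority for each name.
    conn.execute("DROP TABLE IF EXISTS tmp1")
    conn.execute(
        "CREATE TEMP TABLE tmp1 AS "
        "SELECT name, MIN(priority) AS priority FROM test "
        "WHERE status IS NULL GROUP BY name"
    )
    # Step 2: keep names whose earlier tasks are all 'done'. The inner join
    # also drops names that have no earlier task at all (joan).
    conn.execute("DROP TABLE IF EXISTS tmp2")
    conn.execute(
        "CREATE TEMP TABLE tmp2 AS "
        "SELECT tmp.name, tmp.priority FROM test t "
        "JOIN tmp1 tmp ON t.name = tmp.name AND t.priority < tmp.priority "
        "GROUP BY tmp.name, tmp.priority "
        "HAVING SUM(CASE WHEN t.status = 'done' THEN 0 ELSE 1 END) = 0"
    )
    # Step 3: join back to fetch the full rows.
    cur = conn.execute(
        "SELECT t.* FROM test t JOIN tmp2 t2 "
        "ON t.name = t2.name AND t.priority = t2.priority ORDER BY t.id"
    )
    return cur.fetchall()


print(temp_table_approach(make_db()))
```

Note that a NULL status makes `t.status = 'done'` evaluate to NULL, so the CASE falls through to ELSE 1 and a pending earlier task correctly disqualifies the name.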

I also have the DDL/DML on SQL Fiddle, but I can't put my solution there: creating these temporary tables is technically DDL, and SQL Fiddle doesn't allow DDL in the query box. http://sqlfiddle.com/#!2/2d9e2/1

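Please help me come up with a better way to do this. I'm open to modifying the schema or the logic to accommodate an outside-the-box solution, as long as that solution works.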

Answer 1

You can translate your logic directly into a query like this:

select t.*
from test t 
where t.status is null and
      not exists (select 1
                  from test t2
                  where t2.name = t.name and
                        t2.id < t.id and
                        (t2.status <> 'done' or
                         t2.status is null
                        )
                 ) and
      exists (select 1
              from test t2
              where t2.name = t.name and
                    t2.id < t.id and
                    t2.status = 'done'
             );
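A quick way to check this query against the sample data is to run it through Python's sqlite3 module, with SQLite standing in for MySQL (an assumption; the syntax used here is accepted by both). The sketch also creates the suggested (name, id, status) index under a hypothetical name and appends an ORDER BY for deterministic output. Keep in mind that this query orders tasks by id rather than priority, which only matches the intended logic when ids increase with priority, as they do in the sample.

```python
import sqlite3

# Sample rows from the question: (id, name, task, priority, status)
ROWS = [
    (1, "bob", "start", 1, "done"), (2, "bob", "work", 2, None),
    (3, "bob", "finish", 3, None), (4, "jim", "start", 1, "done"),
    (5, "jim", "work", 2, "done"), (6, "jim", "finish", 3, None),
    (7, "mike", "start", 1, "done"), (8, "mike", "work", 2, "failed"),
    (9, "mike", "finish", 3, None), (10, "joan", "start", 1, None),
    (11, "joan", "work", 2, None), (12, "joan", "finish", 3, None),
]

ANSWER_1 = """
select t.*
from test t
where t.status is null and
      not exists (select 1 from test t2
                  where t2.name = t.name and t2.id < t.id and
                        (t2.status <> 'done' or t2.status is null)) and
      exists (select 1 from test t2
              where t2.name = t.name and t2.id < t.id and t2.status = 'done')
order by t.id
"""


def make_db():
    """In-memory SQLite copy of the `test` table (a stand-in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT, task TEXT, "
        "priority INTEGER, status TEXT)"
    )
    conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?, ?)", ROWS)
    # Index suggested in the answer; the name is illustrative.
    conn.execute("CREATE INDEX idx_name_id_status ON test (name, id, status)")
    return conn


def answer_1(conn):
    return conn.execute(ANSWER_1).fetchall()


print(answer_1(make_db()))
```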

For performance, create an index on test(name, id, status).

Here is a SQL Fiddle.

Answer 2

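This query determines whether all of the tasks before a given task are done by checking that the number of preceding tasks that are not 'done' is 0: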

SELECT t1.name, t1.id, t1.priority, t1.task
FROM test t1
JOIN test t2 
    ON t2.name = t1.name
    AND t2.priority < t1.priority
WHERE t1.status IS NULL
GROUP BY t1.name, t1.priority, t1.id, t1.task
HAVING COUNT(CASE WHEN t2.status = 'done' THEN NULL ELSE 1 END) = 0;

CREATE INDEX test_index1 ON test (name,status,priority,id,task);
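The counting trick works because COUNT ignores NULLs: 'done' rows map to NULL and are skipped, while every other earlier row (including NULL-status ones, since `NULL = 'done'` is not true) maps to 1 and is counted. A minimal sketch that runs the query through Python's sqlite3, assuming SQLite as a stand-in for MySQL, with an ORDER BY added for deterministic output; the index is created under the answer's own name:

```python
import sqlite3

# Sample rows from the question: (id, name, task, priority, status)
ROWS = [
    (1, "bob", "start", 1, "done"), (2, "bob", "work", 2, None),
    (3, "bob", "finish", 3, None), (4, "jim", "start", 1, "done"),
    (5, "jim", "work", 2, "done"), (6, "jim", "finish", 3, None),
    (7, "mike", "start", 1, "done"), (8, "mike", "work", 2, "failed"),
    (9, "mike", "finish", 3, None), (10, "joan", "start", 1, None),
    (11, "joan", "work", 2, None), (12, "joan", "finish", 3, None),
]

ANSWER_2 = """
SELECT t1.name, t1.id, t1.priority, t1.task
FROM test t1
JOIN test t2
    ON t2.name = t1.name
    AND t2.priority < t1.priority
WHERE t1.status IS NULL
GROUP BY t1.name, t1.priority, t1.id, t1.task
HAVING COUNT(CASE WHEN t2.status = 'done' THEN NULL ELSE 1 END) = 0
ORDER BY t1.id
"""


def make_db():
    """In-memory SQLite copy of the `test` table (a stand-in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT, task TEXT, "
        "priority INTEGER, status TEXT)"
    )
    conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?, ?)", ROWS)
    conn.execute("CREATE INDEX test_index1 ON test (name, status, priority, id, task)")
    return conn


def answer_2(conn):
    return conn.execute(ANSWER_2).fetchall()


print(answer_2(make_db()))
```

Because the self-join is an inner join, a task with no earlier tasks (joan's "start") never forms a group, so it is excluded without any extra condition.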

http://sqlfiddle.com/#!2/c912f/7

Answer 3

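I can't test the speed against a very large table, but this at least returns the correct answer on the smaller sample table. It should, however, be competitive with the other answers, since it executes only one subquery and no joins: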

select  *
from    test  t1
where   t1.Status is null
  and exists (
    select 1
    from   test
    where  Name = t1.Name and
           Priority < t1.Priority
    group by Name
    having count(*) = sum( case when Status = 'done' then 1 else 0 end )
);
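Here the correlated subquery returns a row only when every earlier task for the same name is 'done' (count(*) equals the number of 'done' rows), and it returns nothing at all when there are no earlier tasks, so EXISTS fails for joan. A minimal check via Python's sqlite3, assuming SQLite as a stand-in for MySQL and adding an ORDER BY for deterministic output:

```python
import sqlite3

# Sample rows from the question: (id, name, task, priority, status)
ROWS = [
    (1, "bob", "start", 1, "done"), (2, "bob", "work", 2, None),
    (3, "bob", "finish", 3, None), (4, "jim", "start", 1, "done"),
    (5, "jim", "work", 2, "done"), (6, "jim", "finish", 3, None),
    (7, "mike", "start", 1, "done"), (8, "mike", "work", 2, "failed"),
    (9, "mike", "finish", 3, None), (10, "joan", "start", 1, None),
    (11, "joan", "work", 2, None), (12, "joan", "finish", 3, None),
]

ANSWER_3 = """
select  *
from    test  t1
where   t1.Status is null
  and exists (
    select 1
    from   test
    where  Name = t1.Name and
           Priority < t1.Priority
    group by Name
    having count(*) = sum( case when Status = 'done' then 1 else 0 end )
)
order by t1.id
"""


def make_db():
    """In-memory SQLite copy of the `test` table (a stand-in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT, task TEXT, "
        "priority INTEGER, status TEXT)"
    )
    conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def answer_3(conn):
    return conn.execute(ANSWER_3).fetchall()


print(answer_3(make_db()))
```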

Answer 4

SELECT a.* 
  FROM test a 
  JOIN 
     ( SELECT x.name
            , MIN(x.priority) priority 
         FROM test x 
         LEFT 
         JOIN test y 
           ON y.name = x.name 
          AND y.priority < x.priority 
          AND y.status <> 'done' 
        WHERE y.id IS NULL 
          AND x.status IS NULL
        GROUP BY x.name
     ) b 
    ON b.name = a.name 
   AND b.priority = a.priority
   AND a.priority > 1;
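This answer uses an anti-join (LEFT JOIN ... WHERE y.id IS NULL) to keep NULL-status tasks with no earlier non-'done' task, then takes the minimum priority per name. A minimal check via Python's sqlite3, assuming SQLite as a stand-in for MySQL and appending an ORDER BY for deterministic output:

```python
import sqlite3

# Sample rows from the question: (id, name, task, priority, status)
ROWS = [
    (1, "bob", "start", 1, "done"), (2, "bob", "work", 2, None),
    (3, "bob", "finish", 3, None), (4, "jim", "start", 1, "done"),
    (5, "jim", "work", 2, "done"), (6, "jim", "finish", 3, None),
    (7, "mike", "start", 1, "done"), (8, "mike", "work", 2, "failed"),
    (9, "mike", "finish", 3, None), (10, "joan", "start", 1, None),
    (11, "joan", "work", 2, None), (12, "joan", "finish", 3, None),
]

ANSWER_4 = """
SELECT a.*
  FROM test a
  JOIN
     ( SELECT x.name
            , MIN(x.priority) priority
         FROM test x
         LEFT
         JOIN test y
           ON y.name = x.name
          AND y.priority < x.priority
          AND y.status <> 'done'
        WHERE y.id IS NULL
          AND x.status IS NULL
        GROUP BY x.name
     ) b
    ON b.name = a.name
   AND b.priority = a.priority
   AND a.priority > 1
 ORDER BY a.id
"""


def make_db():
    """In-memory SQLite copy of the `test` table (a stand-in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT, task TEXT, "
        "priority INTEGER, status TEXT)"
    )
    conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def answer_4(conn):
    return conn.execute(ANSWER_4).fetchall()


print(answer_4(make_db()))
```

Because `y.status <> 'done'` is never true for a NULL status, earlier NULL tasks do not disqualify a row inside the derived table; the MIN(priority) is what picks the first pending task. The trailing `a.priority > 1` condition is what removes joan, so it assumes each name's first task has priority 1.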

MySQL Query: Return rows in a group where all preceding rows match a condition
