Given the following table, which contains a list of names, tasks, task priorities, and task statuses:
mysql> select * from test;
+----+------+--------+----------+--------+
| id | name | task | priority | status |
+----+------+--------+----------+--------+
| 1 | bob | start | 1 | done |
| 2 | bob | work | 2 | NULL |
| 3 | bob | finish | 3 | NULL |
| 4 | jim | start | 1 | done |
| 5 | jim | work | 2 | done |
| 6 | jim | finish | 3 | NULL |
| 7 | mike | start | 1 | done |
| 8 | mike | work | 2 | failed |
| 9 | mike | finish | 3 | NULL |
| 10 | joan | start | 1 | NULL |
| 11 | joan | work | 2 | NULL |
| 12 | joan | finish | 3 | NULL |
+----+------+--------+----------+--------+
12 rows in set (0.00 sec)
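I want to build a query that returns only the next task to run for each name. Specifically, for each person I want to return the row with the lowest priority that has a NULL status.
But here's the catch: I only want to return that row if all of the preceding tasks are in 'done' status.
Given the table and query logic above, the final result of the query should look like this: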
+----+------+--------+----------+--------+
| id | name | task | priority | status |
+----+------+--------+----------+--------+
| 2 | bob | work | 2 | NULL |
| 6 | jim | finish | 3 | NULL |
+----+------+--------+----------+--------+
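Before looking at SQL, it can help to pin the rule down procedurally. The sketch below is a plain-Python restatement of the requirement, not one of the answers; `ROWS` and `next_tasks` are hypothetical names. One assumption is worth calling out: the expected output omits joan's `start` row even though it has no preceding tasks, so the sketch reads "all preceding tasks are done" as also requiring at least one preceding task.

```python
from itertools import groupby

# Sample data from the question: (id, name, task, priority, status); None stands in for NULL.
ROWS = [
    (1, "bob", "start", 1, "done"), (2, "bob", "work", 2, None), (3, "bob", "finish", 3, None),
    (4, "jim", "start", 1, "done"), (5, "jim", "work", 2, "done"), (6, "jim", "finish", 3, None),
    (7, "mike", "start", 1, "done"), (8, "mike", "work", 2, "failed"), (9, "mike", "finish", 3, None),
    (10, "joan", "start", 1, None), (11, "joan", "work", 2, None), (12, "joan", "finish", 3, None),
]


def next_tasks(rows):
    """Return the next runnable task row for each name, ordered by id."""
    result = []
    # groupby needs input sorted by the grouping key; priority orders tasks within a name.
    ordered = sorted(rows, key=lambda r: (r[1], r[3]))
    for _name, group in groupby(ordered, key=lambda r: r[1]):
        tasks = list(group)
        for i, row in enumerate(tasks):
            if row[4] is None:
                preceding = tasks[:i]
                # Assumption read off the expected output: at least one preceding task,
                # and every preceding task must be 'done'.
                if preceding and all(p[4] == "done" for p in preceding):
                    result.append(row)
                break  # only the lowest-priority NULL row per name is a candidate
    return sorted(result)


if __name__ == "__main__":
    for row in next_tasks(ROWS):
        print(row)
```

Run against the sample data, this yields bob's `work` row (id 2) and jim's `finish` row (id 6), matching the expected result above.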
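Originally this was done with a messy pile of subqueries and derived tables, which was extremely inefficient and slow. I've managed to speed it up by using a couple of temporary tables to get the result I want.
In the real world this will run against a table of roughly 200k records, and each server will execute the query several times a minute. My current solution takes about 2 seconds to run, which simply won't do.
Here is the DDL/DML to create my sample data: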
DROP TABLE IF EXISTS `test`;
CREATE TABLE `test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(20) DEFAULT NULL,
`task` varchar(20) DEFAULT NULL,
`priority` int(11) DEFAULT NULL,
`status` varchar(20) DEFAULT NULL,
PRIMARY KEY (`id`)
);
INSERT INTO `test` VALUES
(1,'bob','start',1,'done'),
(2,'bob','work',2,NULL),
(3,'bob','finish',3,NULL),
(4,'jim','start',1,'done'),
(5,'jim','work',2,'done'),
(6,'jim','finish',3,NULL),
(7,'mike','start',1,'done'),
(8,'mike','work',2,'failed'),
(9,'mike','finish',3,NULL),
(10,'joan','start',1,NULL),
(11,'joan','work',2,NULL),
(12,'joan','finish',3,NULL);
Here's what I'm currently doing to get the desired result (it works, but it's slow):
drop table if exists tmp1;
create temporary table tmp1 as
select
name,
min(priority) as priority
from test t
where status is null
group by name;
create index idx_pri on tmp1(priority);
create index idx_name on tmp1(name);
drop table if exists tmp2;
create temporary table tmp2 as
select tmp.*
from test t
join tmp1 tmp
on t.name = tmp.name
and t.priority < tmp.priority
group by tmp.name, tmp.priority having sum(
case when t.status = 'done'
then 0
else 1
end
) = 0;
create index idx_pri on tmp2(priority);
create index idx_name on tmp2(name);
select
t.*
from test t
join tmp2 t2
on t.name = t2.name
and t.priority = t2.priority;
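Because SQL Fiddle won't accept the temporary-table DDL, one way to check the two-step approach is a small local harness. This is a sketch under assumptions: an in-memory SQLite database (Python's standard `sqlite3` module) stands in for MySQL, the secondary indexes are left out because they affect speed rather than results, and `ROWS`, `load_sample`, and `run_temp_table_approach` are hypothetical names.

```python
import sqlite3

# Sample data from the question: (id, name, task, priority, status); None stands in for NULL.
ROWS = [
    (1, "bob", "start", 1, "done"), (2, "bob", "work", 2, None), (3, "bob", "finish", 3, None),
    (4, "jim", "start", 1, "done"), (5, "jim", "work", 2, "done"), (6, "jim", "finish", 3, None),
    (7, "mike", "start", 1, "done"), (8, "mike", "work", 2, "failed"), (9, "mike", "finish", 3, None),
    (10, "joan", "start", 1, None), (11, "joan", "work", 2, None), (12, "joan", "finish", 3, None),
]

TEMP_TABLE_STEPS = """
-- Step 1: lowest NULL-status priority per name.
CREATE TEMP TABLE tmp1 AS
SELECT name, MIN(priority) AS priority
FROM test
WHERE status IS NULL
GROUP BY name;

-- Step 2: keep only names whose preceding tasks are all 'done'.
CREATE TEMP TABLE tmp2 AS
SELECT tmp.name, tmp.priority
FROM test t
JOIN tmp1 tmp ON t.name = tmp.name AND t.priority < tmp.priority
GROUP BY tmp.name, tmp.priority
HAVING SUM(CASE WHEN t.status = 'done' THEN 0 ELSE 1 END) = 0;
"""


def load_sample() -> sqlite3.Connection:
    """Build an in-memory SQLite copy of the question's `test` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT, task TEXT,"
        " priority INTEGER, status TEXT)"
    )
    conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def run_temp_table_approach(conn: sqlite3.Connection) -> list:
    """Run both temp-table steps, then join back to fetch the full rows; return their ids."""
    conn.executescript(TEMP_TABLE_STEPS)
    rows = conn.execute(
        "SELECT t.id FROM test t JOIN tmp2 t2"
        " ON t.name = t2.name AND t.priority = t2.priority ORDER BY t.id"
    ).fetchall()
    return [r[0] for r in rows]


if __name__ == "__main__":
    print(run_temp_table_approach(load_sample()))
```

Called as `run_temp_table_approach(load_sample())`, it returns `[2, 6]`. Note that joan never reaches `tmp2`: her lowest NULL row has priority 1, so the inner join in step 2 finds no preceding rows for her.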
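I also have the DDL/DML in SQL Fiddle, but I can't put my solution there: technically, creating those temporary tables is DDL, and SQL Fiddle doesn't allow DDL in the query pane. http://sqlfiddle.com/#!2/2d9e2/1
Please help me come up with a better way to do this. I'm open to modifying the schema or the logic to accommodate an outside-the-box solution, as long as that solution works.
Answer 0 (score: 1)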
You can translate your logic directly into a query like this:
select t.*
from test t
where t.status is null and
not exists (select 1
from test t2
where t2.name = t.name and
t2.id < t.id and
(t2.status <> 'done' or
t2.status is null
)
) and
exists (select 1
from test t2
where t2.name = t.name and
t2.id < t.id and
t2.status = 'done'
);
To improve performance, create an index on test(name, id, status).
Here is a SQL Fiddle.
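A quick way to try this NOT EXISTS / EXISTS pattern locally is the harness below. It is a sketch under assumptions: an in-memory SQLite database (Python's standard `sqlite3` module) stands in for MySQL; preceding tasks are compared by `priority`, the column the question orders tasks by, rather than by `id` (the two happen to agree in the sample data); the index follows the answer's suggestion with `priority` in place of `id`; and `ROWS`, `load_sample`, and `run_not_exists_query` are hypothetical names.

```python
import sqlite3

# Sample data from the question: (id, name, task, priority, status); None stands in for NULL.
ROWS = [
    (1, "bob", "start", 1, "done"), (2, "bob", "work", 2, None), (3, "bob", "finish", 3, None),
    (4, "jim", "start", 1, "done"), (5, "jim", "work", 2, "done"), (6, "jim", "finish", 3, None),
    (7, "mike", "start", 1, "done"), (8, "mike", "work", 2, "failed"), (9, "mike", "finish", 3, None),
    (10, "joan", "start", 1, None), (11, "joan", "work", 2, None), (12, "joan", "finish", 3, None),
]

QUERY = """
SELECT t.id
FROM test t
WHERE t.status IS NULL
  -- no preceding task whose status is anything other than 'done' (NULL included)
  AND NOT EXISTS (SELECT 1 FROM test t2
                  WHERE t2.name = t.name AND t2.priority < t.priority
                    AND (t2.status <> 'done' OR t2.status IS NULL))
  -- at least one preceding 'done' task exists (this is what drops joan's 'start')
  AND EXISTS (SELECT 1 FROM test t2
              WHERE t2.name = t.name AND t2.priority < t.priority
                AND t2.status = 'done')
ORDER BY t.id
"""


def load_sample() -> sqlite3.Connection:
    """Build an in-memory SQLite copy of the question's `test` table plus a lookup index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT, task TEXT,"
        " priority INTEGER, status TEXT)"
    )
    conn.execute("CREATE INDEX idx_test_name_pri_status ON test (name, priority, status)")
    conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def run_not_exists_query(conn: sqlite3.Connection) -> list:
    """Return the ids of the rows selected by the NOT EXISTS / EXISTS query."""
    return [r[0] for r in conn.execute(QUERY).fetchall()]


if __name__ == "__main__":
    print(run_not_exists_query(load_sample()))
```

On the sample data this returns `[2, 6]`. Both subqueries only need to probe one name's earlier rows, which is why an index leading with `name` is the natural fit.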
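Answer 1 (score: 1)
This query decides whether all tasks before a given task are complete by checking that the number of preceding tasks not in 'done' status is 0: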
SELECT t1.name, t1.id, t1.priority, t1.task
FROM test t1
JOIN test t2
ON t2.name = t1.name
AND t2.priority < t1.priority
WHERE t1.status IS NULL
GROUP BY t1.name, t1.priority, t1.id, t1.task
HAVING COUNT(CASE WHEN t2.status = 'done' THEN NULL ELSE 1 END) = 0;
-- Supporting index for the self-join and the status check
CREATE INDEX test_index1 ON test (name,status,priority,id,task);
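To see the counting approach work end to end, here is a minimal harness. It assumes an in-memory SQLite database (Python's standard `sqlite3` module) as a stand-in for MySQL, selects only `t1.id` so the result is easy to compare, and uses hypothetical names `ROWS`, `load_sample`, and `run_count_query`.

```python
import sqlite3

# Sample data from the question: (id, name, task, priority, status); None stands in for NULL.
ROWS = [
    (1, "bob", "start", 1, "done"), (2, "bob", "work", 2, None), (3, "bob", "finish", 3, None),
    (4, "jim", "start", 1, "done"), (5, "jim", "work", 2, "done"), (6, "jim", "finish", 3, None),
    (7, "mike", "start", 1, "done"), (8, "mike", "work", 2, "failed"), (9, "mike", "finish", 3, None),
    (10, "joan", "start", 1, None), (11, "joan", "work", 2, None), (12, "joan", "finish", 3, None),
]

QUERY = """
SELECT t1.id
FROM test t1
-- inner join: a candidate with no earlier tasks (joan's 'start') produces no rows at all
JOIN test t2 ON t2.name = t1.name AND t2.priority < t1.priority
WHERE t1.status IS NULL
GROUP BY t1.name, t1.priority, t1.id, t1.task
-- NULL = 'done' is not true, so earlier NULL-status tasks are counted as not done
HAVING COUNT(CASE WHEN t2.status = 'done' THEN NULL ELSE 1 END) = 0
ORDER BY t1.id
"""


def load_sample() -> sqlite3.Connection:
    """Build an in-memory SQLite copy of the question's `test` table plus the answer's index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT, task TEXT,"
        " priority INTEGER, status TEXT)"
    )
    conn.execute("CREATE INDEX test_index1 ON test (name, status, priority, id, task)")
    conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def run_count_query(conn: sqlite3.Connection) -> list:
    """Return the ids of candidates whose preceding tasks are all 'done'."""
    return [r[0] for r in conn.execute(QUERY).fetchall()]


if __name__ == "__main__":
    print(run_count_query(load_sample()))
```

This prints `[2, 6]` for the sample data.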
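Answer 2 (score: 1)
I can't test the speed against a very large table, but this at least returns the correct answer on the small sample table. It should be competitive with the other answers, though, since it executes only a single subquery with no joins: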
select *
from test t1
where t1.Status is null
and exists (
select 1
from test
where Name = t1.Name and
Priority < t1.Priority
group by Name
having count(*) = sum( case when Status = 'done' then 1 else 0 end )
);
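The key idea here is that, for each NULL-status candidate, the correlated subquery forms one group of that name's earlier tasks and compares the group's size with its number of 'done' rows. A minimal harness to check it, assuming an in-memory SQLite database (Python's standard `sqlite3` module) as a stand-in for MySQL and hypothetical names `ROWS`, `load_sample`, and `run_exists_group_query`:

```python
import sqlite3

# Sample data from the question: (id, name, task, priority, status); None stands in for NULL.
ROWS = [
    (1, "bob", "start", 1, "done"), (2, "bob", "work", 2, None), (3, "bob", "finish", 3, None),
    (4, "jim", "start", 1, "done"), (5, "jim", "work", 2, "done"), (6, "jim", "finish", 3, None),
    (7, "mike", "start", 1, "done"), (8, "mike", "work", 2, "failed"), (9, "mike", "finish", 3, None),
    (10, "joan", "start", 1, None), (11, "joan", "work", 2, None), (12, "joan", "finish", 3, None),
]

QUERY = """
SELECT t1.id
FROM test t1
WHERE t1.Status IS NULL
  AND EXISTS (
      SELECT 1
      FROM test
      WHERE Name = t1.Name AND Priority < t1.Priority
      GROUP BY Name
      -- every earlier task is 'done'; with no earlier tasks there is no group,
      -- so EXISTS is false (this drops joan's 'start')
      HAVING COUNT(*) = SUM(CASE WHEN Status = 'done' THEN 1 ELSE 0 END)
  )
ORDER BY t1.id
"""


def load_sample() -> sqlite3.Connection:
    """Build an in-memory SQLite copy of the question's `test` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT, task TEXT,"
        " priority INTEGER, status TEXT)"
    )
    conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def run_exists_group_query(conn: sqlite3.Connection) -> list:
    """Return the ids of the rows selected by the correlated EXISTS query."""
    return [r[0] for r in conn.execute(QUERY).fetchall()]


if __name__ == "__main__":
    print(run_exists_group_query(load_sample()))
```

On the sample data this returns `[2, 6]`. Mixed-case column names work as written because both MySQL and SQLite match column names case-insensitively.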
Answer 3 (score: 0)
SELECT a.*
FROM test a
JOIN
( SELECT x.name
, MIN(x.priority) priority
FROM test x
LEFT
JOIN test y
ON y.name = x.name
AND y.priority < x.priority
AND y.status <> 'done'
WHERE y.id IS NULL
AND x.status IS NULL
GROUP BY x.name
) b
ON b.name = a.name
AND b.priority = a.priority
AND a.priority > 1;
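This answer uses a LEFT JOIN anti-join: a NULL-status row `x` survives only if no earlier row `y` for the same name matched the join condition, and `MIN` then picks the lowest survivor per name. Two details seem worth noting, stated as my reading rather than the author's: `y.status <> 'done'` never matches a NULL status, but as far as I can tell that cannot change the `MIN`, because the lowest NULL row per name has no NULL rows before it; and `a.priority > 1` assumes each name's first task has priority 1. A minimal harness, assuming an in-memory SQLite database (Python's standard `sqlite3` module) as a stand-in for MySQL and hypothetical names `ROWS`, `load_sample`, and `run_anti_join_query`:

```python
import sqlite3

# Sample data from the question: (id, name, task, priority, status); None stands in for NULL.
ROWS = [
    (1, "bob", "start", 1, "done"), (2, "bob", "work", 2, None), (3, "bob", "finish", 3, None),
    (4, "jim", "start", 1, "done"), (5, "jim", "work", 2, "done"), (6, "jim", "finish", 3, None),
    (7, "mike", "start", 1, "done"), (8, "mike", "work", 2, "failed"), (9, "mike", "finish", 3, None),
    (10, "joan", "start", 1, None), (11, "joan", "work", 2, None), (12, "joan", "finish", 3, None),
]

QUERY = """
SELECT a.id
FROM test a
JOIN (
    SELECT x.name, MIN(x.priority) AS priority
    FROM test x
    LEFT JOIN test y
      ON y.name = x.name
     AND y.priority < x.priority
     AND y.status <> 'done'
    -- anti-join: keep x only if no earlier task has a non-NULL, non-'done' status
    WHERE y.id IS NULL
      AND x.status IS NULL
    GROUP BY x.name
) b
  ON b.name = a.name
 AND b.priority = a.priority
 -- hard-coded assumption: the first task of every name has priority 1
 AND a.priority > 1
ORDER BY a.id
"""


def load_sample() -> sqlite3.Connection:
    """Build an in-memory SQLite copy of the question's `test` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT, task TEXT,"
        " priority INTEGER, status TEXT)"
    )
    conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def run_anti_join_query(conn: sqlite3.Connection) -> list:
    """Return the ids of the rows selected by the LEFT JOIN anti-join query."""
    return [r[0] for r in conn.execute(QUERY).fetchall()]


if __name__ == "__main__":
    print(run_anti_join_query(load_sample()))
```

On the sample data this returns `[2, 6]`. Joan's derived row has priority 1 and is removed only by the `a.priority > 1` condition, so the result depends on that convention holding in the real data.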