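I have a table containing time entries for different employees on different dates, along with the activity they logged their time against. I want to find all rows where they spent a minimum amount of time on the same activity over a run of, e.g., 3 days.
This is a simplified version of the table I want to query: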
CREATE TABLE `time_entries` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`employee_id` int(11) NOT NULL,
`activity_id` int(11) NOT NULL,
`work_date` date NOT NULL,
`time_spent` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Time, in minutes, spent on the current activity',
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
And some sample data:
+----+-------------+-------------+------------+------------+
| id | employee_id | activity_id | work_date | time_spent |
+----+-------------+-------------+------------+------------+
| 10 | 1 | 2 | 2016-06-11 | 120 |
| 16 | 1 | 3 | 2016-06-21 | 450 |
| 29 | 1 | 4 | 2016-06-22 | 450 |
| 17 | 1 | 4 | 2016-06-23 | 450 |
| 12 | 3 | 4 | 2016-06-23 | 450 |
| 4 | 1 | 4 | 2016-06-24 | 450 |
| 22 | 1 | 4 | 2016-06-26 | 60 |
| 9 | 1 | 6 | 2016-06-27 | 450 |
+----+-------------+-------------+------------+------------+
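time_spent is in minutes. I basically want to select all rows that form a block of at least 3 days, with time_spent = n days * 450 minutes, on the same activity_id and employee_id.
In the example above, I want to retrieve rows 29, 17, and 4. Row 16 would not be included, because it has a different activity_id, and neither would row 12, because it has a different employee_id. Row 22 skips a date (2016-06-25), so it breaks the run of dates.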
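I thought I could create a view or a temporary table that gives me a range of dates, and then use an aggregate function to group the rows by SUM(time_spent) where work_date is between a given date and work_date + 3 days.
This is not something I have had to implement before, but thinking about it, it could become a useful tool for future analysis.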
Answer 0: (score: 2)
Using the following schema and my test data:
CREATE TABLE `time_entries` (
`id` int(11) AUTO_INCREMENT PRIMARY KEY,
`employee_id` int(11) NOT NULL,
`activity_id` int(11) NOT NULL,
`work_date` date NOT NULL,
`time_spent` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Time, in minutes, spent on the current activity'
) ENGINE=InnoDB;
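Note that, to make building the test data simpler, I use auto-increment and let the db assign the ids instead of inserting the ids directly. I do show the id numbers at the far right below, e.g. -- 7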
insert time_entries(employee_id,activity_id,work_date,time_spent) values
(1,2,'2016-06-11',120), -- 1
(1,3,'2016-06-21',450), -- 2
(1,13,'2016-06-21',450), -- 3
(1,14,'2016-06-21',450), -- 4
(1,15,'2016-06-21',450), -- 5
(1,4,'2016-06-22',450), -- 6
(1,4,'2016-06-23',450), -- 7
(3,4,'2016-06-23',450), -- 8
(1,4,'2016-06-24',450), -- 9
(1,16,'2016-06-25',450), -- 10
(1,17,'2016-06-25',450), -- 11
(1,4,'2016-06-26',60), -- 12
(1,6,'2016-06-27',450), -- 13
(3,4,'2016-06-27',450), -- 14
(3,4,'2016-06-28',450), -- 15
(3,4,'2016-06-29',450), -- 16
(4,4,'2016-06-28',200), -- 17
(4,4,'2016-06-29',200), -- 18
(4,4,'2016-06-30',200), -- 19
(4,4,'2016-07-01',200), -- 20
(4,4,'2016-07-03',200), -- 21
(5,4,'2016-07-08',200), -- 22
(5,4,'2016-07-09',200), -- 23
(5,4,'2016-07-10',200), -- 24
(5,4,'2016-07-12',200), -- 25
(5,4,'2016-07-13',200), -- 26
(5,4,'2016-07-14',200), -- 27
(5,4,'2016-07-15',200), -- 28
(6,6,'2016-08-01',500), -- 29
(6,6,'2016-08-02',500), -- 30
(6,6,'2016-08-04',500), -- 31
(6,6,'2016-08-05',500), -- 32
(7,6,'2016-08-21',500), -- 33
(7,6,'2016-08-22',500), -- 34
(7,6,'2016-08-23',500), -- 35
(7,6,'2016-08-25',500), -- 36
(7,6,'2016-08-26',500); -- 37
select distinct t4.id,t4.employee_id,t4.activity_id,t4.work_date,t4.time_spent
from time_entries t4
join
( select t3.id,t3.employee_id,t3.activity_id,t3.work_date
from time_entries t3
join
( select t1.id,count(*) as rowcount,sum(t2.time_spent) as timeworked
from time_entries t1
join time_entries t2
on t2.employee_id=t1.employee_id
and t2.activity_id=t1.activity_id
and datediff(t2.work_date,t1.work_date)<=2
and t2.work_date>=t1.work_date
group by t1.id
having rowcount=3 and timeworked>=450
) xDerived1
on t3.id=xDerived1.id
) xDerived2
on t4.employee_id=xDerived2.employee_id
and t4.activity_id=xDerived2.activity_id
and datediff(t4.work_date,xDerived2.work_date)<=2
and datediff(t4.work_date,xDerived2.work_date)>=0
order by t4.employee_id,t4.activity_id,t4.work_date;
+----+-------------+-------------+------------+------------+
| id | employee_id | activity_id | work_date | time_spent |
+----+-------------+-------------+------------+------------+
| 6 | 1 | 4 | 2016-06-22 | 450 |
| 7 | 1 | 4 | 2016-06-23 | 450 |
| 9 | 1 | 4 | 2016-06-24 | 450 |
| 14 | 3 | 4 | 2016-06-27 | 450 |
| 15 | 3 | 4 | 2016-06-28 | 450 |
| 16 | 3 | 4 | 2016-06-29 | 450 |
| 17 | 4 | 4 | 2016-06-28 | 200 |
| 18 | 4 | 4 | 2016-06-29 | 200 |
| 19 | 4 | 4 | 2016-06-30 | 200 |
| 20 | 4 | 4 | 2016-07-01 | 200 |
| 22 | 5 | 4 | 2016-07-08 | 200 |
| 23 | 5 | 4 | 2016-07-09 | 200 |
| 24 | 5 | 4 | 2016-07-10 | 200 |
| 25 | 5 | 4 | 2016-07-12 | 200 |
| 26 | 5 | 4 | 2016-07-13 | 200 |
| 27 | 5 | 4 | 2016-07-14 | 200 |
| 28 | 5 | 4 | 2016-07-15 | 200 |
| 33 | 7 | 6 | 2016-08-21 | 500 |
| 34 | 7 | 6 | 2016-08-22 | 500 |
| 35 | 7 | 6 | 2016-08-23 | 500 |
+----+-------------+-------------+------------+------------+
20 rows in set (0.00 sec)
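About half of the rows qualify (20 of 37). Going by the requirement "show the rows where ...", the query returns the rows themselves, and if there are 4 consecutive days (for a given worker/activity/hours) the result can contain more than 3 rows for that block. That is, in a block of 4, the first 3 days qualify and the last 3 days qualify, so all 4 rows appear. The results show this.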
For a visualization of xDerived1, see the following: [image of the xDerived1 result set not preserved]
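As a rough stand-in for that visualization, here is a minimal sketch that runs the innermost self-join on rows 17 to 21 of the test data (employee 4) and prints every 3-day window, without the HAVING filter. It is not the original query: it runs on SQLite through Python's sqlite3 module, so julianday() differences stand in for MySQL's DATEDIFF(), and the names ROWS, WINDOW_SQL and window_summary are made up for this illustration.

```python
import sqlite3

# Rows 17-21 of the answer's test data (employee 4, activity 4).
ROWS = [
    (17, 4, 4, "2016-06-28", 200),
    (18, 4, 4, "2016-06-29", 200),
    (19, 4, 4, "2016-06-30", 200),
    (20, 4, 4, "2016-07-01", 200),
    (21, 4, 4, "2016-07-03", 200),
]

# Assumption: SQLite dialect. julianday() differences replace DATEDIFF().
# For each row t1, count the rows t2 of the same employee/activity whose
# date falls on t1's date or within the following 2 days.
WINDOW_SQL = """
SELECT t1.id, t1.work_date,
       COUNT(*)           AS rowcount,
       SUM(t2.time_spent) AS timeworked
FROM time_entries t1
JOIN time_entries t2
  ON t2.employee_id = t1.employee_id
 AND t2.activity_id = t1.activity_id
 AND julianday(t2.work_date) - julianday(t1.work_date) BETWEEN 0 AND 2
GROUP BY t1.id, t1.work_date
ORDER BY t1.work_date
"""


def window_summary(rows):
    """Load rows into an in-memory table and return one tuple per window start."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE time_entries ("
        "id INTEGER PRIMARY KEY, employee_id INTEGER, activity_id INTEGER, "
        "work_date TEXT, time_spent INTEGER)"
    )
    conn.executemany("INSERT INTO time_entries VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(WINDOW_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in window_summary(ROWS):
        print(row)
```

Only the windows starting at ids 17 and 18 reach rowcount = 3, which is why the outer joins of the full query pull in rows 17 through 20 for this employee but not row 21.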
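Answer 1: (score: 0)
Here is another way to solve it: order the innermost subquery and use variables to link consecutive entries together, then select the groups of three or more. I have to say, I think I prefer Drew's solution.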
SELECT t4.* FROM time_entries t4
JOIN
(SELECT employee_id, activity_id, MIN(work_date) min, MAX(work_date) max FROM
(SELECT t.id,
@employee_id := t.employee_id employee_id,
@activity_id := t.activity_id activity_id,
@work_date := t.work_date work_date,
@i i
FROM (SELECT * FROM time_entries ORDER BY employee_id, activity_id, work_date) t
JOIN (SELECT @employee_id := 0, @activity_id := 0, @work_date := NULL, @i := 0) tmp
WHERE
CASE WHEN @employee_id = employee_id
AND @activity_id = activity_id
AND work_date = DATE_ADD(@work_date, INTERVAL 1 DAY)
THEN @i ELSE @i := @i + 1 END) t2
GROUP BY t2.i, t2.employee_id, t2.activity_id
HAVING COUNT(*) >= 3) t3
WHERE t4.employee_id = t3.employee_id
AND t4.activity_id = t3.activity_id
AND t4.work_date BETWEEN t3.min AND t3.max;
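To make the variable trick easier to follow, here is a minimal sketch of the same idea in plain Python, using the sample data from the question. It sorts the rows, starts a new run whenever the employee, the activity, or the day-to-day sequence changes, and keeps the runs of at least 3 days. Like the query above, it looks only at consecutive dates and does not check time_spent. The names SAMPLE and rows_in_runs are hypothetical, chosen for this illustration.

```python
from datetime import date, timedelta

# Sample data from the question: (id, employee_id, activity_id, work_date, time_spent)
SAMPLE = [
    (10, 1, 2, date(2016, 6, 11), 120),
    (16, 1, 3, date(2016, 6, 21), 450),
    (29, 1, 4, date(2016, 6, 22), 450),
    (17, 1, 4, date(2016, 6, 23), 450),
    (12, 3, 4, date(2016, 6, 23), 450),
    (4, 1, 4, date(2016, 6, 24), 450),
    (22, 1, 4, date(2016, 6, 26), 60),
    (9, 1, 6, date(2016, 6, 27), 450),
]


def rows_in_runs(rows, min_days=3):
    """Return the ids of rows that belong to a run of at least min_days
    consecutive dates for the same employee and activity."""
    # Same ordering as the innermost subquery: employee, activity, date.
    ordered = sorted(rows, key=lambda r: (r[1], r[2], r[3]))
    runs = []
    for row in ordered:
        prev = runs[-1][-1] if runs else None
        continues_run = (
            prev is not None
            and prev[1] == row[1]                      # same employee
            and prev[2] == row[2]                      # same activity
            and row[3] == prev[3] + timedelta(days=1)  # next calendar day
        )
        if continues_run:
            runs[-1].append(row)   # plays the role of keeping @i unchanged
        else:
            runs.append([row])     # plays the role of @i := @i + 1
    return [r[0] for run in runs if len(run) >= min_days for r in run]


if __name__ == "__main__":
    print(rows_in_runs(SAMPLE))
```

On the question's sample data this returns the ids 29, 17 and 4, matching the rows the question asks for. Raising min_days to 4 returns nothing, since no run in the sample is that long.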