How to find MySQL rows based on the sum of values on adjacent dates?

Posted: 2016-06-07 15:25:28

Tags: mysql

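I have a table of time entries for different employees on different dates, together with the activity the time was recorded against. I want to find all rows where an employee spent a minimum of, say, 3 days on the same activity.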

Here is a simplified version of the table I am querying:

CREATE TABLE `time_entries` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `employee_id` int(11) NOT NULL,
  `activity_id` int(11) NOT NULL,
  `work_date` date NOT NULL,
  `time_spent` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Time, in minutes, spent on the current activity',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

And some sample data:

+----+-------------+-------------+------------+------------+
| id | employee_id | activity_id | work_date  | time_spent |
+----+-------------+-------------+------------+------------+
| 10 |           1 |           2 | 2016-06-11 |        120 |
| 16 |           1 |           3 | 2016-06-21 |        450 |
| 29 |           1 |           4 | 2016-06-22 |        450 |
| 17 |           1 |           4 | 2016-06-23 |        450 |
| 12 |           3 |           4 | 2016-06-23 |        450 |
|  4 |           1 |           4 | 2016-06-24 |        450 |
| 22 |           1 |           4 | 2016-06-26 |         60 |
|  9 |           1 |           6 | 2016-06-27 |        450 |
+----+-------------+-------------+------------+------------+

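time_spent is in minutes. I basically want to select all rows that form a block of at least 3 consecutive days, where time_spent = n days * 450 minutes, within the same activity_id and employee_id.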

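In the example above, I would want to retrieve rows 29, 17 and 4. Row 16 would not be included because it has a different activity_id, nor would row 12, because it has a different employee_id. Row 22 skips a date, so it breaks the 'chain' of dates.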
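To pin the requirement down, here is a minimal Python sketch (not SQL, and not from the question) of what "a block of at least 3 days" could mean. It makes two assumptions. First, "n days * 450 minutes" is read as every day in the run being a full 450-minute day. Second, there is at most one row per employee, activity and date. The names `qualifying_ids`, `FULL_DAY` and `MIN_DAYS` are hypothetical. On the question's sample rows it returns the ids the question expects.

```python
from datetime import date, timedelta
from itertools import groupby

FULL_DAY = 450  # minutes in a full working day (from the question)
MIN_DAYS = 3    # minimum length of a run of consecutive days

# (id, employee_id, activity_id, work_date, time_spent): the question's sample rows
ROWS = [
    (10, 1, 2, date(2016, 6, 11), 120),
    (16, 1, 3, date(2016, 6, 21), 450),
    (29, 1, 4, date(2016, 6, 22), 450),
    (17, 1, 4, date(2016, 6, 23), 450),
    (12, 3, 4, date(2016, 6, 23), 450),
    (4, 1, 4, date(2016, 6, 24), 450),
    (22, 1, 4, date(2016, 6, 26), 60),
    (9, 1, 6, date(2016, 6, 27), 450),
]


def qualifying_ids(rows, min_days=MIN_DAYS, full_day=FULL_DAY):
    """Ids of rows in runs of >= min_days consecutive full days per employee/activity.

    Assumption: at most one row per (employee_id, activity_id, work_date).
    """
    result = []
    ordered = sorted(rows, key=lambda r: (r[1], r[2], r[3]))
    for _, group in groupby(ordered, key=lambda r: (r[1], r[2])):
        run = []
        for row in group:
            continues = (
                row[4] == full_day
                and run
                and row[3] == run[-1][3] + timedelta(days=1)
            )
            if continues:
                run.append(row)
                continue
            # The run is broken: keep it if it was long enough, then restart
            if len(run) >= min_days:
                result.extend(r[0] for r in run)
            run = [row] if row[4] == full_day else []
        if len(run) >= min_days:
            result.extend(r[0] for r in run)
    return result


print(qualifying_ids(ROWS))  # [29, 17, 4]
```

Row 22 ends the run twice over: it logs only 60 minutes, and 2016-06-25 is missing before it.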

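I think I could create a view or temporary table that gives me a series of dates, and use some aggregate function to group the rows by SUM(time_spent) where work_date is between a given date and work_date + 3 days.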
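The date-series idea can be sketched in plain Python. The sketch below is an illustration only, not the asker's code. It treats the window as the start date plus the next two days, so that three calendar days are covered. A window qualifies when every day in it has an entry and the summed minutes equal 3 * 450. The function name `window_sum_ids` is hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

FULL_DAY = 450  # minutes in a full working day (from the question)
WINDOW = 3      # window length in days: start date .. start date + 2

# (id, employee_id, activity_id, work_date, time_spent): the question's sample rows
ROWS = [
    (10, 1, 2, date(2016, 6, 11), 120),
    (16, 1, 3, date(2016, 6, 21), 450),
    (29, 1, 4, date(2016, 6, 22), 450),
    (17, 1, 4, date(2016, 6, 23), 450),
    (12, 3, 4, date(2016, 6, 23), 450),
    (4, 1, 4, date(2016, 6, 24), 450),
    (22, 1, 4, date(2016, 6, 26), 60),
    (9, 1, 6, date(2016, 6, 27), 450),
]


def window_sum_ids(rows, window=WINDOW, full_day=FULL_DAY):
    minutes = defaultdict(int)  # (employee, activity, day) -> minutes logged
    ids = defaultdict(list)     # (employee, activity, day) -> row ids
    for rid, emp, act, day, spent in rows:
        minutes[(emp, act, day)] += spent
        ids[(emp, act, day)].append(rid)

    pairs = sorted({(emp, act) for _, emp, act, _, _ in rows})
    first = min(r[3] for r in rows)
    last = max(r[3] for r in rows)

    hits = set()
    start = first
    while start <= last:  # the "series of dates" from the question
        days = [start + timedelta(days=d) for d in range(window)]
        for emp, act in pairs:
            logged = [d for d in days if (emp, act, d) in minutes]
            total = sum(minutes[(emp, act, d)] for d in logged)
            # every day present and SUM(time_spent) == window * full_day
            if len(logged) == window and total == window * full_day:
                for d in days:
                    hits.update(ids[(emp, act, d)])
        start += timedelta(days=1)

    # report in (employee, activity, date) order
    ordered = sorted(rows, key=lambda r: (r[1], r[2], r[3]))
    return [r[0] for r in ordered if r[0] in hits]


print(window_sum_ids(ROWS))  # [29, 17, 4]
```

Only the window starting on 2016-06-22 has all three days present for employee 1 / activity 4.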

This isn't something I have had to implement before, but thinking about it, it could become a useful tool for future analysis.

2 Answers:

Answer 0 (score: 2)

Using the following schema and my test data:

Schema

CREATE TABLE `time_entries` (
  `id` int(11) AUTO_INCREMENT PRIMARY KEY,
  `employee_id` int(11) NOT NULL,
  `activity_id` int(11) NOT NULL,
  `work_date` date NOT NULL,
  `time_spent` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Time, in minutes, spent on the current activity'
) ENGINE=InnoDB;

Test data

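Note that, to make building the test data easier, I use auto-increment and let the database assign the ids instead of inserting them directly. I do show each row's id at the far right below, e.g. -- 7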

insert time_entries(employee_id,activity_id,work_date,time_spent) values
(1,2,'2016-06-11',120), --  1
(1,3,'2016-06-21',450), --  2
(1,13,'2016-06-21',450), -- 3
(1,14,'2016-06-21',450), -- 4
(1,15,'2016-06-21',450), -- 5
(1,4,'2016-06-22',450), -- 6
(1,4,'2016-06-23',450), -- 7
(3,4,'2016-06-23',450), -- 8
(1,4,'2016-06-24',450), -- 9
(1,16,'2016-06-25',450), -- 10
(1,17,'2016-06-25',450), -- 11
(1,4,'2016-06-26',60), -- 12
(1,6,'2016-06-27',450), -- 13
(3,4,'2016-06-27',450), -- 14
(3,4,'2016-06-28',450), -- 15
(3,4,'2016-06-29',450), -- 16
(4,4,'2016-06-28',200), -- 17
(4,4,'2016-06-29',200), -- 18
(4,4,'2016-06-30',200), -- 19
(4,4,'2016-07-01',200), -- 20
(4,4,'2016-07-03',200), -- 21
(5,4,'2016-07-08',200), -- 22
(5,4,'2016-07-09',200), -- 23
(5,4,'2016-07-10',200), -- 24
(5,4,'2016-07-12',200), -- 25
(5,4,'2016-07-13',200), -- 26
(5,4,'2016-07-14',200), -- 27
(5,4,'2016-07-15',200), -- 28
(6,6,'2016-08-01',500), -- 29
(6,6,'2016-08-02',500), -- 30
(6,6,'2016-08-04',500), -- 31
(6,6,'2016-08-05',500), -- 32
(7,6,'2016-08-21',500), -- 33
(7,6,'2016-08-22',500), -- 34
(7,6,'2016-08-23',500), -- 35
(7,6,'2016-08-25',500), -- 36
(7,6,'2016-08-26',500); -- 37

Final query

-- Outer query: return every row that lies inside a qualifying 3-day window
select distinct t4.id,t4.employee_id,t4.activity_id,t4.work_date,t4.time_spent
from time_entries t4
join
(   -- xDerived2: the start row of each qualifying window
    select t3.id,t3.employee_id,t3.activity_id,t3.work_date
    from time_entries t3
    join
    (   -- xDerived1: for each row t1, count and sum the rows for the same
        -- employee/activity dated from t1.work_date to t1.work_date + 2 days
        select t1.id,count(*) as rowcount,sum(t2.time_spent) as timeworked
        from time_entries t1
        join time_entries t2
        on t2.employee_id=t1.employee_id 
        and t2.activity_id=t1.activity_id 
        and datediff(t2.work_date,t1.work_date)<=2
        and t2.work_date>=t1.work_date
        group by t1.id
        having rowcount=3 and timeworked>=450
    ) xDerived1
    on t3.id=xDerived1.id
) xDerived2
on t4.employee_id=xDerived2.employee_id 
and t4.activity_id=xDerived2.activity_id
and datediff(t4.work_date,xDerived2.work_date)<=2
and datediff(t4.work_date,xDerived2.work_date)>=0
order by t4.employee_id,t4.activity_id,t4.work_date;
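To try the final query without a MySQL server, the sketch below ports it to SQLite through Python's standard `sqlite3` module. This is an assumption-laden port, not the answer's own code. MySQL's `datediff(a, b)` is rewritten as `julianday(a) - julianday(b)`, and the two range checks are folded into `BETWEEN 0 AND 2`. The HAVING clause repeats the aggregates instead of the aliases. Dates are stored as ISO text.

```python
import sqlite3

# The answer's 37 test rows (employee_id, activity_id, work_date, time_spent);
# ids 1..37 are assigned by insertion order, as in the answer.
ROWS = [
    (1, 2, "2016-06-11", 120), (1, 3, "2016-06-21", 450), (1, 13, "2016-06-21", 450),
    (1, 14, "2016-06-21", 450), (1, 15, "2016-06-21", 450), (1, 4, "2016-06-22", 450),
    (1, 4, "2016-06-23", 450), (3, 4, "2016-06-23", 450), (1, 4, "2016-06-24", 450),
    (1, 16, "2016-06-25", 450), (1, 17, "2016-06-25", 450), (1, 4, "2016-06-26", 60),
    (1, 6, "2016-06-27", 450), (3, 4, "2016-06-27", 450), (3, 4, "2016-06-28", 450),
    (3, 4, "2016-06-29", 450), (4, 4, "2016-06-28", 200), (4, 4, "2016-06-29", 200),
    (4, 4, "2016-06-30", 200), (4, 4, "2016-07-01", 200), (4, 4, "2016-07-03", 200),
    (5, 4, "2016-07-08", 200), (5, 4, "2016-07-09", 200), (5, 4, "2016-07-10", 200),
    (5, 4, "2016-07-12", 200), (5, 4, "2016-07-13", 200), (5, 4, "2016-07-14", 200),
    (5, 4, "2016-07-15", 200), (6, 6, "2016-08-01", 500), (6, 6, "2016-08-02", 500),
    (6, 6, "2016-08-04", 500), (6, 6, "2016-08-05", 500), (7, 6, "2016-08-21", 500),
    (7, 6, "2016-08-22", 500), (7, 6, "2016-08-23", 500), (7, 6, "2016-08-25", 500),
    (7, 6, "2016-08-26", 500),
]

SCHEMA = """
CREATE TABLE time_entries (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    employee_id INTEGER NOT NULL,
    activity_id INTEGER NOT NULL,
    work_date   TEXT    NOT NULL,  -- ISO 'YYYY-MM-DD'
    time_spent  INTEGER NOT NULL DEFAULT 0
)"""

# The answer's query with DATEDIFF(a, b) rewritten as julianday(a) - julianday(b)
QUERY = """
SELECT DISTINCT t4.id, t4.employee_id, t4.activity_id, t4.work_date, t4.time_spent
FROM time_entries t4
JOIN (
    SELECT t3.id, t3.employee_id, t3.activity_id, t3.work_date
    FROM time_entries t3
    JOIN (
        SELECT t1.id, COUNT(*) AS rowcount, SUM(t2.time_spent) AS timeworked
        FROM time_entries t1
        JOIN time_entries t2
          ON t2.employee_id = t1.employee_id
         AND t2.activity_id = t1.activity_id
         AND julianday(t2.work_date) - julianday(t1.work_date) BETWEEN 0 AND 2
        GROUP BY t1.id
        HAVING COUNT(*) = 3 AND SUM(t2.time_spent) >= 450
    ) xDerived1 ON t3.id = xDerived1.id
) xDerived2
  ON t4.employee_id = xDerived2.employee_id
 AND t4.activity_id = xDerived2.activity_id
 AND julianday(t4.work_date) - julianday(xDerived2.work_date) BETWEEN 0 AND 2
ORDER BY t4.employee_id, t4.activity_id, t4.work_date
"""

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO time_entries (employee_id, activity_id, work_date, time_spent) "
    "VALUES (?, ?, ?, ?)",
    ROWS,
)
ids = [row[0] for row in conn.execute(QUERY)]
print(len(ids), ids)
```

On this data the port returns the same 20 ids as the result table below.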

Results

+----+-------------+-------------+------------+------------+
| id | employee_id | activity_id | work_date  | time_spent |
+----+-------------+-------------+------------+------------+
|  6 |           1 |           4 | 2016-06-22 |        450 |
|  7 |           1 |           4 | 2016-06-23 |        450 |
|  9 |           1 |           4 | 2016-06-24 |        450 |
| 14 |           3 |           4 | 2016-06-27 |        450 |
| 15 |           3 |           4 | 2016-06-28 |        450 |
| 16 |           3 |           4 | 2016-06-29 |        450 |
| 17 |           4 |           4 | 2016-06-28 |        200 |
| 18 |           4 |           4 | 2016-06-29 |        200 |
| 19 |           4 |           4 | 2016-06-30 |        200 |
| 20 |           4 |           4 | 2016-07-01 |        200 |
| 22 |           5 |           4 | 2016-07-08 |        200 |
| 23 |           5 |           4 | 2016-07-09 |        200 |
| 24 |           5 |           4 | 2016-07-10 |        200 |
| 25 |           5 |           4 | 2016-07-12 |        200 |
| 26 |           5 |           4 | 2016-07-13 |        200 |
| 27 |           5 |           4 | 2016-07-14 |        200 |
| 28 |           5 |           4 | 2016-07-15 |        200 |
| 33 |           7 |           6 | 2016-08-21 |        500 |
| 34 |           7 |           6 | 2016-08-22 |        500 |
| 35 |           7 |           6 | 2016-08-23 |        500 |
+----+-------------+-------------+------------+------------+
20 rows in set (0.00 sec)

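About half the rows qualify (20 of 37). Because the requirement is to show the rows that qualify, a run of 4 consecutive days (for a given worker/activity/hours) can put more than 3 rows in the result. In a block of 4, both the first 3 and the last 3 qualify. The results show this; for example, ids 17 to 20 for employee 4.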
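The overlap can be illustrated with a small Python sketch over employee 4's rows from the test data. Treat it as an illustration of the query's logic, not the query itself. A start row qualifies when its 3-day window (the start date through start + 2) holds exactly 3 rows, mirroring `rowcount=3`. The outer join then keeps every row inside any qualifying window. The minutes condition is omitted because each window here sums to 600, which is at least 450. `ENTRIES` and `window_ids` are hypothetical names.

```python
from datetime import date, timedelta

# Employee 4 / activity 4 from the answer's test data: id -> work_date
ENTRIES = {
    17: date(2016, 6, 28),
    18: date(2016, 6, 29),
    19: date(2016, 6, 30),
    20: date(2016, 7, 1),
    21: date(2016, 7, 3),
}


def window_ids(start, span=3):
    """Ids dated from start to start + (span - 1) days, like xDerived1's join."""
    end = start + timedelta(days=span - 1)
    return sorted(i for i, d in ENTRIES.items() if start <= d <= end)


# Windows holding exactly 3 rows qualify (each sums to 600 >= 450 minutes here)
starts = [i for i, d in ENTRIES.items() if len(window_ids(d)) == 3]
# The outer join returns every row inside any qualifying window
selected = sorted({j for i in starts for j in window_ids(ENTRIES[i])})
print(starts, selected)  # [17, 18] [17, 18, 19, 20]
```

Two overlapping windows (starting at ids 17 and 18) qualify, and their union is the 4-row block.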

For a visualization of xDerived1, see the following image:

[Image: visualization of xDerived1]
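To see what xDerived1 contains, one can run just that inner query. The sketch below uses SQLite via Python's `sqlite3`, with `julianday` standing in for MySQL's `datediff` as an assumption of the port. It loads only employee 5's rows and drops the HAVING clause so that every candidate window start is visible. It then applies the HAVING condition in Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE time_entries (id INTEGER PRIMARY KEY, employee_id INTEGER, "
    "activity_id INTEGER, work_date TEXT, time_spent INTEGER)"
)
# Employee 5 / activity 4 rows (ids 22-28) from the answer's test data
conn.executemany(
    "INSERT INTO time_entries VALUES (?, 5, 4, ?, 200)",
    [(22, "2016-07-08"), (23, "2016-07-09"), (24, "2016-07-10"), (25, "2016-07-12"),
     (26, "2016-07-13"), (27, "2016-07-14"), (28, "2016-07-15")],
)

# xDerived1 without its HAVING clause, so every candidate window start is visible
XDERIVED1 = """
SELECT t1.id, COUNT(*) AS rowcount, SUM(t2.time_spent) AS timeworked
FROM time_entries t1
JOIN time_entries t2
  ON t2.employee_id = t1.employee_id
 AND t2.activity_id = t1.activity_id
 AND julianday(t2.work_date) - julianday(t1.work_date) BETWEEN 0 AND 2
GROUP BY t1.id
ORDER BY t1.id
"""
windows = conn.execute(XDERIVED1).fetchall()
for row in windows:
    print(row)  # (id, rowcount, timeworked)

# The HAVING clause (rowcount = 3 AND timeworked >= 450) keeps these starts
starts = [i for i, n, minutes in windows if n == 3 and minutes >= 450]
print(starts)  # [22, 25, 26]
```

The gap on 2016-07-11 splits employee 5's rows. Windows starting at ids 22, 25 and 26 still hold three rows, which is why all seven rows appear in the final result.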

Answer 1 (score: 0)

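Here is another way to solve it. It sorts the rows in the innermost subquery and uses variables to chain consecutive entries together, then selects the groups of three or more. I have to say, though, that I think I prefer Drew's solution.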
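The chaining trick can be mimicked step by step in Python. This is a hypothetical emulation, not the answer's code. `prev` plays the role of `@employee_id`, `@activity_id` and `@work_date`, and `i` plays the role of `@i`. The counter is bumped whenever the previous row is not the same employee and activity on the previous calendar day. Like the SQL, it ignores time_spent and assumes at most one row per employee, activity and date.

```python
from datetime import date, timedelta

# (id, employee_id, activity_id, work_date, time_spent): the question's sample rows
ROWS = [
    (10, 1, 2, date(2016, 6, 11), 120),
    (16, 1, 3, date(2016, 6, 21), 450),
    (29, 1, 4, date(2016, 6, 22), 450),
    (17, 1, 4, date(2016, 6, 23), 450),
    (12, 3, 4, date(2016, 6, 23), 450),
    (4, 1, 4, date(2016, 6, 24), 450),
    (22, 1, 4, date(2016, 6, 26), 60),
    (9, 1, 6, date(2016, 6, 27), 450),
]


def chains(rows):
    """Number rows so that each chain of consecutive days shares one number."""
    prev = None  # previous (employee_id, activity_id, work_date), like the @ variables
    i = 0        # chain counter, like @i
    numbered = []
    for row in sorted(rows, key=lambda r: (r[1], r[2], r[3])):
        _, emp, act, day, _ = row
        if prev != (emp, act, day - timedelta(days=1)):
            i += 1  # chain broken: start a new group
        prev = (emp, act, day)
        numbered.append((i, row))
    return numbered


def ids_in_long_chains(rows, min_len=3):
    """Ids of rows whose chain has at least min_len rows (GROUP BY i HAVING COUNT(*) >= 3)."""
    groups = {}
    for i, row in chains(rows):
        groups.setdefault(i, []).append(row[0])
    return [rid for ids in groups.values() if len(ids) >= min_len for rid in ids]


print(ids_in_long_chains(ROWS))  # [29, 17, 4]
```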

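-- Caveat: this relies on the order in which MySQL evaluates the user-variable
-- assignments, which is not guaranteed; setting variables inside expressions
-- is also deprecated in MySQL 8.0.
SELECT t4.* FROM time_entries t4
JOIN
(SELECT employee_id, activity_id, MIN(work_date) min, MAX(work_date) max FROM
    -- Walk the rows in (employee, activity, date) order; @i is a chain number
    -- that is bumped whenever the previous row is not the same employee and
    -- activity on the previous calendar day.
    (SELECT t.id,
            @employee_id := t.employee_id employee_id,
            @activity_id := t.activity_id activity_id,
            @work_date := t.work_date work_date,
            @i i
        FROM (SELECT * FROM time_entries ORDER BY employee_id, activity_id, work_date) t
        JOIN (SELECT @employee_id := 0, @activity_id := 0, @work_date := NULL, @i := 0) tmp
        WHERE 
        CASE WHEN @employee_id = employee_id
              AND @activity_id = activity_id
              AND work_date = DATE_ADD(@work_date, INTERVAL 1 DAY)
             THEN @i ELSE @i := @i + 1 END) t2
    -- Keep only chains that cover at least 3 consecutive days
    GROUP BY t2.i, t2.employee_id, t2.activity_id
    HAVING COUNT(*) >= 3) t3
WHERE t4.employee_id = t3.employee_id
  AND t4.activity_id = t3.activity_id
  AND t4.work_date BETWEEN t3.min AND t3.max;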
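On databases with window functions, the user-variable chaining could presumably be replaced by the common "gaps and islands" pattern. The idea is that for consecutive dates, the day number minus `ROW_NUMBER()` within the same employee/activity stays constant. The sketch below is an alternative technique, not either answer's method. It runs in SQLite, which needs version 3.25+ for window functions, through Python's `sqlite3`. On MySQL 8.0, `TO_DAYS(work_date)` would be the assumed stand-in for `julianday`. Like answer 1, it checks only consecutive dates, not time_spent, and it assumes at most one row per employee, activity and date.

```python
import sqlite3

# The answer's 37 test rows (employee_id, activity_id, work_date, time_spent);
# ids 1..37 are assigned by insertion order.
ROWS = [
    (1, 2, "2016-06-11", 120), (1, 3, "2016-06-21", 450), (1, 13, "2016-06-21", 450),
    (1, 14, "2016-06-21", 450), (1, 15, "2016-06-21", 450), (1, 4, "2016-06-22", 450),
    (1, 4, "2016-06-23", 450), (3, 4, "2016-06-23", 450), (1, 4, "2016-06-24", 450),
    (1, 16, "2016-06-25", 450), (1, 17, "2016-06-25", 450), (1, 4, "2016-06-26", 60),
    (1, 6, "2016-06-27", 450), (3, 4, "2016-06-27", 450), (3, 4, "2016-06-28", 450),
    (3, 4, "2016-06-29", 450), (4, 4, "2016-06-28", 200), (4, 4, "2016-06-29", 200),
    (4, 4, "2016-06-30", 200), (4, 4, "2016-07-01", 200), (4, 4, "2016-07-03", 200),
    (5, 4, "2016-07-08", 200), (5, 4, "2016-07-09", 200), (5, 4, "2016-07-10", 200),
    (5, 4, "2016-07-12", 200), (5, 4, "2016-07-13", 200), (5, 4, "2016-07-14", 200),
    (5, 4, "2016-07-15", 200), (6, 6, "2016-08-01", 500), (6, 6, "2016-08-02", 500),
    (6, 6, "2016-08-04", 500), (6, 6, "2016-08-05", 500), (7, 6, "2016-08-21", 500),
    (7, 6, "2016-08-22", 500), (7, 6, "2016-08-23", 500), (7, 6, "2016-08-25", 500),
    (7, 6, "2016-08-26", 500),
]

QUERY = """
WITH numbered AS (
    SELECT id, employee_id, activity_id, work_date,
           -- consecutive dates share the same (day number - row number)
           julianday(work_date)
             - ROW_NUMBER() OVER (PARTITION BY employee_id, activity_id
                                  ORDER BY work_date) AS island
    FROM time_entries
),
islands AS (
    SELECT employee_id, activity_id, island
    FROM numbered
    GROUP BY employee_id, activity_id, island
    HAVING COUNT(*) >= 3
)
SELECT n.id
FROM numbered n
JOIN islands i
  ON i.employee_id = n.employee_id
 AND i.activity_id = n.activity_id
 AND i.island = n.island
ORDER BY n.employee_id, n.activity_id, n.work_date
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE time_entries (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "employee_id INTEGER NOT NULL, activity_id INTEGER NOT NULL, "
    "work_date TEXT NOT NULL, time_spent INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO time_entries (employee_id, activity_id, work_date, time_spent) "
    "VALUES (?, ?, ?, ?)",
    ROWS,
)
ids = [row[0] for row in conn.execute(QUERY)]
print(ids)
```

On this test data every qualifying run happens to have equal daily minutes, so it yields the same 20 ids as answer 0's result table.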