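This is giving me a headache! :P
I have an assignments table, and I want to calculate how long each member has served based on their assignments. In its simplified form, this would be relatively easy.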
-------------------------------------------------------------------------
| id | member_id | unit_id | start_date | end_date |
-------------------------------------------------------------------------
| 1 | 2 | 23 | 2013-01-01 | 2013-02-01 |
-------------------------------------------------------------------------
| 2 | 2 | 25 | 2013-02-01 | 2013-03-01 |
-------------------------------------------------------------------------
| 3 | 2 | 27 | 2013-03-01 | NULL |
-------------------------------------------------------------------------
That is just a matter of applying SUM() and DATEDIFF() to start_date and end_date. The problem is that a member can hold several assignments at the same time.
-------------------------------------------------------------------------
| id | member_id | unit_id | start_date | end_date |
-------------------------------------------------------------------------
| 1 | 2 | 23 | 2013-01-01 | 2013-02-01 |
-------------------------------------------------------------------------
| 2 | 2 | 25 | 2013-02-01 | 2013-03-01 |
-------------------------------------------------------------------------
| 3 | 2 | 30 | 2013-02-15 | 2013-03-01 |*
-------------------------------------------------------------------------
| 4 | 2 | 27 | 2013-03-01 | NULL |
-------------------------------------------------------------------------
Now I have to know that #3 runs concurrently with #2, so I shouldn't add it to the SUM().
Going further, what if there are gaps in a member's service?
-------------------------------------------------------------------------
| id | member_id | unit_id | start_date | end_date |
-------------------------------------------------------------------------
| 1 | 2 | 23 | 2013-01-01 | 2013-02-01 |
-------------------------------------------------------------------------
| 2 | 2 | 25 | 2013-02-01 | 2013-02-05 |*
-------------------------------------------------------------------------
| 3 | 2 | 30 | 2013-02-15 | 2013-03-01 |*
-------------------------------------------------------------------------
| 4 | 2 | 27 | 2013-03-01 | NULL |
-------------------------------------------------------------------------
Also, NULL means "current", so it should be treated as CURDATE().
Any ideas?
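Answer 0: (score: 1)
Here's an idea. Split each record into two rows, so that you get a list of the dates on which assignments start and end. Then work out how many assignments are active on a given date: basically, add "1" for each start and "-1" for each end, and take the cumulative sum.
Next, you need to find the next date for each row, so that you have periods to work with before doing the final aggregation.
The first part is handled by this query: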
select member_id, thedate,
       -- running count of open assignments, reset whenever the member changes
       @sumstart := if(@prevmemberid = member_id, @sumstart + isstart, isstart) as sumstart,
       @prevmemberid := member_id
from (select member_id, start_date as thedate, 1 as isstart
      from assignments
      union all
      -- a NULL end_date means "current", so use today's date
      select member_id, coalesce(end_date, CURDATE()), -1 as isstart
      from assignments
      order by member_id, thedate
     ) a cross join
     (select @sumstart := 0, @prevmemberid := NULL) const;
The rest uses a few more variables:
select member_id,
       sum(case when sumstart > 0 then datediff(nextdate, thedate) end) as daysactive
from (select member_id, thedate, sumstart,
             -- rows are read in descending date order, so @nextdate holds the later date
             if(@nextmemberid = member_id, @nextdate, NULL) as nextdate,
             @nextmemberid := member_id,
             @nextdate := thedate
      from (select member_id, thedate,
                   @sumstart := if(@prevmemberid = member_id, @sumstart + isstart, isstart) as sumstart,
                   @prevmemberid := member_id
            from (select member_id, start_date as thedate, 1 as isstart
                  from assignments
                  union all
                  select member_id, coalesce(end_date, CURDATE()), -1 as isstart
                  from assignments
                  order by member_id, thedate
                 ) a cross join
                 (select @sumstart := 0, @prevmemberid := NULL) const
           ) b cross join
           (select @nextmemberid := NULL, @nextdate := NULL) const
      order by member_id, thedate desc
     ) c
group by member_id;
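I don't like using variables this way, because MySQL does not guarantee the order in which variable assignments in a given select are evaluated. In practice, though, they are evaluated in the order written (which this query depends on). This could be written without variables, but without with statements, window functions, or even views that allow subqueries in the from clause, the resulting SQL would be a lot uglier.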
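For comparison, the same +1/-1 running-sum idea can be sketched outside SQL. This is only an illustration, not part of the query above: the function name days_active, the (start, end) tuple layout, and the explicit today parameter (standing in for CURDATE()) are all assumptions made for the example.

```python
from datetime import date

def days_active(assignments, today):
    """Count the days covered by at least one assignment.

    assignments: iterable of (start_date, end_date) pairs for one member;
    end_date may be None, which is treated as `today` (the CURDATE() case).
    """
    events = []
    for start, end in assignments:
        events.append((start, 1))                                 # an assignment opens
        events.append((end if end is not None else today, -1))    # an assignment closes
    events.sort()  # by date; same-day events are zero days apart, so tie order is harmless

    total = 0
    active = 0         # running sum of the +1/-1 markers
    prev_date = None
    for day, delta in events:
        if active > 0:
            # At least one assignment was open since the previous event.
            total += (day - prev_date).days
        active += delta
        prev_date = day
    return total

# The overlapping example from the question, with "today" fixed at 2013-04-01.
rows = [
    (date(2013, 1, 1), date(2013, 2, 1)),
    (date(2013, 2, 1), date(2013, 3, 1)),
    (date(2013, 2, 15), date(2013, 3, 1)),
    (date(2013, 3, 1), None),
]
print(days_active(rows, date(2013, 4, 1)))  # 90
```

Passing today in explicitly, rather than calling date.today() inside the function, keeps the result reproducible.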
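Answer 1: (score: 0)
I think filtering out the overlapping assignments is easier to do in code than in SQL. You can retrieve all assignments for a given member_id, ordered by start_date: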
select * from assignments where member_id='2' order by start_date asc
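You can then loop over these assignments and filter out the overlapping ones. Two assignments A and B do not overlap if A ends before B starts, or if B ends before A starts.
Because we sorted the results by start date, we can safely ignore the second case: B never starts before A, so it cannot end before A starts. We then get something like: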
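// treat a NULL end_date as today's date before running this
for i = 0 .. assignments.length - 1
  if (assignments[i] == null) continue;  // already merged into an earlier one
  for j = i + 1 .. assignments.length - 1
    if (assignments[j] != null && assignments[j].start_date < assignments[i].end_date)
      // it overlaps -> keep any part that runs past the end of i, then get rid of it
      assignments[i].end_date = max(assignments[i].end_date, assignments[j].end_date);
      assignments[j] = null;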
Then loop over the assignments and sum up the durations of the non-null ones. That should be easy.
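As a rough sketch of that loop in a concrete language, the version below merges overlapping assignments in a single pass over the sorted rows instead of nulling entries out, which is a slight variation on the pseudocode rather than a literal translation. The function name total_days, the (start, end) tuple layout, and the today parameter (used wherever end_date is NULL) are assumptions for the example.

```python
from datetime import date

def total_days(assignments, today):
    """Sum the days covered by assignments, counting overlaps only once.

    assignments: list of (start_date, end_date) pairs for one member;
    end_date may be None, meaning the assignment is still current.
    """
    # Replace open-ended assignments with `today`, then sort by start date.
    spans = sorted(
        (start, end if end is not None else today) for start, end in assignments
    )

    total = 0
    cur_start = cur_end = None
    for start, end in spans:
        if cur_end is None or start > cur_end:
            # First row, or a gap: close the previous span and open a new one.
            if cur_end is not None:
                total += (cur_end - cur_start).days
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current span if this one runs longer.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += (cur_end - cur_start).days
    return total

# The example with a gap from the question, with "today" fixed at 2013-04-01.
rows = [
    (date(2013, 1, 1), date(2013, 2, 1)),
    (date(2013, 2, 1), date(2013, 2, 5)),
    (date(2013, 2, 15), date(2013, 3, 1)),
    (date(2013, 3, 1), None),
]
print(total_days(rows, date(2013, 4, 1)))  # 80
```

Because the rows are sorted by start date, each row only has to be compared with the span currently being built, so one pass is enough.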