对具有挑战性的SQL问题感兴趣,请继续阅读:
对于下面的数据集,我试图找到一种逻辑,该逻辑可以标识每个员工的新项目的开始日期。
确定新项目开始日期的逻辑是:
在14天的时间范围内,员工没有当前日期之前的任何日期记录。
项目窗口仅在开始后的14天持续。落在该窗口之外的第一条记录将被计为下一个项目的开始。
Redshift / Postgres解决方案均被接受。
请注意, Redshift 在窗口框架中不支持递归CTE或RANGE关键字。
感谢阅读。
答案 0 :(得分:0)
对于Postgresql,包括数据集的CTE(DataSet
),请按以下步骤操作:
WITH RECURSIVE TimeLine(Employee, ProjectID, ProjectStartDate, Date, DateRank) AS (
SELECT Employee, 1, Date, Date, DateRank
FROM DataSetWithRank
WHERE DateRank = 1
UNION ALL
SELECT T.Employee,
T.ProjectID + CASE When D.Date >= T.ProjectStartDate+14 THEN 1 Else 0 END,
CASE When D.Date >= T.ProjectStartDate+14 THEN D.Date Else T.ProjectStartDate END,
D.Date, D.DateRank
FROM TimeLine T
JOIN DataSetWithRank D ON D.Employee = T.Employee AND D.DateRank = T.DateRank + 1
), DataSet(Employee,Date) AS (
SELECT UNNEST(ARRAY['Employee1','Employee1','Employee1','Employee1','Employee1','Employee1','Employee1','Employee1','Employee1','Employee1','Employee1','Employee1','Employee1','Employee1','Employee1']),
UNNEST(ARRAY['2018-01-01','2018-01-03','2018-01-05','2018-01-08','2018-01-11','2018-01-13','2018-01-14','2018-01-16','2018-01-18','2018-01-21','2018-01-22','2018-01-24','2018-01-25','2018-01-27','2018-01-29']::date[])
UNION
SELECT UNNEST(ARRAY['Employee2','Employee2','Employee2','Employee2','Employee2','Employee2','Employee2','Employee2','Employee2','Employee2','Employee2','Employee2','Employee2','Employee2','Employee2']),
UNNEST(ARRAY['2018-01-03','2018-01-05','2018-01-07','2018-01-10','2018-01-13','2018-01-15','2018-01-16','2018-01-18','2018-01-20','2018-01-23','2018-01-24','2018-01-26','2018-01-27','2018-01-29','2018-01-31']::date[])
), DataSetWithRank AS (
SELECT *, DENSE_RANK() OVER (PARTITION BY Employee ORDER BY Date) AS DateRank
FROM DataSet
)
SELECT Employee,
'Project ' || ProjectID AS "Project #",
Date,
DENSE_RANK() OVER (PARTITION BY Employee, ProjectID ORDER BY Date) AS Rank,
CASE WHEN Date = ProjectStartDate THEN 'Y' ELSE NULL END AS Is_New
FROM TimeLine