Creating a lapsed concept based on row-by-row logic for each ID

Asked: 2018-09-25 15:10:33

Tags: sql amazon-redshift

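I am trying to derive lapsed_date, the date on which a given ID went more than 12 weeks (i.e. 84 days) between any of the following:

1) onboarded_at and current_date (if no applied_at exists). If this gap is > 84 days, the ID is lapsed_now

2) onboarded_at and min(applied_at) (if one exists)

3) each pair of consecutive applied_at dates

4) max(applied_at) and current_date. If this gap is > 84 days, the ID is lapsed_now

If an ID has lapsed more than once, we only show the most recent lapsed date.

What I have tried works for most cases, but not all of them. Can you help make it work in general?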

Sample set:

CREATE TABLE #t
(
  id VARCHAR(10),
  rank INTEGER,
  onboarded_at DATE,
  applied_at DATE
  );

INSERT INTO #t VALUES
('A',1,'20180101','20180402'),
('A',2,'20180101','20180403'),
('A',3,'20180101','20180504'),
('B',1,'20180201','20180801'),
('C',1,'20180301','20180401'),
('C',2,'20180301','20180501'),
('C',3,'20180301','20180901'),
('D',1,'20180401',null)

Best attempt:

SELECT onb.id,
onb.rank,
onb.onboarded_at,
onb.applied_at,
onb.lapsed_now,
CASE WHEN lapsed_now = 1 OR lapsed_previous = 1
    THEN 1
    ELSE 0
END lapsed_ever,
CASE WHEN lapsed_now = 1
    THEN DATEADD(DAY, 84, lapsed_now_date)
    ELSE min_applied_at_add_84
END lapsed_date
FROM
(SELECT *,
CASE
    WHEN DATEDIFF(DAY, onboarded_at, MIN(ISNULL(applied_at, onboarded_at)) over (PARTITION BY id)) >= 84
        THEN 1
    WHEN DATEDIFF(DAY, MAX(applied_at) OVER (PARTITION BY id), GETDATE()) >= 84
        THEN 1
    ELSE 0
END lapsed_now,
CASE
    WHEN MAX(DATEDIFF(DAY, onboarded_at, ISNULL(applied_at, GETDATE()))) OVER (PARTITION BY id) >= 84
        THEN 1
    ELSE 0
END lapsed_previous,
MAX(applied_at) OVER (PARTITION BY id) lapsed_now_date,
DATEADD(DAY, 84, MIN(CASE WHEN applied_at IS NULL THEN onboarded_at ELSE applied_at END) OVER (PARTITION BY id)) min_applied_at_add_84
FROM #t
) onb

Current output:

id  rank    onboarded_at    applied_at  lapsed_now  lapsed_ever lapsed_date
A   1       2018-01-01      2018-04-02  1           1           2018-07-27
A   2       2018-01-01      2018-04-03  1           1           2018-07-27
A   3       2018-01-01      2018-05-04  1           1           2018-07-27
B   1       2018-02-01      2018-08-01  1           1           2018-10-24
C   1       2018-03-01      2018-04-01  0           1           2018-06-24
C   2       2018-03-01      2018-05-01  0           1           2018-06-24
C   3       2018-03-01      2018-09-01  0           1           2018-06-24
D   1       2018-04-01      null        1           1           2018-06-24

Desired output:

id  rank    onboarded_at    applied_at  lapsed_now  lapsed_ever lapsed_date
A   1       2018-01-01      2018-04-02   1           1         2018-07-27 (not max lapsed date)
A   2       2018-01-01      2018-04-03   1           1         2018-07-27
A   3       2018-01-01      2018-05-04   1           1         2018-07-27 (May 4 + 84)
B   1       2018-02-01      2018-08-01   0           1         2018-04-26 (Feb 1 + 84)
C   1       2018-03-01      2018-04-01   0           1         2018-07-24 
C   2       2018-03-01      2018-05-01   0           1         2018-07-24 (May 1 + 84)
C   3       2018-03-01      2018-09-01   0           1         2018-07-24 
D   1       2018-04-01      null         1           1         2018-06-24
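
To make the four rules concrete, here is a minimal reference sketch in Python that any SQL solution can be checked against. It is not the query itself: the function name `lapse_summary`, the constant `LAPSE_DAYS`, and the `AS_OF` date are illustrative assumptions. `AS_OF` stands in for current_date and is set to the day the question was posted, and the comparison is strictly > 84 days as stated in the rules (the attempted query uses >= 84).

```python
from datetime import date, timedelta

LAPSE_DAYS = 84  # 12 weeks, per the question


def lapse_summary(onboarded_at, applied_dates, as_of):
    """Return (lapsed_now, lapsed_ever, lapsed_date) for a single ID."""
    # Timeline: onboarding first, then every non-null application in date order.
    events = [onboarded_at] + sorted(d for d in applied_dates if d is not None)
    # Pair each event with the next one; the last event is compared to as_of.
    gaps = zip(events, events[1:] + [as_of])
    # A gap longer than 84 days produces a lapse 84 days after the gap starts.
    lapses = [start + timedelta(days=LAPSE_DAYS)
              for start, end in gaps
              if (end - start).days > LAPSE_DAYS]
    lapsed_now = int((as_of - events[-1]).days > LAPSE_DAYS)
    lapsed_ever = int(bool(lapses))
    lapsed_date = max(lapses) if lapses else None  # most recent lapse only
    return lapsed_now, lapsed_ever, lapsed_date


AS_OF = date(2018, 9, 25)  # assumption: "current_date" on the day of posting

sample = {
    "A": (date(2018, 1, 1), [date(2018, 4, 2), date(2018, 4, 3), date(2018, 5, 4)]),
    "B": (date(2018, 2, 1), [date(2018, 8, 1)]),
    "C": (date(2018, 3, 1), [date(2018, 4, 1), date(2018, 5, 1), date(2018, 9, 1)]),
    "D": (date(2018, 4, 1), [None]),
}

results = {key: lapse_summary(onb, apps, AS_OF) for key, (onb, apps) in sample.items()}
for key, row in results.items():
    print(key, row)
```

Under those assumptions, each ID comes out with the same lapsed_now, lapsed_ever and lapsed_date as the desired table above.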

2 Answers:

Answer 0 (score: 2)

There is some guesswork here, but hopefully this solves the problem:

SELECT res.id,
res.rank,
res.onboarded_at,
res.applied_at,
res.lapsed_now,
CASE WHEN lapsed_now = 1 OR lapsed_previous = 1
    THEN 1
    ELSE 0
END lapsed_ever,
CASE
  WHEN lapsed_now = 1
    THEN DATEADD(DAY, 84, lapsed_now_date)
  WHEN applied_difference_gt84 IS NOT NULL
    THEN DATEADD(DAY, 84, applied_difference_gt84)
  WHEN DATEDIFF(DAY, min_applied_at_add_84, GETDATE()) < 84
    THEN DATEADD(DAY, 84, onboarded_at)
    ELSE min_applied_at_add_84
END lapsed_date
FROM (
SELECT *, MAX(applied_difference) OVER (PARTITION BY id ORDER BY rank ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) applied_difference_gt84
FROM
(
SELECT *,
CASE
    WHEN DATEDIFF(DAY, onboarded_at, MIN(ISNULL(applied_at, onboarded_at)) over (PARTITION BY id)) >= 84
          AND DATEDIFF(DAY, MAX(applied_at) OVER (PARTITION BY id), GETDATE()) >= 84
        THEN 1
    WHEN DATEDIFF(DAY, ISNULL(MAX(applied_at) OVER (PARTITION BY id), onboarded_at), GETDATE()) >= 84
        THEN 1
    ELSE 0
END lapsed_now,
CASE
    WHEN MAX(DATEDIFF(DAY, onboarded_at, ISNULL(applied_at, GETDATE()))) OVER (PARTITION BY id) >= 84
        THEN 1
    ELSE 0
END lapsed_previous,
 CASE
  WHEN DATEDIFF(MONTH, applied_at, LEAD(applied_at, 1) OVER (PARTITION BY id ORDER BY rank)) >= 2
   THEN applied_at
 ELSE NULL
 END applied_difference,
ISNULL(MAX(applied_at) OVER (PARTITION BY id), onboarded_at) lapsed_now_date,
DATEADD(DAY, 84, MIN(CASE WHEN applied_at IS NULL THEN onboarded_at ELSE applied_at END) OVER (PARTITION BY id)) min_applied_at_add_84
FROM #t
) onb
  ) res

Result:

id  rank    onboarded_at    applied_at  lapsed_now  lapsed_ever lapsed_date
A   1       2018-01-01      2018-04-02  1           1           2018-07-27
A   2       2018-01-01      2018-04-03  1           1           2018-07-27
A   3       2018-01-01      2018-05-04  1           1           2018-07-27
B   1       2018-02-01      2018-08-01  0           1           2018-04-26
C   1       2018-03-01      2018-04-01  0           1           2018-07-24
C   2       2018-03-01      2018-05-01  0           1           2018-07-24
C   3       2018-03-01      2018-09-01  0           1           2018-07-24
D   1       2018-04-01      (null)      1           1           2018-06-24

It is a bit messy because of the need to compute the differences between consecutive applied_at dates.
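
One caveat on that difference: the query above compares consecutive applied_at dates with DATEDIFF(MONTH, ...) >= 2, and a month-level difference counts calendar-month boundaries rather than elapsed days. The Python sketch below imitates that counting to show how the two can disagree. The helper names are made up for illustration, and the boundary-counting behaviour is my understanding of DATEDIFF rather than something verified against this data set.

```python
from datetime import date


def month_boundaries_crossed(start, end):
    """Imitate a boundary-counting month difference: only year and month matter."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def days_between(start, end):
    """Elapsed days, which is what the 84-day rule is actually about."""
    return (end - start).days


start, end = date(2018, 1, 31), date(2018, 3, 1)
months = month_boundaries_crossed(start, end)  # two month boundaries crossed
days = days_between(start, end)                # but only 29 days elapsed

print(months >= 2)  # the month-based check would flag a lapse
print(days > 84)    # the day-based rule would not
```

The sample rows happen to give the same answer either way, but comparing in days (for example DATEDIFF(DAY, ...) against 84) would follow the stated rule more closely.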

Answer 1 (score: 1)

@Jim, based on your answer I put together the solution below. I think it is easier to follow, and it makes the lapse criteria more explicit:

SELECT id, onboarded_at, applied_at,
max(case when (zero_applicants is not null and current_date - onboarded_at > 84)
           or (last_applicant is not null and current_date - last_applicant > 84)
         then 1 else 0 end) over (partition by id) lapsed_now,
max(case when (zero_applicants is not null and current_date - onboarded_at > 84)
           or (one_applicant is not null and applied_at - onboarded_at > 84)
           or (one_applicant is not null and current_date - applied_at > 84)
           or (next_applicant is not null and next_applicant - applied_at > 84)
           or (last_applicant is not null and current_date - last_applicant > 84)
         then 1 else 0 end) over (partition by id) lapsed_ever,
max(case when zero_applicants is not null and current_date - onboarded_at > 84 then onboarded_at + 84
         when one_applicant is not null and applied_at - onboarded_at > 84 then onboarded_at + 84
         when one_applicant is not null and current_date - applied_at > 84 then applied_at + 84
         when next_applicant is not null and next_applicant - applied_at > 84 then applied_at + 84
         when last_applicant is not null and current_date - last_applicant > 84 then last_applicant + 84
    end) over (partition by id) lapsed_date
from (
  select *,
    case when MAX(applied_at) OVER (PARTITION BY id) is null
         then onboarded_at end as zero_applicants,
    case when count(applied_at) over (partition by id) = 1
         then onboarded_at end as one_applicant,
    case when count(applied_at) over (partition by id) > 1
         then LEAD(applied_at, 1) OVER (PARTITION BY id ORDER BY applied_at) end as next_applicant,
    case when LEAD(applied_at, 1) OVER (PARTITION BY id ORDER BY applied_at) is null
         then MAX(applied_at) over (partition by id) end as last_applicant
  from #t
) res
order by id, applied_at