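TSQL - Window Functions / Ranking

Time: 2016-12-08 20:15:32

Tags: sql sql-server tsql

I am trying to find everyone who has missed 3 or more consecutive appointments. I believe I can use window functions to do this, but I am stuck and looking for some help.

Here is a sample of what I am looking for: in the data below, PATID = x001 meets the criteria because of the missed appointments on 1/12, 1/14, and 1/15, and PATID = x002 does not meet the criteria.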

PATID  DEPT   DATE       STATUS
x001   A002   1/1/2016   Missed
x001   A002   1/5/2016   Complete
x001   A002   1/8/2016   Missed
x001   A002   1/10/2016  Complete
x001   A002   1/12/2016  Missed
x001   A002   1/14/2016  Missed
x001   A002   1/15/2016  Missed
x001   A002   1/19/2016  Complete

x002   A003   1/1/2016   Missed
x002   A003   1/5/2016   Complete
x002   A003   1/8/2016   Missed
x002   A003   1/10/2016  Complete
x002   A003   1/12/2016  Missed
x002   A003   1/14/2016  Complete
x002   A003   1/15/2016  Missed
x002   A003   1/19/2016  Complete

Here is what I have so far.

SELECT 
PR.PATID
, PR.DEPT
, PR.DATE
, CASE WHEN PR.STATUS IN (3,4) THEN 'Cancel' WHEN PR.STATUS = 2 THEN 'COMPLETED' ELSE 'ERROR' END AS STATUS
, ROW_NUMBER() OVER (PARTITION BY PR.PAT_ID, PR.DEPARTMENT_ID ORDER BY PR.PAT_ID, PR.DEPARTMENT_ID, PR.CONTACT_DATE) AS RN  -- Just numbers the rows
, COUNT(*) OVER (PARTITION BY PR.PAT_ID, PR.DEPARTMENT_ID, CASE WHEN PR.APPT_STATUS_C IN (3,4) AND PR.CANCEL_REASON_C <> 4 THEN 'Cancel' WHEN PR.APPT_STATUS_C = 2 THEN 'COMPLETED' ELSE 'ERROR' END) AS RNC  -- Should have break at new statuses
FROM #PatsReturn AS PR

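Once I figure out how to restart the numbering, in date order, at each status change, I still need a way to identify (perhaps with a new flag field?) the PATIDs that have missed 3+ consecutive appointments.

Any help is greatly appreciated.

Thanks,

2 Answers:

Answer 0: (score: 2)

One method is to just use lag() / lead():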

select distinct patid
from (select pr.*,
             lead(status) over (partition by patid order by date) as status_1,
             lead(status, 2) over (partition by patid order by date) as status_2
      from #PatsReturn pr
     ) pr
where status = 'Missed' and status_1 = 'Missed' and status_2 = 'Missed';
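
As a quick check, the query above can be run against the sample rows from the question. The sketch below is only an illustration: the column types are an assumption (the question does not show the real schema), and the status values are taken as the plain text shown in the sample rather than the numeric codes in the asker's attempt.

```sql
-- Assumed schema: the column types are a guess, not the asker's real table.
IF OBJECT_ID('tempdb..#PatsReturn') IS NOT NULL DROP TABLE #PatsReturn;

CREATE TABLE #PatsReturn (
    PATID    char(4),
    DEPT     char(4),
    [date]   date,
    [status] varchar(10)
);

-- Sample rows copied from the question (dates written as yyyymmdd literals).
INSERT INTO #PatsReturn (PATID, DEPT, [date], [status]) VALUES
    ('x001', 'A002', '20160101', 'Missed'),
    ('x001', 'A002', '20160105', 'Complete'),
    ('x001', 'A002', '20160108', 'Missed'),
    ('x001', 'A002', '20160110', 'Complete'),
    ('x001', 'A002', '20160112', 'Missed'),
    ('x001', 'A002', '20160114', 'Missed'),
    ('x001', 'A002', '20160115', 'Missed'),
    ('x001', 'A002', '20160119', 'Complete'),
    ('x002', 'A003', '20160101', 'Missed'),
    ('x002', 'A003', '20160105', 'Complete'),
    ('x002', 'A003', '20160108', 'Missed'),
    ('x002', 'A003', '20160110', 'Complete'),
    ('x002', 'A003', '20160112', 'Missed'),
    ('x002', 'A003', '20160114', 'Complete'),
    ('x002', 'A003', '20160115', 'Missed'),
    ('x002', 'A003', '20160119', 'Complete');

-- Each row looks at the next two appointments of the same patient.
SELECT DISTINCT PATID
FROM (SELECT pr.*,
             LEAD([status])    OVER (PARTITION BY PATID ORDER BY [date]) AS status_1,
             LEAD([status], 2) OVER (PARTITION BY PATID ORDER BY [date]) AS status_2
      FROM #PatsReturn pr
     ) pr
WHERE [status] = 'Missed' AND status_1 = 'Missed' AND status_2 = 'Missed';
-- Returns one row: x001 (the 1/12 row is followed by two more Missed rows).
```

Note that the window is partitioned by PATID only, so appointments in different departments count toward the same streak. If streaks should be tracked per department, as the PARTITION BY in the question's attempt suggests, DEPT would need to be added to both PARTITION BY clauses.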

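If you need information about each sequence, you can identify groups of adjacent appointments with the same status. One way to do that is a difference of row numbers: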

select patid, count(*) as numMissed, min(date), max(date)
from (select pr.*,
             row_number() over (partition by patid order by date) as seqnum_p,
             row_number() over (partition by patid, status order by date) as seqnum_ps
      from #PatsReturn pr
     ) pr
where status = 'Missed'
group by patid, (seqnum_p - seqnum_ps)
having count(*) >= 3;

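To understand how this works, run the subquery; you will see that the difference between the two "seqnum" values is constant within a sequence of rows that share the same status.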
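
To make that concrete, here is a small self-contained sketch that runs the subquery over the x001 rows from the question and adds the difference as an extra column. The inline VALUES list and the column name grp are illustrative only; they are not part of the original answer.

```sql
-- x001 sample rows from the question, supplied inline so the script stands alone.
WITH pr AS (
    SELECT PATID, CAST(d AS date) AS [date], [status]
    FROM (VALUES
        ('x001', '20160101', 'Missed'),
        ('x001', '20160105', 'Complete'),
        ('x001', '20160108', 'Missed'),
        ('x001', '20160110', 'Complete'),
        ('x001', '20160112', 'Missed'),
        ('x001', '20160114', 'Missed'),
        ('x001', '20160115', 'Missed'),
        ('x001', '20160119', 'Complete')
    ) AS v (PATID, d, [status])
)
SELECT PATID, [date], [status], seqnum_p, seqnum_ps,
       seqnum_p - seqnum_ps AS grp   -- hypothetical name for the group key
FROM (SELECT pr.*,
             ROW_NUMBER() OVER (PARTITION BY PATID ORDER BY [date])           AS seqnum_p,
             ROW_NUMBER() OVER (PARTITION BY PATID, [status] ORDER BY [date]) AS seqnum_ps
      FROM pr
     ) AS s
ORDER BY [date];

-- date        status    seqnum_p  seqnum_ps  grp
-- 2016-01-01  Missed    1         1          0
-- 2016-01-05  Complete  2         1          1
-- 2016-01-08  Missed    3         2          1
-- 2016-01-10  Complete  4         2          2
-- 2016-01-12  Missed    5         3          2
-- 2016-01-14  Missed    6         4          2
-- 2016-01-15  Missed    7         5          2
-- 2016-01-19  Complete  8         3          5
```

The three consecutive Missed rows (1/12, 1/14, 1/15) share grp = 2, which is what the outer GROUP BY and HAVING count(*) >= 3 pick up. The same grp value can also appear on a row with a different status (1/10 Complete also has grp = 2), so the difference is only meaningful together with the status. The answer's query handles this by filtering on status = 'Missed' before grouping.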

Answer 1: (score: 0)

If you go with Gordon's first solution, you'll want a POC (partitioning, ordering, covering) index. Say this is your table:

CREATE TABLE #PatsReturn (PATID char(4), DEPT char(4), [date] date, [status] varchar(10));

You would want this index:

CREATE UNIQUE INDEX nc_PatsReturn ON #PatsReturn(PATID, [date]) INCLUDE ([status]);

This will ensure a nice, linear, sort-free query execution plan. Indexing for the second solution is a bit trickier.