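Below is some sample data. I need to keep only the patients who had 4 or more visits within 45 days. I have transposed the dataset and used arrays to work out one way of doing this, but I am hoping there is a more efficient approach.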
Pat_ID Date Prov_ID
A 05/12/2012 X1
A 05/12/2012 X2
B 11/12/2012 X1
B 11/20/2012 X1
B 01/12/2013 X1
B 03/22/2013 X1
C 04/25/2013 X1
C 04/25/2013 X2
C 04/27/2013 X1
C 05/12/2013 X1
C 05/22/2013 X2
C 04/25/2012 X3
...
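I started by deleting the observations for patients with fewer than 4 events.
Any ideas would be appreciated.
The end result should be a dataset containing only the PAT_IDs that have 4 or more visits within 45 days.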
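Answer 0 (score: 5)
Here is a SAS-based (rather than SQL) solution that uses the lag function. It reads the data only once, so it should be very efficient, especially compared with self-join-style solutions.
First sort the data by ID and visit date (if it isn't sorted already):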
proc sort data=YourData;
by Pat_ID Date;
run;
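If you look back 3 records at the Pat_ID and date, you can test whether that record is within 45 days and belongs to the same patient. Because the data is sorted, this means the current visit and the three before it form 4 visits inside the window. If so, add the patient to the list.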
data list_of_membs(keep=PAT_ID);
set YourData;
length last_Pat_ID $ 32; *Assumed length, must be at least as long as Pat_ID;
retain last_Pat_ID; *The last PAT_ID that was added to the list;
pat_id_3back = lag3(Pat_ID); *PAT_ID from 3 records back;
date_3back = lag3(date); *Visit date from 3 records back;
If pat_id = pat_id_3back
AND (date - date_3back) < 45
AND (PAT_ID ne last_Pat_ID) THEN DO;
output;
last_PAT_ID = PAT_ID;
END;
run;
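To make the look-back logic easier to follow outside SAS, here is a minimal sketch of the same check in Python. It is only an illustration: the function name `patients_with_dense_visits`, the `(pat_id, date)` tuple layout, and the parameter names are assumptions, not part of the answer above. It mirrors the strict `< 45` comparison used in the data step.

```python
from datetime import date

def patients_with_dense_visits(visits, min_visits=4, max_days=45):
    """Return patient IDs having `min_visits` visits spanning fewer than `max_days` days.

    `visits` is an iterable of (pat_id, visit_date) tuples in any order.
    """
    ordered = sorted(visits)      # same role as PROC SORT: by patient, then date
    back = min_visits - 1         # 3 records back for a 4-visit window
    found = []
    for i in range(back, len(ordered)):
        pat_id, visit_date = ordered[i]
        prev_id, prev_date = ordered[i - back]   # what LAG3 would return
        same_patient = pat_id == prev_id
        close_enough = (visit_date - prev_date).days < max_days
        # Sorted input keeps a patient's rows together, so checking the
        # last ID added is enough to avoid duplicates.
        if same_patient and close_enough and (not found or found[-1] != pat_id):
            found.append(pat_id)
    return found

# Sample data from the question (provider IDs omitted).
sample = [
    ("A", date(2012, 5, 12)), ("A", date(2012, 5, 12)),
    ("B", date(2012, 11, 12)), ("B", date(2012, 11, 20)),
    ("B", date(2013, 1, 12)), ("B", date(2013, 3, 22)),
    ("C", date(2013, 4, 25)), ("C", date(2013, 4, 25)),
    ("C", date(2013, 4, 27)), ("C", date(2013, 5, 12)),
    ("C", date(2013, 5, 22)), ("C", date(2012, 4, 25)),
]
print(patients_with_dense_visits(sample))
```

On the sample data only patient C qualifies: the visit on 05/12/2013 is 17 days after the visit three records earlier (04/25/2013).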
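Answer 1 (score: 0)
I think you should be able to achieve this by self-joining the result and counting, for each row, how many of the same patient's visits fall within the 45 days after that row's date.
The following code is untested, but hopefully it points you in the right direction.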
WITH X AS (
--your original query here
)
SELECT
X.*,
COUNT(Y.Date) over (PARTITION BY X.Pat_ID, X.VisitID) as [# of visits within 45 days after this one]
INTO #Y
FROM X
INNER JOIN X as Y on X.Pat_ID = Y.Pat_ID and Y.Date > X.Date and Y.Date < DATEADD(d,45,X.Date)
SELECT DISTINCT
*
FROM #Y
WHERE [# of visits within 45 days after this one] >= 3
DROP TABLE #Y
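The counting that the self-join performs can be sketched in Python as a plain pairwise comparison. This is an illustration under stated assumptions rather than a translation of the query: the name `count_following_visits` and the `(visit_id, pat_id, date)` tuple layout are hypothetical, and the visit IDs are simply numbered in the order the sample rows appear.

```python
from datetime import date, timedelta

def count_following_visits(visits, window_days=45):
    """For each visit, count the same patient's later visits inside the window.

    `visits` is a list of (visit_id, pat_id, visit_date) tuples.
    Mirrors the join condition Y.Date > X.Date and Y.Date < X.Date + window.
    Unlike the inner join, visits with no match are kept with a count of 0.
    """
    window = timedelta(days=window_days)
    counts = {}
    for visit_id, pat_id, visit_date in visits:
        counts[visit_id] = sum(
            1
            for _, other_pat, other_date in visits
            if other_pat == pat_id
            and visit_date < other_date < visit_date + window
        )
    return counts

sample_visits = [
    (1, "A", date(2012, 5, 12)), (2, "A", date(2012, 5, 12)),
    (3, "B", date(2012, 11, 12)), (4, "B", date(2012, 11, 20)),
    (5, "B", date(2013, 1, 12)), (6, "B", date(2013, 3, 22)),
    (7, "C", date(2013, 4, 25)), (8, "C", date(2013, 4, 25)),
    (9, "C", date(2013, 4, 27)), (10, "C", date(2013, 5, 12)),
    (11, "C", date(2013, 5, 22)), (12, "C", date(2012, 4, 25)),
]
counts = count_following_visits(sample_visits)
# A visit followed by 3 or more visits marks a patient with 4+ visits.
qualifying = {pat for vid, pat, _ in sample_visits if counts[vid] >= 3}
print(qualifying)
```

One thing the sketch makes visible: because the comparison is strict, two visits on the same date never count toward each other. If same-day visits should count, the lower bound would need to include equal dates while excluding the row itself.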
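Answer 2 (score: 0)
Edit: When I first answered this question, I thought the result should be the records from the original table that fall within 45 days of each other. I have now included a SQL Server implementation of Tim Sand's answer (the one without a self-join), in case someone needs a SQL Server version in the future. It is worth mentioning that if you need any data from the table other than Pat_ID, you will have to join back to the table to get it.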
DECLARE @PatientVisits TABLE
(
VisitID INT PRIMARY KEY IDENTITY
,Pat_ID VARCHAR(5)
,Date DATETIME2
,Prov_ID VARCHAR(5)
)
INSERT INTO @PatientVisits
VALUES
('A' ,'05/12/2012' ,'X1')
,('A' ,'05/12/2012' ,'X2')
,('B' ,'11/12/2012' ,'X1')
,('B' ,'11/20/2012' ,'X1')
,('B' ,'01/12/2013' ,'X1')
,('B' ,'03/22/2013' ,'X1')
,('C' ,'04/25/2013' ,'X1')
,('C' ,'04/25/2013' ,'X2')
,('C' ,'04/27/2013' ,'X1')
,('C' ,'05/12/2013' ,'X1')
,('C' ,'05/22/2013' ,'X2')
,('C' ,'04/25/2012' ,'X3');
SELECT DISTINCT
Pat_ID
FROM
(
SELECT
Pat_ID
,Date AS CurrentDate
,LAG (Date,3,'00:00') OVER ( PARTITION BY Pat_ID ORDER BY Date ASC ) AS DateThreeVisitsAgo
FROM @PatientVisits
) Visits
WHERE
DATEDIFF(DAY,DateThreeVisitsAgo,CurrentDate) <= 45
Here is my original answer:
DECLARE @PatientVisits TABLE
(
VisitID INT PRIMARY KEY IDENTITY
,Pat_ID VARCHAR(5)
,Date DATETIME2
,Prov_ID VARCHAR(5)
)
INSERT INTO @PatientVisits
VALUES
('A' ,'05/12/2012' ,'X1')
,('A' ,'05/12/2012' ,'X2')
,('B' ,'11/12/2012' ,'X1')
,('B' ,'11/20/2012' ,'X1')
,('B' ,'01/12/2013' ,'X1')
,('B' ,'03/22/2013' ,'X1')
,('C' ,'04/25/2013' ,'X1')
,('C' ,'04/25/2013' ,'X2')
,('C' ,'04/27/2013' ,'X1')
,('C' ,'05/12/2013' ,'X1')
,('C' ,'05/22/2013' ,'X2')
,('C' ,'04/25/2012' ,'X3');
SELECT
VisitID
,Pat_ID
,Date
,Prov_ID
FROM
(
SELECT
P.VisitID
,P.Pat_ID
,P.Date
,P.Prov_ID
,COUNT(*) OVER (PARTITION BY P.VisitID) AS NearVisitCount
FROM @PatientVisits P
JOIN @PatientVisits P2
ON P.Pat_ID = P2.Pat_ID
AND P.Date BETWEEN DATEADD(DAY,-45,P2.Date) AND DATEADD(DAY,45,P2.Date)
) Visits
WHERE
NearVisitCount >= 4
GROUP BY
VisitID
,Pat_ID
,Date
,Prov_ID
Result:
VisitID Pat_ID Date Prov_ID
7 C 2013-04-25 X1
8 C 2013-04-25 X2
9 C 2013-04-27 X1
10 C 2013-05-12 X1
11 C 2013-05-22 X2
Answer 3 (score: 0)
This should also work:
proc sql;
create table want as
select a.Pat_ID, a.Date, a.Prov_ID from YourData as a
join
(
/* patients with at least one visit that starts a 45-day window holding 4 or more visits */
select distinct x.pat_id from YourData as x
join YourData as y
on
(
x.pat_id = y.pat_id
and intck('day',x.date,y.date) between 0 and 45
)
group by x.pat_id, x.date, x.prov_id
having count(*)>=4
) as c
on
(
a.pat_id = c.pat_id
);
quit;