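I have a table of ~400,000 rows containing appointment dates collected for about 30,000 people. Each row has a patient ID number and an appointment date. I want to efficiently select the people who had at least 4 appointments within an 8-week span. Ideally, I would also flag the appointments that fall within that 8-week span. I am working in a server environment that does not allow CLR aggregate functions. Can this be done in SQL Server? If so, how?
What I have thought of so far:
For clarity of discussion, here is some notation:
MyTable: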
ApptID    PatientID    ApptDate (in smalldatetime)
--------------------------------------------------
Apt1      Pt1          Datetime1
Apt2      Pt1          Datetime2
Apt3      Pt2          Datetime3
...       ...          ...
Desired output (one option):
PatientID    4aptsIn8weeks? (Boolean)    InitialApptDateForWin
Pt1          1                           Datetime1
Pt2          0                           NULL
Pt3          1                           Datetime3
...
Desired output (another option):
ApptID    PatientID    ApptDate     InAn8wkWindow?    InitialApptDateForWin
Apt1      Pt1          Datetime1    1                 Datetime1
Apt2      Pt1          Datetime2    1                 Datetime1
Apt3      Pt2          Datetime3    0                 NULL
...       ...          ...
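But really, any output format that ultimately lets me select the patients and appointments meeting this criterion would be dandy....
Thanks for any ideas!
EDIT: Here is a slightly unpacked outline of my implementation of the answer I selected below, in case the details are useful to others (being quite new to SQL, it took me a couple of hours to get it working):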
-- Self-join MyTable (through the CTE alias) to pair up each patient's visits.
WITH MyTableAlias AS (
    SELECT * FROM MyTable
)
SELECT MyTableAlias.PatientID, MyTable.ApptDate AS V1,
       MyTableAlias.ApptDate AS V2
INTO temp1
FROM MyTable INNER JOIN MyTableAlias
ON (
    MyTable.PatientID = MyTableAlias.PatientID
    AND (DATEDIFF(Wk, MyTable.ApptDate, MyTableAlias.ApptDate) <= 8)
);
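-- For any two visit dates of the same patient this gives 4 hits
-- (V1-V1, V1-V2, V2-V1, V2-V2): DATEDIFF is negative when the second date
-- is earlier, so rows with V2 < V1 always pass the <= 8 test. Delete the
-- ones where the later visit was selected as V1: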
DELETE FROM temp1
WHERE V2<V1;
-- So far we have just selected pairs of visits within an 8 week
-- span of each other, including an entry for each visit being
-- within 8 weeks of itself, but for the rest only including the item
-- where the second visit is after the first. Now we want to look
-- for examples of first visits where there are at least 4 hits:
SELECT PatientID, V1, MAX(V2) AS lastvisitinspan, DATEDIFF(Wk,V1,MAX(V2))
AS nWeeksInSpan, COUNT(*) AS nVisitsInSpan
INTO MyOutputTable
FROM temp1
GROUP BY PatientID, V1
HAVING COUNT(*)>3;
-- From here on it's just a matter of how I want to handle patients with two
-- separate V1 examples meeting criteria...
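For anyone on a newer server, here is a rough sketch of an alternative that avoids the self-join and also collapses multiple qualifying windows to the earliest one per patient. It is not the method from the accepted answer. It assumes SQL Server 2012 or later (LEAD is not available before that), takes "8 weeks" to mean 56 days, and uses the MyTable columns described above; the CTE names and the Has4ApptsIn8Weeks column name are made up for illustration. The idea: with a patient's appointments sorted by date, four of them fall inside 8 weeks exactly when some appointment and the one three rows after it are at most 56 days apart.

```sql
-- Sketch only: assumes SQL Server 2012+ (for LEAD) and that "8 weeks" = 56 days.
WITH Numbered AS (
    SELECT
        PatientID,
        ApptDate,
        -- Date of the appointment three rows later for the same patient,
        -- i.e. the 4th appointment if this row is counted as the 1st.
        LEAD(ApptDate, 3) OVER (
            PARTITION BY PatientID
            ORDER BY ApptDate
        ) AS FourthApptDate
    FROM MyTable
),
Windows AS (
    -- Keep only window starts whose 4th appointment is within 56 days.
    SELECT PatientID, ApptDate AS WindowStart
    FROM Numbered
    WHERE FourthApptDate IS NOT NULL
      AND DATEDIFF(DAY, ApptDate, FourthApptDate) <= 56
)
SELECT
    p.PatientID,
    -- Hypothetical column name standing in for "4aptsIn8weeks?".
    CASE WHEN COUNT(w.WindowStart) > 0 THEN 1 ELSE 0 END AS Has4ApptsIn8Weeks,
    -- Earliest qualifying window start, or NULL if the patient has none.
    MIN(w.WindowStart) AS InitialApptDateForWin
FROM (SELECT DISTINCT PatientID FROM MyTable) AS p
LEFT JOIN Windows AS w
    ON w.PatientID = p.PatientID
GROUP BY p.PatientID;
```

One reason to compare in days rather than weeks: DATEDIFF counts the number of datepart boundaries crossed, so DATEDIFF(Wk, ...) <= 8 can admit pairs that are up to 62 days apart, depending on which weekdays the two dates fall on.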
Answer 0 (score: 0)
A rough outline of the query:
But there are a few problems: