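I need some advice on improving or rewriting the query below. In short, I have a table that I am trying to traverse recursively to generate parent/child relationships.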
For example, for PAT_ID 1 the table contains (dates are DD/MM/YYYY):
+-------+------------------+------------------+
| EP_ID | START_DTTM       | END_DTTM         |
+-------+------------------+------------------+
| 1     | 01/12/2018 10:00 | 02/12/2018 15:00 |
| 2     | 03/12/2018 10:00 | 10/12/2018 15:00 |
| 3     | 04/12/2018 10:00 | 06/12/2018 15:00 |
| 4     | 07/12/2018 10:00 | 09/12/2018 15:00 |
| 5     | 11/12/2018 10:00 | 13/12/2018 15:00 |
| 6     | 12/12/2018 10:00 | 12/12/2018 15:00 |
| 7     | 01/12/2019 10:00 | 02/12/2019 15:00 |
+-------+------------------+------------------+
Desired output:
+--------+-------+-----------+-----------+
| PAT_ID | EP_ID | PARENT_ID | LINK_TYPE |
+--------+-------+-----------+-----------+
| 1      | 1     | 0         | 'Parent'  |
| 1      | 2     | 1         | 'Child'   |
| 1      | 3     | 2         | 'Inner'   |
| 1      | 4     | 2         | 'Inner'   |
| 1      | 5     | 0         | 'Parent'  |
| 1      | 6     | 5         | 'Child'   |
| 1      | 7     | 0         | 'Parent'  |
+--------+-------+-----------+-----------+
Child: START_DTTM is within 24 hours of the parent episode.
Inner: START_DTTM falls between the START_DTTM and END_DTTM of the Child.
Parent: the episode does not qualify as a Child or an Inner of any row (e.g. EP_ID 5).
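I tried writing the logic with a cursor. It appears to return the right rows, but the base table has more than 10 million rows, so it is unlikely to finish before I retire, which unfortunately is still 30 years away :). I'd appreciate expert advice from the community on how to approach this query (I also tried a WHILE loop, which was even slower than the cursor).
Thanks!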
IF (OBJECT_ID('tempdb..#PARENT') IS NOT NULL)
BEGIN
DROP TABLE #PARENT
END
IF (OBJECT_ID('tempdb..#CHILD') IS NOT NULL)
BEGIN
DROP TABLE #CHILD
END
CREATE TABLE #PARENT (
EP_ID INT
,ID VARCHAR(20)
,PAT_ID VARCHAR(50)
,START_DTTM DATETIME
,END_DTTM DATETIME
,CT_DESC VARCHAR(100)
,CT_CODE VARCHAR(10)
,PARENT_EP_ID INT
,PARENT_ID VARCHAR(20)
,LINK VARCHAR(20)
,PROCESSED INT
,PARENT_EP_SEQ INT
)
CREATE TABLE #CHILD (
EP_ID INT
,ID VARCHAR(20)
,PAT_ID VARCHAR(50)
,START_DTTM DATETIME
,END_DTTM DATETIME
,CT_DESC VARCHAR(100)
,CT_CODE VARCHAR(10)
,PARENT_EP_ID INT
,PARENT_ID VARCHAR(20)
,LINK VARCHAR(20)
,PROCESSED INT
,CHILD_EP_SEQ INT
)
INSERT INTO #PARENT
SELECT deip.EP_ID
,deip.ID
,deip.PAT_ID
,START_DTTM
,END_DTTM
,CT_DESC
,CT_CODE
,0
,''
,'Parent' AS LINK
,0 AS PROCESSED
,row_number() OVER (
PARTITION BY deip.PAT_ID ORDER BY START_DTTM
) AS PARENT_EP_SEQ
FROM dbo.deip
INNER JOIN dbo.dEP ep ON deip.EP_ID = ep.EP_ID
WHERE ep.STATUS IN (
'A'
,'D'
)
AND ep.RECORD_STATUS = 'A'
AND event_type = 'Active'
AND CT_CODE <> '10'
PRINT 'Parent Done'
DECLARE @PARENT_EP_SEQ INT
DECLARE @PAT_ID VARCHAR(50) -- same type as #PARENT.PAT_ID, avoids an implicit conversion
DECLARE @EP_ID INT
DECLARE @COUNT BIGINT
DECLARE ChildCursor CURSOR LOCAL FAST_FORWARD
FOR
SELECT PARENT_EP_SEQ
,PAT_ID
,EP_ID
FROM #PARENT
where PROCESSED = 0
OPEN ChildCursor
while 1 = 1
BEGIN
-- And then fetch
FETCH NEXT
FROM ChildCursor
INTO @PARENT_EP_SEQ
,@PAT_ID
,@EP_ID
-- And then, if no row is fetched, exit the loop
IF @@fetch_status <> 0
BEGIN
BREAK
END
INSERT INTO #CHILD
SELECT C.EP_ID
,C.ID
,P.PAT_ID
,C.START_DTTM
,C.END_DTTM
,C.CT_DESC
,C.CT_CODE
,P.EP_ID AS PARENT_EP_ID
,P.ID
,'Child' AS LINK
,0 AS PROCESSED
,row_number() OVER (
PARTITION BY C.PAT_ID ORDER BY c.START_DTTM
) AS CHILD_EP_SEQ
FROM #PARENT p
INNER JOIN #PARENT C ON p.PAT_ID = c.PAT_ID
WHERE P.PAT_ID = @PAT_ID
AND P.EP_ID = @EP_ID
AND P.PARENT_EP_SEQ = @PARENT_EP_SEQ
AND P.EP_ID <> C.EP_ID
AND P.PARENT_EP_SEQ <> C.PARENT_EP_SEQ
AND datediff(hh, isnull(p.END_DTTM, getdate()), C.START_DTTM) BETWEEN 0
AND 24
AND p.PROCESSED = 0
AND c.CT_CODE <> '10'
ORDER BY p.PARENT_EP_SEQ
DELETE P
FROM #PARENT P
INNER JOIN #CHILD c ON p.PAT_ID = c.PAT_ID
AND p.EP_ID = c.EP_ID
UPDATE #PARENT
SET Processed = 1
WHERE PAT_ID = @PAT_ID
AND EP_ID = @EP_ID
AND PARENT_EP_SEQ = @PARENT_EP_SEQ
END
CLOSE ChildCursor
DEALLOCATE ChildCursor
PRINT 'Child Done'
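Afterthought: I considered a recursive/hierarchical CTE, but I have no key that defines the relationship; associating parents with children is exactly what I'm trying to produce.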
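One possible way to get that key: ROW_NUMBER() per PAT_ID, ordered by START_DTTM, gives every episode a predecessor, and a recursive CTE can then walk SEQ to SEQ + 1 for all patients at once, carrying the "current parent" and "current child" as columns. The sketch below is an untested-at-scale illustration, not a drop-in replacement for the cursor. It uses a hypothetical #EP table holding the sample rows, assumes END_DTTM is never NULL, and reads the rules as: Inner if the episode starts no later than the current Child's END_DTTM; otherwise Child if it starts no later than 24 hours after the current Parent's END_DTTM; otherwise a new Parent. Under that reading it reproduces the desired output above. Note that EP_ID 6 starts before EP_ID 5 ends, so the cursor's `datediff(hh, ...) BETWEEN 0 AND 24` test would not classify it as a Child.

```sql
-- Sketch only. #EP is a hypothetical stand-in for the real episode data
-- (PAT_ID 1 from the question); END_DTTM is assumed never to be NULL.
IF OBJECT_ID('tempdb..#EP') IS NOT NULL DROP TABLE #EP;
IF OBJECT_ID('tempdb..#SEQ') IS NOT NULL DROP TABLE #SEQ;
IF OBJECT_ID('tempdb..#RESULT') IS NOT NULL DROP TABLE #RESULT;

CREATE TABLE #EP (
    PAT_ID     INT      NOT NULL,
    EP_ID      INT      NOT NULL,
    START_DTTM DATETIME NOT NULL,
    END_DTTM   DATETIME NOT NULL
);

-- ISO 8601 literals avoid the DD/MM vs MM/DD ambiguity.
INSERT INTO #EP (PAT_ID, EP_ID, START_DTTM, END_DTTM) VALUES
    (1, 1, '2018-12-01T10:00:00', '2018-12-02T15:00:00'),
    (1, 2, '2018-12-03T10:00:00', '2018-12-10T15:00:00'),
    (1, 3, '2018-12-04T10:00:00', '2018-12-06T15:00:00'),
    (1, 4, '2018-12-07T10:00:00', '2018-12-09T15:00:00'),
    (1, 5, '2018-12-11T10:00:00', '2018-12-13T15:00:00'),
    (1, 6, '2018-12-12T10:00:00', '2018-12-12T15:00:00'),
    (1, 7, '2019-12-01T10:00:00', '2019-12-02T15:00:00');

-- Step 1: the "key" is a per-patient sequence number in START_DTTM order.
-- Materialize and index it so every recursion level can seek on (PAT_ID, SEQ).
SELECT PAT_ID, EP_ID, START_DTTM, END_DTTM,
       ROW_NUMBER() OVER (PARTITION BY PAT_ID ORDER BY START_DTTM, EP_ID) AS SEQ
INTO #SEQ
FROM #EP;

CREATE UNIQUE CLUSTERED INDEX IX_SEQ ON #SEQ (PAT_ID, SEQ);

-- Step 2: walk SEQ -> SEQ + 1 for all patients at once, carrying the
-- current Parent and current Child as state columns.
WITH Walk AS (
    -- Anchor: each patient's first episode is a Parent with no Child yet.
    SELECT s.PAT_ID, s.SEQ, s.EP_ID,
           CAST(0 AS INT)                AS PARENT_ID,
           CAST('Parent' AS VARCHAR(10)) AS LINK_TYPE,
           s.EP_ID                       AS CUR_PARENT_ID,
           s.END_DTTM                    AS CUR_PARENT_END,
           CAST(NULL AS INT)             AS CUR_CHILD_ID,
           CAST(NULL AS DATETIME)        AS CUR_CHILD_END
    FROM #SEQ AS s
    WHERE s.SEQ = 1

    UNION ALL

    -- Assumed rules, tested in this order:
    --   Inner : starts no later than the current Child's END_DTTM (it cannot
    --           start before the Child, because rows are visited in
    --           START_DTTM order); a NULL CUR_CHILD_END makes this false.
    --   Child : starts no later than 24 hours after the current Parent ends.
    --   Parent: anything else; this resets the state.
    SELECT n.PAT_ID, n.SEQ, n.EP_ID,
           CASE WHEN n.START_DTTM <= w.CUR_CHILD_END THEN w.CUR_CHILD_ID
                WHEN n.START_DTTM <= DATEADD(HOUR, 24, w.CUR_PARENT_END) THEN w.CUR_PARENT_ID
                ELSE 0 END,
           CAST(CASE WHEN n.START_DTTM <= w.CUR_CHILD_END THEN 'Inner'
                     WHEN n.START_DTTM <= DATEADD(HOUR, 24, w.CUR_PARENT_END) THEN 'Child'
                     ELSE 'Parent' END AS VARCHAR(10)),
           -- Inner and Child keep the current Parent; a new Parent replaces it.
           CASE WHEN n.START_DTTM <= w.CUR_CHILD_END
                  OR n.START_DTTM <= DATEADD(HOUR, 24, w.CUR_PARENT_END)
                THEN w.CUR_PARENT_ID ELSE n.EP_ID END,
           CASE WHEN n.START_DTTM <= w.CUR_CHILD_END
                  OR n.START_DTTM <= DATEADD(HOUR, 24, w.CUR_PARENT_END)
                THEN w.CUR_PARENT_END ELSE n.END_DTTM END,
           -- Inner keeps the current Child, a Child becomes the current Child,
           -- and a new Parent starts with no Child.
           CASE WHEN n.START_DTTM <= w.CUR_CHILD_END THEN w.CUR_CHILD_ID
                WHEN n.START_DTTM <= DATEADD(HOUR, 24, w.CUR_PARENT_END) THEN n.EP_ID
                ELSE NULL END,
           CASE WHEN n.START_DTTM <= w.CUR_CHILD_END THEN w.CUR_CHILD_END
                WHEN n.START_DTTM <= DATEADD(HOUR, 24, w.CUR_PARENT_END) THEN n.END_DTTM
                ELSE NULL END
    FROM Walk AS w
    INNER JOIN #SEQ AS n
        ON n.PAT_ID = w.PAT_ID
       AND n.SEQ = w.SEQ + 1
)
SELECT PAT_ID, EP_ID, PARENT_ID, LINK_TYPE
INTO #RESULT
FROM Walk
OPTION (MAXRECURSION 0);  -- depth = most episodes of any one patient (default limit is 100)

SELECT PAT_ID, EP_ID, PARENT_ID, LINK_TYPE
FROM #RESULT
ORDER BY PAT_ID, EP_ID;
```

Each recursion level is a single join across all patients, so the number of levels is the largest episode count of any one patient rather than the 10 million total rows; whether that actually beats the cursor on the real data has not been measured here. If the real rules differ (for example, if a Child must start after the parent's END_DTTM, as the cursor's DATEDIFF test requires), only the CASE conditions should need to change.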
Answer 0 (score: 0)
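You could multithread the CURSOR approach, since this sounds like a one-off job rather than something you will run over and over.
Edit your CURSOR code with a filter so that it runs on roughly the first 500,000 rows and start it; then open another window, add a filter for rows 500,001-1,000,000, start that one, and so on.
I bet it will finish before you come up with a CTE/set-based approach to this logic.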
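One caveat on how to split the work: every join in the cursor is on PAT_ID, so splitting by patient keeps each patient's episodes together in one window, whereas a plain row-range filter gives no such guarantee. Also, #PARENT and #CHILD are session-local temp tables, so each window builds its own copies and the filter would belong in the INSERT INTO #PARENT query. A minimal sketch of a deterministic per-patient split is below; @Workers, @ThisWorker and the #EP table are hypothetical stand-ins, not part of the original script.

```sql
-- Sketch only: #EP stands in for the base query that fills #PARENT.
IF OBJECT_ID('tempdb..#EP') IS NOT NULL DROP TABLE #EP;

CREATE TABLE #EP (
    PAT_ID VARCHAR(50) NOT NULL,   -- same type as #PARENT.PAT_ID
    EP_ID  INT         NOT NULL
);

INSERT INTO #EP (PAT_ID, EP_ID) VALUES
    ('1', 1), ('1', 2), ('2', 3), ('3', 4), ('3', 5), ('4', 6);

DECLARE @Workers    INT = 4;  -- hypothetical: number of query windows
DECLARE @ThisWorker INT = 0;  -- hypothetical: 0 .. @Workers - 1, different in each window

-- Taking % before ABS keeps the value in 0 .. @Workers - 1 and avoids the
-- overflow ABS() raises for the minimum INT value that CHECKSUM can return.
SELECT PAT_ID, EP_ID
FROM #EP
WHERE ABS(CHECKSUM(PAT_ID) % @Workers) = @ThisWorker;

-- Sanity check: how many patients and episodes each window would get.
SELECT BUCKET,
       COUNT(DISTINCT PAT_ID) AS PATIENTS,
       COUNT(*)               AS EPISODES
FROM (
    SELECT PAT_ID, ABS(CHECKSUM(PAT_ID) % @Workers) AS BUCKET
    FROM #EP
) AS b
GROUP BY BUCKET
ORDER BY BUCKET;
```

In the question's script, the same predicate would go into the WHERE clause of INSERT INTO #PARENT (on deip.PAT_ID), with each window running the whole script with a different @ThisWorker value.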