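I've put together what I consider overly complex SQL to get the result I'm after, and I'm hoping for some insight into a faster, simpler way.
What I want is to assign a shared ID to groups of rows whenever those groups have data in common.
For example, I have the following subset of data: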
CustID PartID RplcID
28 4 4
28 4 16
28 4 17
28 16 4
28 16 16
28 16 17
28 17 4
28 17 16
28 17 17
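I want to create one ID for CustID = 28 wherever the RplcIDs and PartIDs overlap. In this example, PartIDs 4, 16, and 17 all share the same RplcIDs (4, 16, 17), so all of these pairs should get the same ID.
The approach I'm using works (and it runs faster with temp tables than with CTEs alone), but on large data sets it is S-L-O-W. I'm sure there is a more efficient way, and I'm hoping someone can lend their expertise.
Below I outline what I'm currently doing, to explain my muddled thinking as clearly as I can.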
Step 1: Use DENSE_RANK(), partitioned by CustID and ordered by RplcID, to generate a temporary ID.
RowID CustID PartID RplcID
1 28 16 16
1 28 17 16
1 28 4 16
2 28 16 17
2 28 17 17
2 28 4 17
3 28 16 4
3 28 17 4
3 28 4 4
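As a minimal sketch of the ranking in Step 1, the query below applies DENSE_RANK() to just the nine CustID = 28 rows. The table variable `@Sample` is a hypothetical name used only for this illustration, and the self-join from the full script is left out. Because the columns are VARCHAR, the values sort as strings ('16' < '17' < '4'), which would explain why RplcID 4 ends up with the highest RowID.

```sql
-- Hypothetical sample table holding only the CustID = 28 rows
DECLARE @Sample TABLE
  (
     CustID VARCHAR(10),
     PartID VARCHAR(10),
     RplcID VARCHAR(10)
  );

INSERT INTO @Sample VALUES
('28','4','4'),('28','4','16'),('28','4','17'),
('28','16','4'),('28','16','16'),('28','16','17'),
('28','17','4'),('28','17','16'),('28','17','17');

-- One temporary ID per distinct RplcID within each customer.
-- RplcID is VARCHAR, so the ordering is a string comparison.
SELECT DENSE_RANK()
         OVER(
           PARTITION BY CustID
           ORDER BY RplcID) AS RowID,
       CustID,
       PartID,
       RplcID
FROM   @Sample
ORDER  BY RowID,
          PartID;
```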
Step 2: Then take those results and aggregate the PartIDs with XML to build a comma-separated string to group on.
RowID CustID RplcID PartIDS
4 28 16 16,17,4
4 28 17 16,17,4
4 28 4 16,17,4
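The string aggregation in Step 2 can be sketched in isolation as follows. The table variable `@Ranked` is a hypothetical stand-in for the Step 1 output, loaded with the CustID = 28 rows shown earlier. The ORDER BY inside the subquery is an addition here: without it the concatenation order is not guaranteed, and the string is only useful for grouping if it is built the same way every time.

```sql
-- Hypothetical stand-in for the Step 1 result (CustID = 28 only)
DECLARE @Ranked TABLE
  (
     RowID  INT,
     CustID VARCHAR(10),
     PartID VARCHAR(10),
     RplcID VARCHAR(10)
  );

INSERT INTO @Ranked VALUES
(1,'28','16','16'),(1,'28','17','16'),(1,'28','4','16'),
(2,'28','16','17'),(2,'28','17','17'),(2,'28','4','17'),
(3,'28','16','4'),(3,'28','17','4'),(3,'28','4','4');

-- For each RowID, concatenate its PartIDs into one comma-separated string.
-- STUFF removes the leading comma produced by the FOR XML PATH('') trick.
SELECT r.CustID,
       r.RplcID,
       STUFF((SELECT ',' + r2.PartID
              FROM   @Ranked r2
              WHERE  r2.RowID = r.RowID
                     AND r2.CustID = r.CustID
              ORDER  BY r2.PartID
              FOR XML PATH('')), 1, 1, '') AS PartIDs
FROM   @Ranked r
GROUP  BY r.CustID,
          r.RowID,
          r.RplcID;
```

On SQL Server 2017 or later, STRING_AGG with WITHIN GROUP (ORDER BY ...) might replace the STUFF / FOR XML PATH construct, though whether it is faster on a given data set would need to be measured.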
Step 3: Finally, split those groups back into rows by parsing the XML, keeping the assigned ID.
RowID CustID PartID RplcID
4 28 16 16
4 28 16 17
4 28 16 4
4 28 17 16
4 28 17 17
4 28 17 4
4 28 4 16
4 28 4 17
4 28 4 4
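The split in Step 3 can be sketched the same way. The table variable `@Grouped` is a hypothetical stand-in for the Step 2 output for CustID = 28. Each comma-separated string is wrapped in `<r>` elements, cast to XML, and then shredded back into one row per PartID with nodes().

```sql
-- Hypothetical stand-in for the Step 2 result (CustID = 28 only)
DECLARE @Grouped TABLE
  (
     RowID   INT,
     CustID  VARCHAR(10),
     RplcID  VARCHAR(10),
     PartIDs VARCHAR(100)
  );

INSERT INTO @Grouped VALUES
(4,'28','16','16,17,4'),
(4,'28','17','16,17,4'),
(4,'28','4','16,17,4');

-- '16,17,4' becomes <r>16</r><r>17</r><r>4</r>, and nodes('r')
-- then returns one row per <r> element.
SELECT g.RowID,
       g.CustID,
       n.r.value('.', 'varchar(10)') AS PartID,
       g.RplcID
FROM   @Grouped g
       CROSS APPLY (SELECT CAST('<r>' + REPLACE(g.PartIDs, ',', '</r><r>')
                                + '</r>' AS XML)) AS s(xmlcol)
       CROSS APPLY s.xmlcol.nodes('r') AS n(r)
ORDER  BY PartID,
          RplcID;
```

This assumes the IDs never contain characters that are special in XML (such as `&` or `<`); values like that would need escaping before the cast.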
The full SQL:
DECLARE @Parts TABLE
(
CustID VARCHAR(10),
PartID VARCHAR(10),
RplcID VARCHAR(10)
)
Insert Into @Parts VALUES
('26','19','93'),('26','19','63'),
('26','31','93'),('26','31','63'),('26','32','93'),('26','32','63'),('26','33','93'),('26','33','63'),('26','34','93'),
('26','34','63'),('26','35','93'),('26','35','63'),('26','36','93'),('26','36','63'),('26','37','93'),('26','37','63'),
('26','38','93'),('26','38','63'),('26','39','93'),('26','39','63'),('27','40','95'),('27','41','94'),
('27','41','95'),('27','42','94'),('27','42','95'),('27','43','94'),('27','43','95'),('27','44','94'),('27','44','95'),
('27','45','94'),('27','45','95'),('27','46','94'),('27','46','95'),('27','47','94'),('27','47','95'),('27','48','94'),
('27','48','95'),('27','49','94'),('27','49','95'),('27','50','94'),('27','50','95'),('27','17','94'),('27','17','95'),
('27','51','94'),('27','51','95'),('27','52','94'),('27','52','95'),('27','53','94'),('27','53','95'),('27','54','94'),
('27','54','95'),('27','33','94'),('27','33','95'),('27','55','94'),('27','55','95'),('27','34','94'),('27','34','95'),
('27','56','94'),('27','56','95'),('27','35','94'),('27','35','95'),('27','57','94'),('27','57','95'),('27','58','94'),
('27','58','95'),('27','59','94'),('27','59','95'),('27','37','94'),('27','37','95'),('27','60','94'),('27','60','95'),
('27','61','94'),('27','61','95'),('27','62','94'),('27','62','95'),('27','63','94'),('27','63','95'),('27','64','94'),
('27','64','95'),('27','3','96'),('27','3','97'),('27','3','98'),('27','3','99'),('27','3','100'),('28','4','4'),
('28','4','16'),('28','4','17'),('28','16','4'),('28','16','16'),('28','16','17'),('28','17','4'),('28','17','16'),
('28','17','17')
;
--Step 1: Create the initial ID
--(the original referenced a BuyID column; @Parts only defines PartID, so PartID is used throughout)
SELECT DISTINCT DENSE_RANK()
                  OVER(
                    PARTITION BY r.CustID
                    ORDER BY r2.RplcID) AS RowID,
                r.CustID,
                r.PartID,
                r2.RplcID
INTO   #tmp
FROM   @Parts r
       JOIN @Parts r1
         ON r.CustID = r1.CustID
            AND r.RplcID = r1.RplcID
       JOIN @Parts r2
         ON r.CustID = r2.CustID
            AND r1.PartID = r2.PartID;

--Step 2: Group the PartIDs into one comma-separated string per RowID
SELECT DENSE_RANK()
         OVER(
           ORDER BY CustID, PartIDs) AS RowID,
       *
INTO   #tmp2
FROM   (SELECT CustID,
               Rtrim(RplcID) RplcID,
               Stuff((SELECT ',' + Rtrim(PartID)
                      FROM   #tmp RSLT2
                      WHERE  RSLT2.RowID = RSLT.RowID
                             AND RSLT2.CustID = RSLT.CustID
                      --ordered so identical groups always produce the same string
                      ORDER  BY RSLT2.PartID
                      FOR xml path('')), 1, 1, '') [PartIDs]
        FROM   #tmp RSLT
        GROUP  BY RSLT.CustID,
                  RSLT.RowID,
                  RSLT.RplcID)A;

--Step 3: Using the grouped PartIDs, split the strings using XML and assign RowID
SELECT RowID,
       CustID,
       PartID,
       RplcID
INTO   #tmp3
FROM   (SELECT RowID,
               CustID,
               n.r.value('.','varchar(10)') AS PartID,
               RplcID
        FROM   #tmp2
               CROSS APPLY(SELECT Cast('<r>' + Replace(PartIDs, ',', '</r><r>')
                                       + '</r>' AS XML)) AS s(xmlcol)
               CROSS APPLY s.xmlcol.nodes('r') AS n(r))A
ORDER  BY RowID;

--Check the result for CustID 28
SELECT *
FROM   #tmp3
WHERE  CustID = '28';

SELECT DISTINCT PartID
FROM   #tmp3
WHERE  CustID = '28';

SELECT DISTINCT RplcID
FROM   #tmp3
WHERE  CustID = '28';