I have an EMP table defined as follows:
CREATE TABLE EMP
(
[ID] INT NOT NULL PRIMARY KEY,
[MGR_ID] INT,
[DEPT_ID] INT,
[NAME] VARCHAR(30),
[SAL] INT,
[DOJ] DATE
);
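I need a report of every duplicated row in the EMP table, together with the number of times each row is duplicated.
I have partially solved the problem.
This query returns a single instance of each duplicated row: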
SELECT [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ]
from EMP
group by [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ]
having count(*) > 1
The output is:
MGR_ID DEPT_ID NAME SAL DOJ
NULL 2 Hash 100 2012-01-01
1 2 Robo 100 2012-01-01
2 1 Privy 50 2012-05-01
I still need to group this output by the number of times each row is repeated in the EMP table.
I tried:
WITH CTE
AS
(
SELECT * from EMP A
join ( SELECT [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ]
from EMP
group by [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ]
having count(*) > 1 ) B
on a.[MGR_ID] = b.[MGR_ID]
OR a.[MGR_ID] != b.[MGR_ID]
AND a.[DEPT_ID] = b.[DEPT_ID]
AND a.[NAME] = b.[NAME]
AND a.[SAL] = b.[SAL]
AND a.[DOJ] = b.[DOJ]
)
SELECT [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ], DENSE_RANK() OVER
(PARTITION BY [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ] ORDER BY DUPICATES) AS [DUPLICATES]
FROM CTE
But I got this error:
Msg 8156, Level 16, State 1, Line 1
The column 'MGR_ID' was specified multiple times for 'CTE'.
Please help.
I found a partial solution, except that I still need the output to include the MGR_ID column for the 3 records where it is NULL:
with cte as
(
SELECT A.[DEPT_ID],A.[NAME],A.[SAL],A.[DOJ] from EMP A
join ( SELECT [DEPT_ID],[NAME],[SAL],[DOJ]
from EMP
group by [DEPT_ID],[NAME],[SAL],[DOJ]
having count(*) > 1 ) B
ON a.[DEPT_ID] = b.[DEPT_ID]
AND a.[NAME] = b.[NAME]
AND a.[SAL] = b.[SAL]
AND a.[DOJ] = b.[DOJ]
)
SELECT [DEPT_ID],[NAME],[SAL],[DOJ], DENSE_RANK() OVER
(PARTITION BY [NAME] ORDER BY [NAME] DESC) AS [DUPLICATES], RANK() OVER
(PARTITION BY [NAME] ORDER BY [NAME] DESC) AS [SimpleRank]
FROM CTE
DEPT_ID NAME SAL DOJ DUPLICATES SimpleRank
2 Hash 100 2012-01-01 1 1
2 Hash 100 2012-01-01 1 1
2 Hash 100 2012-01-01 1 1
1 Privy 50 2012-05-01 1 1
1 Privy 50 2012-05-01 1 1
1 Privy 50 2012-05-01 1 1
2 Robo 100 2012-01-01 1 1
2 Robo 100 2012-01-01 1 1
2 Robo 100 2012-01-01 1 1
Many thanks.
The final solution turned out to be simpler:
SELECT [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ], count(name) AS Count_Of_Duplicated_Rows
FROM EMP
GROUP BY [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ]
HAVING count(name) > 1
It produces this result set:
MGR_ID DEPT_ID NAME SAL DOJ Count_Of_Duplicated_Rows
NULL 2 Hash 100 2012-01-01 3
1 2 Robo 100 2012-01-01 3
2 1 Privy 50 2012-05-01 3
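Note: this only works when you group by the columns that are duplicated.
The example below is based on the earlier, more complex query. It compares every field in the row, whereas the simple query above only checks the specific columns you group by.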
WITH CTE
AS
(
SELECT A.[MGR_ID], A.[DEPT_ID], A.[NAME], A.[SAL], A.[DOJ]
FROM EMP A
JOIN (SELECT [MGR_ID], [DEPT_ID], [NAME], [SAL], [DOJ]
FROM EMP
GROUP BY [MGR_ID], [DEPT_ID], [NAME], [SAL], [DOJ]
HAVING count(*) > 1) B
ON a.[MGR_ID] = b.[MGR_ID]
AND a.[DEPT_ID] = b.[DEPT_ID]
AND a.[NAME] = b.[NAME]
AND a.[SAL] = b.[SAL]
AND a.[DOJ] = b.[DOJ]
)
SELECT [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ],
count(*) As Count_Of_Duplicated_Rows
FROM EMP
GROUP BY [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ]
--HAVING Count(*) >1
Answer 0 (score: 0)
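Your problem is that you do not explicitly name the selected columns in the CTE. Because both EMP and the subquery have a column named MGR_ID, running select * on the join returns the MGR_ID column twice. According to the MSDN documentation for WITH common_table_expression, this is not allowed:
"The list of column names is optional only if distinct names for all resulting columns are supplied in the query definition."
Note that you will get the same error for every pair of columns that exists on both sides of the join. To fix this, you can explicitly name the columns returned by the CTE in a column list, using aliases for the duplicated columns, like this: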
WITH CTE (mgr_id,dept_id,name,sal,doj,mgr_id2,...) -- mgr_id2 is an alias for b.mgr_id
AS
...
You can refer to the MSDN page for a demonstration. If you remove the column list, you will get the same error you are seeing now.
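As a rough illustration of the column-list approach, here is a minimal sketch that spells out a full list for the join from the question. The names ending in 2 ([MGR_ID2], [DEPT_ID2], and so on) are hypothetical aliases chosen for this example, and the list assumes SELECT * returns the six EMP columns followed by the five subquery columns in that order.

```sql
-- Sketch only: the CTE column list renames the 11 columns returned by SELECT *,
-- so the duplicate names coming from A and B no longer collide.
-- The "...2" names are hypothetical aliases for the columns of subquery B.
WITH CTE ([ID], [MGR_ID], [DEPT_ID], [NAME], [SAL], [DOJ],
          [MGR_ID2], [DEPT_ID2], [NAME2], [SAL2], [DOJ2])
AS
(
    SELECT *
    FROM EMP A
    JOIN (SELECT [MGR_ID], [DEPT_ID], [NAME], [SAL], [DOJ]
          FROM EMP
          GROUP BY [MGR_ID], [DEPT_ID], [NAME], [SAL], [DOJ]
          HAVING COUNT(*) > 1) B
      ON  A.[MGR_ID]  = B.[MGR_ID]
      AND A.[DEPT_ID] = B.[DEPT_ID]
      AND A.[NAME]    = B.[NAME]
      AND A.[SAL]     = B.[SAL]
      AND A.[DOJ]     = B.[DOJ]
)
SELECT [ID], [MGR_ID], [DEPT_ID], [NAME], [SAL], [DOJ]
FROM CTE;
```

The column list has to contain exactly as many names as the inner query returns, which is one reason SELECT * is fragile here: adding a column to EMP would break the CTE.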
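Alternatively, you can list the columns to select inside the CTE, which is what I recommend, since you do not actually need any of the duplicated columns in your query: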
;with cte as
(
SELECT A.[MGR_ID],A.[DEPT_ID],A.[NAME],A.[SAL],A.[DOJ] from EMP A
join ( SELECT [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ]
from EMP
group by [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ]
having count(*) > 1 ) B
...
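One thing to watch when completing the join condition: comparing MGR_ID with = never matches NULL to NULL, so rows whose MGR_ID is NULL (the Hash rows in the question) would drop out of the result. The following is a minimal sketch of one way to keep them, assuming only MGR_ID needs NULL-safe matching; it is not the only option.

```sql
-- Sketch only: treats two NULL MGR_ID values as a match in the join.
-- The other columns are also nullable in EMP and would need the same
-- pattern if they can actually contain NULL.
WITH CTE AS
(
    SELECT A.[MGR_ID], A.[DEPT_ID], A.[NAME], A.[SAL], A.[DOJ]
    FROM EMP A
    JOIN (SELECT [MGR_ID], [DEPT_ID], [NAME], [SAL], [DOJ]
          FROM EMP
          GROUP BY [MGR_ID], [DEPT_ID], [NAME], [SAL], [DOJ]
          HAVING COUNT(*) > 1) B
      ON (A.[MGR_ID] = B.[MGR_ID]
          OR (A.[MGR_ID] IS NULL AND B.[MGR_ID] IS NULL))
     AND A.[DEPT_ID] = B.[DEPT_ID]
     AND A.[NAME]    = B.[NAME]
     AND A.[SAL]     = B.[SAL]
     AND A.[DOJ]     = B.[DOJ]
)
SELECT [MGR_ID], [DEPT_ID], [NAME], [SAL], [DOJ]
FROM CTE;
```

The parentheses around the OR matter: AND binds more tightly than OR, so without them the NULL check would be evaluated separately from the remaining column comparisons.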
Answer 1 (score: 0)
Try this:
WITH CTE
AS
(
SELECT a.* from EMP A
join ( SELECT [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ]
from EMP
group by [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ]
having count(*) > 1 ) B
on a.[MGR_ID] = b.[MGR_ID]
--OR a.[MGR_ID] != b.[MGR_ID]
AND a.[DEPT_ID] = b.[DEPT_ID]
AND a.[NAME] = b.[NAME]
AND a.[SAL] = b.[SAL]
AND a.[DOJ] = b.[DOJ]
),cte2 as(
SELECT [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ], DENSE_RANK() OVER
(PARTITION BY [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ] ORDER BY [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ]) AS [DUPLICATES]
FROM CTE )
select [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ] from cte2 where DUPLICATES=1
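If the goal is to list every duplicated row together with how often it occurs, a window aggregate may avoid the self-join altogether. This is a sketch under the assumption that a row counts as a duplicate when all five non-key columns match; the CTE name Counted is made up for this example. PARTITION BY, like GROUP BY, puts NULL values into the same group, so the rows with a NULL MGR_ID are kept.

```sql
-- Sketch only: count the rows sharing the same non-key values,
-- then keep the rows whose group has more than one member.
WITH Counted AS
(
    SELECT [ID], [MGR_ID], [DEPT_ID], [NAME], [SAL], [DOJ],
           COUNT(*) OVER (PARTITION BY [MGR_ID], [DEPT_ID], [NAME], [SAL], [DOJ])
               AS [DUPLICATES]
    FROM EMP
)
SELECT [ID], [MGR_ID], [DEPT_ID], [NAME], [SAL], [DOJ], [DUPLICATES]
FROM Counted
WHERE [DUPLICATES] > 1
ORDER BY [NAME], [ID];
```

Unlike the GROUP BY version, this returns one line per physical row, including its ID, which can be convenient when the duplicates have to be reviewed or deleted afterwards.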