Retrieve a report of any duplicated rows in the EMP table, along with the number of times each row is duplicated

Posted: 2015-12-09 06:07:22

Tags: sql sql-server

I have an EMP table as follows:

CREATE TABLE EMP
(
[ID] INT NOT NULL PRIMARY KEY,
[MGR_ID] INT, 
[DEPT_ID] INT, 
[NAME] VARCHAR(30), 
[SAL] INT, 
[DOJ] DATE
);

I need to retrieve a report of any duplicated rows in the EMP table, along with the number of times each row is duplicated.

I have partially solved the problem.

This query returns a single instance of each duplicated row:

SELECT [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ]
          from EMP 
          group by [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ] 
         having count(*) > 1 

The output is:

MGR_ID  DEPT_ID NAME    SAL DOJ
NULL    2       Hash    100 2012-01-01
1       2       Robo    100 2012-01-01
2       1       Privy   50  2012-05-01

I still need to extend this output with the number of times each row is duplicated in the EMP table.

I tried:

WITH CTE
AS 
(
SELECT * from EMP A
  join ( SELECT [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ]
           from EMP 
          group by [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ] 
         having count(*) > 1 ) B
   on  a.[MGR_ID] = b.[MGR_ID]
   OR a.[MGR_ID] != b.[MGR_ID]
   AND a.[DEPT_ID] = b.[DEPT_ID]
   AND a.[NAME] = b.[NAME]
   AND a.[SAL] = b.[SAL]
   AND a.[DOJ] = b.[DOJ]
   )

   SELECT [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ], DENSE_RANK() OVER
   (PARTITION BY [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ] ORDER BY DUPICATES) AS [DUPLICATES] 
   FROM CTE 

But I get this error:

  

Msg 8156, Level 16, State 1, Line 1
  The column 'MGR_ID' was specified multiple times for 'CTE'.

Please help.
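The requirement above (every duplicated row, plus how many times it occurs) can also be met without a self-join by using a window aggregate. The following is a minimal sketch, not the asker's method. It runs the query against an in-memory SQLite database through Python's standard `sqlite3` module as a stand-in for SQL Server. SQLite accepts the bracketed identifiers and, from version 3.25, supports `COUNT(*) OVER (...)`. The sample rows are hypothetical and were reconstructed from the outputs shown in the question (three copies each of Hash, Robo and Privy). The IDs and the single "Solo" row are made up.

```python
import sqlite3

# Hypothetical sample data: Hash, Robo and Privy each appear 3 times, as in the
# question's output; the IDs and the unique "Solo" row are made up.
SAMPLE = (
    [(None, 2, "Hash", 100, "2012-01-01")] * 3
    + [(1, 2, "Robo", 100, "2012-01-01")] * 3
    + [(2, 1, "Privy", 50, "2012-05-01")] * 3
    + [(2, 1, "Solo", 70, "2013-03-01")]
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMP ([ID] INT NOT NULL PRIMARY KEY, [MGR_ID] INT, [DEPT_ID] INT,"
    " [NAME] VARCHAR(30), [SAL] INT, [DOJ] DATE)"
)
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?, ?, ?, ?)",
    [(i, *row) for i, row in enumerate(SAMPLE, start=1)],
)

# COUNT(*) OVER (PARTITION BY <all non-key columns>) attaches the size of each
# row's duplicate group to the row itself. PARTITION BY puts all NULL MGR_IDs
# into one partition, so the Hash rows are kept.
QUERY = """
SELECT [ID], [MGR_ID], [DEPT_ID], [NAME], [SAL], [DOJ], Count_Of_Duplicated_Rows
FROM (
    SELECT [ID], [MGR_ID], [DEPT_ID], [NAME], [SAL], [DOJ],
           COUNT(*) OVER (PARTITION BY [MGR_ID], [DEPT_ID], [NAME], [SAL], [DOJ])
               AS Count_Of_Duplicated_Rows
    FROM EMP
) AS t
WHERE Count_Of_Duplicated_Rows > 1
ORDER BY [NAME], [ID]
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The SQL uses only a derived table and a windowed `COUNT(*)`. It should therefore also run as written on SQL Server 2005 or later, but that has not been verified here.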

I have partly found a solution, except that I still need to return the MGR_ID column in the output for the 3 records where it is NULL:

 with cte as
  (
SELECT A.[DEPT_ID],A.[NAME],A.[SAL],A.[DOJ] from EMP A
  join ( SELECT [DEPT_ID],[NAME],[SAL],[DOJ]
           from EMP 
           group by [DEPT_ID],[NAME],[SAL],[DOJ] 
           having count(*) > 1 ) B

   ON a.[DEPT_ID] = b.[DEPT_ID]
   AND a.[NAME] = b.[NAME]
   AND a.[SAL] = b.[SAL]
   AND a.[DOJ] = b.[DOJ]
   )

   SELECT [DEPT_ID],[NAME],[SAL],[DOJ], DENSE_RANK() OVER
   (PARTITION BY [NAME] ORDER BY [NAME] DESC) AS [DUPLICATES], RANK() OVER
   (PARTITION BY [NAME] ORDER BY [NAME] DESC) AS [SimpleRank]
   FROM CTE 


DEPT_ID NAME    SAL DOJ        DUPLICATES   SimpleRank
2       Hash    100 2012-01-01  1            1
2       Hash    100 2012-01-01  1            1
2       Hash    100 2012-01-01  1            1
1       Privy   50  2012-05-01  1            1
1       Privy   50  2012-05-01  1            1
1       Privy   50  2012-05-01  1            1
2       Robo    100 2012-01-01  1            1
2       Robo    100 2012-01-01  1            1
2       Robo    100 2012-01-01  1            1
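The Hash rows drop out of the join because `a.[MGR_ID] = b.[MGR_ID]` evaluates to UNKNOWN, not TRUE, when both sides are NULL. `a.[MGR_ID] != b.[MGR_ID]` is also UNKNOWN in that case, so the `OR` in the first attempt does not help. Dropping MGR_ID from the grouping avoids the problem, but then rows that differ only in MGR_ID would be counted as duplicates. A sketch of one alternative keeps MGR_ID in the grouping, makes only that comparison NULL-safe, and carries the group's `COUNT(*)` through the join. It uses the same assumptions as the question's data: SQLite via Python's `sqlite3` stands in for SQL Server, and the sample rows and the `duplicates` helper are hypothetical.

```python
import sqlite3

# Hypothetical sample data (three copies each of Hash, Robo and Privy, as in
# the question's output); the IDs and the unique "Solo" row are made up.
SAMPLE = (
    [(None, 2, "Hash", 100, "2012-01-01")] * 3
    + [(1, 2, "Robo", 100, "2012-01-01")] * 3
    + [(2, 1, "Privy", 50, "2012-05-01")] * 3
    + [(2, 1, "Solo", 70, "2013-03-01")]
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMP ([ID] INT NOT NULL PRIMARY KEY, [MGR_ID] INT, [DEPT_ID] INT,"
    " [NAME] VARCHAR(30), [SAL] INT, [DOJ] DATE)"
)
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?, ?, ?, ?)",
    [(i, *row) for i, row in enumerate(SAMPLE, start=1)],
)

# Duplicate groups with their sizes; GROUP BY puts all NULL MGR_IDs in one group.
GROUPS = """
SELECT [MGR_ID], [DEPT_ID], [NAME], [SAL], [DOJ], COUNT(*) AS Count_Of_Duplicated_Rows
FROM EMP
GROUP BY [MGR_ID], [DEPT_ID], [NAME], [SAL], [DOJ]
HAVING COUNT(*) > 1
"""

def duplicates(mgr_match):
    """Hypothetical helper: join each EMP row back to its duplicate group,
    using the given (fixed, trusted) SQL predicate for MGR_ID."""
    sql = f"""
    WITH B AS ({GROUPS})
    SELECT A.[MGR_ID], A.[DEPT_ID], A.[NAME], A.[SAL], A.[DOJ], B.Count_Of_Duplicated_Rows
    FROM EMP AS A
    JOIN B
      ON {mgr_match}
     AND A.[DEPT_ID] = B.[DEPT_ID]
     AND A.[NAME] = B.[NAME]
     AND A.[SAL] = B.[SAL]
     AND A.[DOJ] = B.[DOJ]
    ORDER BY A.[NAME], A.[ID]
    """
    return conn.execute(sql).fetchall()

# Plain equality: NULL = NULL is UNKNOWN, so the three Hash rows are lost.
plain = duplicates("A.[MGR_ID] = B.[MGR_ID]")
# NULL-safe comparison: also match when both MGR_IDs are NULL.
null_safe = duplicates(
    "(A.[MGR_ID] = B.[MGR_ID] OR (A.[MGR_ID] IS NULL AND B.[MGR_ID] IS NULL))"
)
print(len(plain), len(null_safe))
for row in null_safe:
    print(row)
```

The sketch guards only MGR_ID because that is the only column holding NULLs in the question's data. DEPT_ID, NAME, SAL and DOJ would need the same treatment if they can be NULL.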

Many thanks.

In the end, the solution turned out to be simpler:

SELECT [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ], count(Name) AS Count_Of_Duplicated_Rows
FROM EMP
GROUP BY [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ]
HAVING count(Name) > 1

It produces this result set:

MGR_ID  DEPT_ID NAME    SAL DOJ         Count_Of_Duplicated_Rows
NULL    2       Hash    100 2012-01-01  3
1       2       Robo    100 2012-01-01  3
2       1       Privy   50  2012-05-01  3

Note: this only works if you group by all the columns that make up a duplicate.
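A quick check of the simpler query, again using SQLite through Python's `sqlite3` as a stand-in for SQL Server and the same hypothetical sample rows. It shows why the NULL MGR_ID group survives here: GROUP BY places all NULLs in one group, whereas `=` in a join condition never matches two NULLs. The sketch uses `COUNT(*)` instead of `COUNT(Name)` because `COUNT(Name)` ignores rows whose NAME is NULL, so duplicates with a NULL name would be undercounted or missed.

```python
import sqlite3

# Hypothetical sample data (three copies each of Hash, Robo and Privy, as in
# the question's output); the IDs and the unique "Solo" row are made up.
SAMPLE = (
    [(None, 2, "Hash", 100, "2012-01-01")] * 3
    + [(1, 2, "Robo", 100, "2012-01-01")] * 3
    + [(2, 1, "Privy", 50, "2012-05-01")] * 3
    + [(2, 1, "Solo", 70, "2013-03-01")]
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMP ([ID] INT NOT NULL PRIMARY KEY, [MGR_ID] INT, [DEPT_ID] INT,"
    " [NAME] VARCHAR(30), [SAL] INT, [DOJ] DATE)"
)
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?, ?, ?, ?)",
    [(i, *row) for i, row in enumerate(SAMPLE, start=1)],
)

# Group by every non-key column; HAVING keeps only groups with more than one row.
FINAL = """
SELECT [MGR_ID], [DEPT_ID], [NAME], [SAL], [DOJ], COUNT(*) AS Count_Of_Duplicated_Rows
FROM EMP
GROUP BY [MGR_ID], [DEPT_ID], [NAME], [SAL], [DOJ]
HAVING COUNT(*) > 1
ORDER BY [NAME]
"""

result = conn.execute(FINAL).fetchall()
for row in result:
    print(row)
```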

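The example below is based on the earlier, more complex query. It checks all the fields in the row, as opposed to the simple query above, which checks the condition only on the specific columns you group by.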

WITH CTE 
    AS
    (
SELECT A.[MGR_ID], A.[DEPT_ID], A.[NAME], A.[SAL], A.[DOJ] 
FROM EMP A
JOIN   (SELECT [MGR_ID], [DEPT_ID], [NAME], [SAL], [DOJ]
        FROM EMP 
        GROUP BY [MGR_ID], [DEPT_ID], [NAME], [SAL], [DOJ] 
        HAVING count(*) > 1) B

       ON  a.[MGR_ID] = b.[MGR_ID]
       AND a.[DEPT_ID] = b.[DEPT_ID]
       AND a.[NAME] = b.[NAME]
       AND a.[SAL] = b.[SAL]
       AND a.[DOJ] = b.[DOJ]
   )

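   -- Note: this SELECT reads FROM EMP, not FROM CTE, so the CTE above is not referenced.
   -- With HAVING commented out it returns one row per distinct (MGR_ID, DEPT_ID, NAME, SAL, DOJ)
   -- combination, including combinations that occur only once.
   SELECT [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ], 
   count(*) As Count_Of_Duplicated_Rows 
   FROM EMP 
   GROUP BY [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ] 
   --HAVING Count(*) >1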

2 answers:

Answer 0 (score: 0)

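Your problem is that you are not explicitly naming the columns selected in the CTE. Because both EMP and the subquery have a column named MGR_ID, doing SELECT * on the join returns the MGR_ID column twice. According to the documentation, this is not allowed: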

  

The column name list is optional only if distinct names for all resulting columns are supplied in the query definition.

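Note that you will get the same error for every pair of columns that exists on both sides of the join. To fix this, you can explicitly name the columns returned by the CTE in a column list, using aliases for the duplicated columns, like this: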

WITH CTE (mgr_id,dept_id,name,sal,doj,mgr_id2,...) -- mgr_id2 is an alias for b.mgr_id
AS
...

You can refer to this MSDN page for a demo. Remove the column list and you will get the same error you are seeing now.
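A runnable sketch of the column-list approach for this particular join. `SELECT *` returns the six EMP columns from A followed by the five grouped columns from B, so the list needs eleven distinct names. The `*2` names are hypothetical aliases. As before, SQLite via Python's `sqlite3` stands in for SQL Server and the sample rows are hypothetical. Because the join still uses plain `=`, the rows with a NULL MGR_ID are not returned.

```python
import sqlite3

# Hypothetical sample data (three copies each of Hash, Robo and Privy, as in
# the question's output); the IDs and the unique "Solo" row are made up.
SAMPLE = (
    [(None, 2, "Hash", 100, "2012-01-01")] * 3
    + [(1, 2, "Robo", 100, "2012-01-01")] * 3
    + [(2, 1, "Privy", 50, "2012-05-01")] * 3
    + [(2, 1, "Solo", 70, "2013-03-01")]
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMP ([ID] INT NOT NULL PRIMARY KEY, [MGR_ID] INT, [DEPT_ID] INT,"
    " [NAME] VARCHAR(30), [SAL] INT, [DOJ] DATE)"
)
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?, ?, ?, ?)",
    [(i, *row) for i, row in enumerate(SAMPLE, start=1)],
)

# The column list gives each of the 11 columns produced by SELECT * a distinct
# name, so B's copies of the grouped columns no longer clash with A's.
QUERY = """
WITH CTE (id, mgr_id, dept_id, name, sal, doj,
          mgr_id2, dept_id2, name2, sal2, doj2) AS (
    SELECT *
    FROM EMP A
    JOIN (SELECT [MGR_ID], [DEPT_ID], [NAME], [SAL], [DOJ]
          FROM EMP
          GROUP BY [MGR_ID], [DEPT_ID], [NAME], [SAL], [DOJ]
          HAVING COUNT(*) > 1) B
      ON A.[MGR_ID] = B.[MGR_ID]
     AND A.[DEPT_ID] = B.[DEPT_ID]
     AND A.[NAME] = B.[NAME]
     AND A.[SAL] = B.[SAL]
     AND A.[DOJ] = B.[DOJ]
)
SELECT id, mgr_id, name FROM CTE ORDER BY id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```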

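Alternatively, you can list the columns to select in the CTE. I recommend this, because you don't actually need any of the duplicated columns in your query: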

;with cte as
(
SELECT A.[MGR_ID],A.[DEPT_ID],A.[NAME],A.[SAL],A.[DOJ] from EMP A
  join ( SELECT [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ]
           from EMP 
          group by [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ] 
         having count(*) > 1 ) B
...

Answer 1 (score: 0)

Try this:

WITH CTE
    AS 
    (
    SELECT a.* from EMP A
      join ( SELECT [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ]
               from EMP 
              group by [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ] 
             having count(*) > 1 ) B
       on  a.[MGR_ID] = b.[MGR_ID]
       --OR a.[MGR_ID] != b.[MGR_ID]
       AND a.[DEPT_ID] = b.[DEPT_ID]
       AND a.[NAME] = b.[NAME]
       AND a.[SAL] = b.[SAL]
       AND a.[DOJ] = b.[DOJ]
       ),cte2 as(

       SELECT [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ], DENSE_RANK() OVER
       (PARTITION BY [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ] ORDER BY [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ]) AS [DUPLICATES] 
       FROM CTE )
       select [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ] from cte2 where DUPLICATES=1
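In this answer every row gets `DUPLICATES = 1`. DENSE_RANK() ordered by the same columns it partitions on treats all rows in a partition as ties, so the final filter removes nothing. If the intent is one row per duplicate group, ROW_NUMBER() ordered by a tiebreaker (here the primary key ID) numbers the rows 1..n within each partition, and a `COUNT(*) OVER` on the same partition supplies the count. This is a sketch under the same assumptions: SQLite via Python's `sqlite3` in place of SQL Server, and hypothetical sample rows.

```python
import sqlite3

# Hypothetical sample data (three copies each of Hash, Robo and Privy, as in
# the question's output); the IDs and the unique "Solo" row are made up.
SAMPLE = (
    [(None, 2, "Hash", 100, "2012-01-01")] * 3
    + [(1, 2, "Robo", 100, "2012-01-01")] * 3
    + [(2, 1, "Privy", 50, "2012-05-01")] * 3
    + [(2, 1, "Solo", 70, "2013-03-01")]
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMP ([ID] INT NOT NULL PRIMARY KEY, [MGR_ID] INT, [DEPT_ID] INT,"
    " [NAME] VARCHAR(30), [SAL] INT, [DOJ] DATE)"
)
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?, ?, ?, ?)",
    [(i, *row) for i, row in enumerate(SAMPLE, start=1)],
)

PARTITION = "PARTITION BY [MGR_ID], [DEPT_ID], [NAME], [SAL], [DOJ]"

# DENSE_RANK ordered by its own partition columns: every row ties, so every rank is 1.
dense_ranks = [r[0] for r in conn.execute(
    f"SELECT DENSE_RANK() OVER ({PARTITION} ORDER BY [MGR_ID], [DEPT_ID], [NAME], [SAL], [DOJ]) FROM EMP"
)]

# ROW_NUMBER breaks ties by ID, so rn = 1 keeps exactly one row per group;
# COUNT(*) over the same partition gives the group size.
one_per_group = conn.execute(f"""
    SELECT [MGR_ID], [DEPT_ID], [NAME], [SAL], [DOJ], Count_Of_Duplicated_Rows
    FROM (
        SELECT [MGR_ID], [DEPT_ID], [NAME], [SAL], [DOJ],
               ROW_NUMBER() OVER ({PARTITION} ORDER BY [ID]) AS rn,
               COUNT(*) OVER ({PARTITION}) AS Count_Of_Duplicated_Rows
        FROM EMP
    ) AS t
    WHERE rn = 1 AND Count_Of_Duplicated_Rows > 1
    ORDER BY [NAME]
""").fetchall()

print(dense_ranks)
for row in one_per_group:
    print(row)
```

Because this version reads EMP directly instead of joining on `a.[MGR_ID] = b.[MGR_ID]`, the NULL MGR_ID group is kept.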