How to identify columns with duplicate entries (using SQL)?

Time: 2012-08-14 21:55:46

Tags: sql multiple-columns

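We are building a database to check for duplicates among the usernames that employees currently use on the company's various systems. Previously, some employees shared the same username on certain systems. Since the direction now is to give every employee a unique username on every system, we need to determine which employees are still using shared access. The database has one table that contains each employee's name and their respective usernames.

Ex: Table1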

Employee    System1 System2  System3
John Doe    dJohn   Pkls453  xfd801
Jane Doe    dJane   Pkls454  xfd801
James Lee   dJames  Pkls455  fd674
Mark Jones  dMark   Pkls453  xfd752

What we need to generate is a report showing that John Doe and Jane Doe share the same access on System3, and that John Doe and Mark Jones share access on System2. Something like this:

Employee  System3  System2
John Doe  xfd801
Jane Doe  xfd801
John Doe           Pkls453
Mark Jones         Pkls453

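Is there a way to do this?

Thanks in advance...

2 Answers:

Answer 0: (score: 2)

I'm sure there is a cleaner solution, but this should return what you're looking for in the format you specified.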

SELECT Employee, System1, NULL AS System2, NULL AS System3
FROM your_table T1
WHERE EXISTS(SELECT * FROM your_table T2
         WHERE T1.System1 = T2.System1
         AND T1.Employee <> T2.Employee)
UNION
SELECT Employee, NULL AS System1, System2, NULL AS System3
FROM your_table T1
WHERE EXISTS(SELECT * FROM your_table T2
         WHERE T1.System2 = T2.System2
         AND T1.Employee <> T2.Employee)
UNION
SELECT Employee, NULL AS System1, NULL AS System2, System3
FROM your_table T1
WHERE EXISTS(SELECT * FROM your_table T2
         WHERE T1.System3 = T2.System3
         AND T1.Employee <> T2.Employee)
ORDER BY System1, System2, System3

Answer 1: (score: 2)

If your system supports window functions, you can use:

SELECT employee, system1, system2, system3
FROM  (
   SELECT employee
         ,system1
         ,cast(NULL AS text) AS system2
         ,cast(NULL AS text) AS system3
         ,count(*) OVER (PARTITION BY system1) AS ct
   FROM tbl1

   UNION  ALL
   SELECT employee
         ,NULL -- cast and column name only needed in first SELECT in Postgres
         ,system2
         ,NULL
         ,count(*) OVER (PARTITION BY system2) AS ct
   FROM   tbl1

   UNION  ALL
   SELECT employee
         ,NULL
         ,NULL
         ,system3
         ,count(*) OVER (PARTITION BY system3) AS ct
   FROM   tbl1
   ) x
WHERE  ct > 1
ORDER  BY system1, system2, system3;

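Or, probably faster:
Note that "John Doe", who shares more than one system, is listed only once by the following query (unlike the first one), together with all of his shared systems. Non-shared systems are set to NULL.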

SELECT employee
      ,CASE WHEN ct1 > 1 THEN system1 ELSE NULL END AS system1
      ,CASE WHEN ct2 > 1 THEN system2 ELSE NULL END AS system2
      ,CASE WHEN ct3 > 1 THEN system3 ELSE NULL END AS system3
FROM   (
    SELECT employee, system1, system2, system3
          ,count(*) OVER (PARTITION BY system1) AS ct1
          ,count(*) OVER (PARTITION BY system2) AS ct2
          ,count(*) OVER (PARTITION BY system3) AS ct3
    FROM tbl1
    ) x
WHERE  ct1 > 1 OR ct2 > 1 OR ct3 > 1
ORDER  BY system1, system2, system3; -- depends on what you want
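
To see what the inner query contributes, here is a small sketch that runs the window counts against the four sample rows from the question. The rows are inlined with a VALUES list so that no table is needed; the VALUES list with column aliases is PostgreSQL syntax and an assumption on my part, so other systems may need a real table instead.

```sql
-- Sample rows from the question, inlined as a derived table (PostgreSQL syntax).
SELECT employee, system2, system3
      ,count(*) OVER (PARTITION BY system2) AS ct2  -- rows sharing this system2 value
      ,count(*) OVER (PARTITION BY system3) AS ct3  -- rows sharing this system3 value
FROM  (VALUES
         ('John Doe',   'dJohn',  'Pkls453', 'xfd801')
        ,('Jane Doe',   'dJane',  'Pkls454', 'xfd801')
        ,('James Lee',  'dJames', 'Pkls455', 'fd674')
        ,('Mark Jones', 'dMark',  'Pkls453', 'xfd752')
      ) AS tbl1 (employee, system1, system2, system3)
ORDER  BY employee;
```

John Doe gets 2 for both counts, Jane Doe gets 2 only for ct3, Mark Jones gets 2 only for ct2, and James Lee gets 1 for both. That is why the outer WHERE drops James Lee and the CASE expressions blank out the non-shared usernames.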

Or, if your unspecified RDBMS supports common table expressions:

WITH x AS (
    SELECT employee, system1, system2, system3
          ,count(*) OVER (PARTITION BY system1) AS ct1
          ,count(*) OVER (PARTITION BY system2) AS ct2
          ,count(*) OVER (PARTITION BY system3) AS ct3
    FROM tbl1
    )
SELECT employee
      ,CASE WHEN ct1 > 1 THEN system1 ELSE NULL END AS system1
      ,CASE WHEN ct2 > 1 THEN system2 ELSE NULL END AS system2
      ,CASE WHEN ct3 > 1 THEN system3 ELSE NULL END AS system3
FROM   x
WHERE  ct1 > 1 OR ct2 > 1 OR ct3 > 1
ORDER  BY system1, system2, system3; -- depends

If you have neither CTEs nor window functions:
(Should work in all major RDBMS, including MySQL.)

-- s1, s2, s3: usernames used by more than one employee on each system
SELECT t.employee, s1.system1, s2.system2, s3.system3
FROM   tbl1 t
LEFT   JOIN (SELECT system1 FROM tbl1 GROUP BY system1 HAVING count(*) > 1) s1
                                                 ON t.system1 = s1.system1
LEFT   JOIN (SELECT system2 FROM tbl1 GROUP BY system2 HAVING count(*) > 1) s2
                                                 ON t.system2 = s2.system2
LEFT   JOIN (SELECT system3 FROM tbl1 GROUP BY system3 HAVING count(*) > 1) s3
                                                 ON t.system3 = s3.system3
WHERE s1.system1 IS NOT NULL
   OR s2.system2 IS NOT NULL
   OR s3.system3 IS NOT NULL
ORDER BY s1.system1, s2.system2, s3.system3; -- depends

Tested with PostgreSQL 9.1.4.
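
To try any of the queries above, a minimal setup along these lines should do. The table name tbl1 and the sample rows come from the thread; the column types, lengths and the primary key are assumptions, since the question does not state them.

```sql
-- Hypothetical setup: column types, lengths and the primary key are assumptions.
CREATE TABLE tbl1 (
    employee varchar(50) PRIMARY KEY
   ,system1  varchar(20)
   ,system2  varchar(20)
   ,system3  varchar(20)
);

-- The four sample rows from the question.
INSERT INTO tbl1 (employee, system1, system2, system3) VALUES
    ('John Doe',   'dJohn',  'Pkls453', 'xfd801')
   ,('Jane Doe',   'dJane',  'Pkls454', 'xfd801')
   ,('James Lee',  'dJames', 'Pkls455', 'fd674')
   ,('Mark Jones', 'dMark',  'Pkls453', 'xfd752');

-- Sanity check: usernames on system2 that more than one employee uses.
SELECT system2, count(*) AS ct
FROM   tbl1
GROUP  BY system2
HAVING count(*) > 1;
```

With the sample rows, the sanity check returns a single row, Pkls453 with a count of 2. The same check on system3 returns xfd801, and on system1 it returns nothing.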