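How to identify rows that have duplicate values across fields/columns of the same row

Time: 2018-05-14 12:25:17

Tags: sql sql-server oracle11g

In a table, I want to find the rows in which at least two fields (columns) hold the same non-null value. A generic SQL solution is preferred, since it could be used on any database. Failing that, Oracle and SQL Server are my target databases. For example: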

ID    COL1    COL2    COL3    COL4
1      11     11       11      44
2      11     22       33      44
3      11     null     33      33
4      11     null     null    44

The following rows should be returned:

ID    COL1    COL2    COL3    COL4
1      11     11       11      44
3      11     null     33      33

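The first row has three fields with the duplicate value 11; the other row has the duplicate value 33 in col3 and col4.

4 answers:

Answer 0: (score: 2)

The brute-force approach is: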

select t.*
from t
where (col1 = col2 or col1 = col3 or col1 = col4 or
       col2 = col3 or col2 = col4 or col3 = col4
      ) or
      (col1 is null and (col2 is null or col3 is null or col4 is null) or
       col2 is null and (col3 is null or col4 is null) or
       col3 is null and col4 is null
      )

This works in any database.
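The query above also treats two NULLs in the same row as a match, whereas the question asks only for non-null duplicates. A minimal sketch of the non-null variant, assuming the same table t and the four columns from the question, simply drops the IS NULL branch; a comparison with NULL never evaluates to true, so NULLs cannot produce a match:

```sql
-- Sketch: non-null duplicates only.
-- col = NULL evaluates to UNKNOWN, so rows match only on real values.
SELECT t.*
FROM   t
WHERE  col1 = col2 OR col1 = col3 OR col1 = col4
    OR col2 = col3 OR col2 = col4
    OR col3 = col4;
```

On the sample data this should return the rows with ID 1 and 3.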

Answer 1: (score: 0)

You can use UNPIVOT.

To do this in Oracle:

Oracle 11g R2 schema setup:

CREATE TABLE table_name ( ID, COL1, COL2, COL3, COL4 ) As
SELECT 1,      11,     11,       11,      44 FROM DUAL UNION ALL
SELECT 2,      11,     22,       33,      44 FROM DUAL UNION ALL
SELECT 3,      11,     null,     33,      33 FROM DUAL UNION ALL
SELECT 4,      11,     null,     null,    44 FROM DUAL;

Query 1:

SELECT *
FROM   table_name
WHERE  id IN (
  SELECT id
  FROM   table_name
  UNPIVOT ( value FOR key IN ( COL1, COL2, COL3, COL4 ) )
  GROUP BY id, value
  HAVING COUNT( DISTINCT key ) > 1
)

Results:

| ID | COL1 |   COL2 | COL3 | COL4 |
|----|------|--------|------|------|
|  1 |   11 |     11 |   11 |   44 |
|  3 |   11 | (null) |   33 |   33 |

If you also want to match on NULL, use UNPIVOT INCLUDE NULLS.
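As a sketch of that variant, assuming the same table_name as in the schema setup above, only the UNPIVOT clause changes. GROUP BY puts NULLs into a single group, so two NULL columns in one row count as a duplicate:

```sql
-- Sketch (Oracle): keep NULLs when unpivoting so they can match each other.
SELECT *
FROM   table_name
WHERE  id IN (
  SELECT id
  FROM   table_name
  UNPIVOT INCLUDE NULLS ( value FOR key IN ( COL1, COL2, COL3, COL4 ) )
  GROUP BY id, value
  HAVING COUNT( DISTINCT key ) > 1
)
```

On the sample data this should additionally return the row with ID 4, whose COL2 and COL3 are both NULL. The INCLUDE NULLS clause is Oracle syntax, so this sketch is not meant for SQL Server.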

In SQL Server the code is almost identical (the UNPIVOT just needs an alias):

Query 1:

SELECT *
FROM   table_name
WHERE  id IN (
  SELECT id
  FROM   table_name
  UNPIVOT ( value FOR name IN ( COL1, COL2, COL3, COL4 ) ) AS u
  GROUP BY id, value
  HAVING COUNT( DISTINCT name ) > 1
)

Results:

| ID | COL1 |   COL2 | COL3 | COL4 |
|----|------|--------|------|------|
|  1 |   11 |     11 |   11 |   44 |
|  3 |   11 | (null) |   33 |   33 |

Update:

You can also generate the brute-force query in Oracle from the *_TAB_COLUMNS dictionary views (SQL Server probably has an equivalent):

SELECT 'SELECT * FROM TABLE_NAME WHERE ('
       || LISTAGG(
            '"' || PRIOR COLUMN_NAME || '" = "' || COLUMN_NAME || '"',
            ' OR '
          ) WITHIN GROUP ( ORDER BY ROWNUM )
          || ')' AS query
FROM   USER_TAB_COLUMNS
WHERE  TABLE_NAME = 'TABLE_NAME'
AND    COLUMN_NAME LIKE 'COL%'
AND    LEVEL = 2
START WITH COLUMN_NAME LIKE 'COL%'
CONNECT BY PRIOR COLUMN_ID < COLUMN_ID;

Which outputs:

SELECT * FROM TABLE_NAME WHERE ("COL1" = "COL2" OR "COL1" = "COL3" OR "COL1" = "COL4" OR "COL2" = "COL3" OR "COL2" = "COL4" OR "COL3" = "COL4")
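For SQL Server, one possible equivalent is a self-join on INFORMATION_SCHEMA.COLUMNS. This is only a sketch under assumptions that are not part of the original answer: it relies on STRING_AGG, which requires SQL Server 2017 or later, and it assumes the table is named table_name and that the columns of interest start with COL.

```sql
-- Sketch (SQL Server 2017+): build the pairwise comparison list from metadata.
-- Each pair (a, b) with a before b yields one "[a] = [b]" term.
SELECT 'SELECT * FROM table_name WHERE ('
       + STRING_AGG(
           CAST(QUOTENAME(a.COLUMN_NAME) + ' = ' + QUOTENAME(b.COLUMN_NAME)
                AS nvarchar(max)),
           ' OR '
         ) WITHIN GROUP ( ORDER BY a.ORDINAL_POSITION, b.ORDINAL_POSITION )
       + ')' AS query
FROM   INFORMATION_SCHEMA.COLUMNS AS a
JOIN   INFORMATION_SCHEMA.COLUMNS AS b
  ON   b.TABLE_SCHEMA = a.TABLE_SCHEMA
 AND   b.TABLE_NAME   = a.TABLE_NAME
 AND   a.ORDINAL_POSITION < b.ORDINAL_POSITION
WHERE  a.TABLE_NAME = 'table_name'
AND    a.COLUMN_NAME LIKE 'COL%'
AND    b.COLUMN_NAME LIKE 'COL%';
```

The generated text still has to be run separately, for example with sp_executesql.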

Answer 2: (score: 0)

A solution without hard-coded column names (SQL Server). Say our table is [#test]. Then our query is:

;with [temp] as (
    select
         [id]           =   id
        ,[col_name1]    =   [c1].[value]('local-name(.)',   'nvarchar(256)')
        ,[col_value1]   =   [c1].[value]('.',               'nvarchar(256)')
        ,[col_name2]    =   [c2].[value]('local-name(.)',   'nvarchar(256)')
        ,[col_value2]   =   [c2].[value]('.',               'nvarchar(256)')    
    from 
        [#test] as  [t]
    cross apply
        (
            select [data] = convert(xml, (select [t].* for xml path('row') ))
        )       as  [x]
    cross apply
        [x].[data].[nodes]('row/*') as [t1]([c1])
    cross apply
        [x].[data].[nodes]('row/*') as [t2]([c2])
)
,[ids] as (
    select 
        [id]
    from 
        [temp]
    where
            ([col_name1]    <>  [col_name2] )
        and ([col_value1]   =   [col_value2])
    group by
         [id]
)
select 
    *
from
    [#test] as  [t]
inner join
    [ids]   as  [i]
on
        [t].[id] = [i].[id];

The full query can be found at: https://pastebin.com/jUG5r41c

Answer 3: (score: 0)

For SQL Server, I would use the APPLY operator to do this:

select * 
from (select *, (select COUNT(*) from (values (Col1), (Col2),.. (ColN))t(ids)) TotalCols,
                (select COUNT(distinct ids) from (values (Col1), (Col2),.. (ColN))t(ids)) DistinctCols
      from table t
    ) t
where TotalCols <> DistinctCols;
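A concrete sketch of this idea written with CROSS APPLY, assuming a table named table_name with the four columns from the question (the aliases v, c, NonNullCols and DistinctCols are illustrative). Because COUNT(*) counts NULLs while COUNT(DISTINCT ...) skips them, the sketch uses COUNT(v.val) so that a NULL column does not register as a duplicate:

```sql
-- Sketch (SQL Server): compare non-null count with distinct count per row.
SELECT t.*
FROM   table_name AS t
CROSS APPLY (
    SELECT COUNT(v.val)          AS NonNullCols,   -- ignores NULLs
           COUNT(DISTINCT v.val) AS DistinctCols   -- ignores NULLs
    FROM  (VALUES (t.COL1), (t.COL2), (t.COL3), (t.COL4)) AS v(val)
) AS c
WHERE  c.NonNullCols <> c.DistinctCols;
```

On the sample data this should return the rows with ID 1 and 3.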