Compare columns only when they are NOT NULL

Date: 2019-03-26 18:19:02

Tags: sql-server

This should be easy, but I am just missing something. I have the following:

IF OBJECT_ID('LAST_NM') IS NOT NULL
    DROP TABLE LAST_NM

CREATE TABLE LAST_NM (
    ID int NOT NULL IDENTITY(1,1),
    LAST_NM_ORIGINAL varchar(255) NOT NULL,
    LAST_NM_1 varchar(255) NULL,
    LAST_NM_2 varchar(255) NULL,
    LAST_NM_3 varchar(255) NULL,
    LAST_NM_4 varchar(255) NULL,
    PRIMARY KEY (ID)
);

INSERT INTO LAST_NM
(LAST_NM_ORIGINAL, LAST_NM_1, LAST_NM_2, LAST_NM_3, LAST_NM_4)
VALUES
('SMITH', 'HARRIS', NULL, 'HARRIS', NULL),
('JONES', 'FUTURE', 'FUTURE', 'FUTURE', 'FUTURE'),
('SMITH', 'ALPHA', 'ALPHA', 'ALPHA', NULL),
('SMITH', 'BETA', 'BETA', 'GEORGE', NULL),
('SMITH', 'SMITH', NULL, 'SMITH', NULL),
('DOPE', NULL, NULL, NULL, 'CURLS')

What I want to do is SELECT from this table the rows where:

  • the last_nm_# IS NOT NULL
  • the NOT NULL last_nm_# all have the same value
  • those last_nm_# are different from LAST_NM_ORIGINAL

I have tried CASE and SWITCH. If I hard-code every case into the statement and UNION the pieces together like this, I get a messy version:


SELECT * FROM (
    SELECT ID, LAST_NM_ORIGINAL, LAST_NM_1, LAST_NM_2, LAST_NM_3, LAST_NM_4
    FROM LAST_NM
    WHERE       LAST_NM_1 IS NOT NULL
            AND LAST_NM_2 IS NOT NULL
            AND LAST_NM_3 IS NOT NULL
            AND LAST_NM_4 IS NOT NULL
            AND LAST_NM_1 = LAST_NM_2
            AND LAST_NM_1 = LAST_NM_3
            AND LAST_NM_3 = LAST_NM_4
            AND LAST_NM_1 <> LAST_NM_ORIGINAL
    UNION
    SELECT ID, LAST_NM_ORIGINAL, LAST_NM_1, LAST_NM_2, LAST_NM_3, LAST_NM_4
    FROM LAST_NM
    WHERE       LAST_NM_1 IS NOT NULL
            AND LAST_NM_2 IS NOT NULL
            AND LAST_NM_3 IS NOT NULL
            AND LAST_NM_4 IS NULL
            AND LAST_NM_1 = LAST_NM_2
            AND LAST_NM_1 = LAST_NM_3
            AND LAST_NM_1 <> LAST_NM_ORIGINAL
    /*
    WRITE OUT EACH POSSIBLE WAY AND UNION ALL OF THEM
    .
    .
    .
    */
    UNION
    SELECT ID, LAST_NM_ORIGINAL, LAST_NM_1, LAST_NM_2, LAST_NM_3, LAST_NM_4
    FROM LAST_NM
    WHERE       LAST_NM_1 IS NULL
            AND LAST_NM_2 IS NULL
            AND LAST_NM_3 IS NULL
            AND LAST_NM_4 IS NOT NULL
            AND LAST_NM_4 <> LAST_NM_ORIGINAL
    ) AS RESULT_SET

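In summary, I want to select a row if its LAST_NM_# values that are NOT NULL are all the same as each other and different from LAST_NM_ORIGINAL.

So in my example I should get back rows 1, 2, 3, and 6, but not row 4 (the "new" names differ from each other) or row 5 (the new name is the same as the old one).

There has to be a better way than writing everything out and UNIONing it all together. Right?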

6 answers:

Answer 0: (score: 2)

Here is one using UNPIVOT:

;WITH NAMES
AS (
    SELECT DISTINCT ID
        ,LAST_NM_ORIGINAL
        ,LAST_NM_NEW
    FROM (
        SELECT ID
            ,LAST_NM_ORIGINAL
            ,LAST_NM_1
            ,LAST_NM_2
            ,LAST_NM_3
            ,LAST_NM_4
        FROM LAST_NM
        ) AS X
    UNPIVOT(LAST_NM_NEW FOR LAST_NM_NEWS IN (
                LAST_NM_1
                ,LAST_NM_2
                ,LAST_NM_3
                ,LAST_NM_4
                )) AS Y
    )
SELECT ID
    ,LAST_NM_ORIGINAL
    ,LAST_NM_NEW
FROM NAMES
WHERE ID IN (
        SELECT ID
        FROM NAMES
        GROUP BY ID
        HAVING COUNT(ID) = 1
        )
    AND LAST_NM_ORIGINAL <> LAST_NM_NEW

Answer 1: (score: 1)

Here is one way to do this.

select ID
    , LAST_NM_ORIGINAL
    , LAST_NM_1
    , LAST_NM_2
    , LAST_NM_3
    , LAST_NM_4
from LAST_NM
where replace(isnull(LAST_NM_1, '') + isnull(LAST_NM_2, '') + isnull(LAST_NM_3, '') + isnull(LAST_NM_4, ''), LAST_NM_ORIGINAL, '') > ''
    AND replace(isnull(LAST_NM_1, '') + isnull(LAST_NM_2, '') + isnull(LAST_NM_3, '') + isnull(LAST_NM_4, ''), coalesce(LAST_NM_1, LAST_NM_2, LAST_NM_3, LAST_NM_4), '') = ''
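
One caveat worth checking before relying on this: because the four names are concatenated and then passed through REPLACE, the test works on substrings rather than on whole values. The literals below are hypothetical values, not rows from the question, chosen to show two ways this can go wrong. On the question's sample data neither case arises.

```sql
-- Hypothetical values, not taken from the question's sample data.
SELECT
    -- New names 'AA' and 'AA', original 'A': every 'A' is stripped, leaving '',
    -- so the first predicate (> '') rejects the row although 'AA' <> 'A'.
    REPLACE('AA' + 'AA', 'A', '')    AS original_is_substring,
    -- New names 'AB' and 'ABAB': stripping 'AB' leaves '',
    -- so the second predicate (= '') accepts the row although the names differ.
    REPLACE('AB' + 'ABAB', 'AB', '') AS names_differ;
```

Both columns come back as an empty string, so names that contain one another can be misclassified.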

Answer 2: (score: 1)

Here is another way.

SELECT
    *
FROM
    LAST_NM
WHERE
    (LAST_NM_1 IS NULL OR LAST_NM_1 = COALESCE(LAST_NM_1, LAST_NM_2, LAST_NM_3, LAST_NM_4)) AND
    (LAST_NM_2 IS NULL OR LAST_NM_2 = COALESCE(LAST_NM_1, LAST_NM_2, LAST_NM_3, LAST_NM_4)) AND
    (LAST_NM_3 IS NULL OR LAST_NM_3 = COALESCE(LAST_NM_1, LAST_NM_2, LAST_NM_3, LAST_NM_4)) AND
    (LAST_NM_4 IS NULL OR LAST_NM_4 = COALESCE(LAST_NM_1, LAST_NM_2, LAST_NM_3, LAST_NM_4)) AND
    (LAST_NM_1 IS NULL OR LAST_NM_1 <> LAST_NM_ORIGINAL) AND
    (LAST_NM_2 IS NULL OR LAST_NM_2 <> LAST_NM_ORIGINAL) AND
    (LAST_NM_3 IS NULL OR LAST_NM_3 <> LAST_NM_ORIGINAL) AND
    (LAST_NM_4 IS NULL OR LAST_NM_4 <> LAST_NM_ORIGINAL)

Edit: this can be shortened to the following query, since you want at least one of the LAST_NM_# columns to be NOT NULL:

SELECT
    *
FROM
    LAST_NM
WHERE
    (LAST_NM_1 IS NULL OR LAST_NM_1 = COALESCE(LAST_NM_1, LAST_NM_2, LAST_NM_3, LAST_NM_4)) AND
    (LAST_NM_2 IS NULL OR LAST_NM_2 = COALESCE(LAST_NM_1, LAST_NM_2, LAST_NM_3, LAST_NM_4)) AND
    (LAST_NM_3 IS NULL OR LAST_NM_3 = COALESCE(LAST_NM_1, LAST_NM_2, LAST_NM_3, LAST_NM_4)) AND
    (LAST_NM_4 IS NULL OR LAST_NM_4 = COALESCE(LAST_NM_1, LAST_NM_2, LAST_NM_3, LAST_NM_4)) AND
    (LAST_NM_ORIGINAL <> COALESCE(LAST_NM_1, LAST_NM_2, LAST_NM_3, LAST_NM_4))

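Answer 3: (score: 1)

This question immediately screams UNPIVOT to me.

You can look at the link, or just Google the syntax and examples, and then use it to get a derived table like this: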

ID NM_Orig   NM_Number  NM_Value
1  Smith     1          Harris
1  Smith     2          NULL
1  Smith     3          Harris
1  Smith     4          NULL
2  Jones     1          Future
etc...

From that derived table, you would query for the IDs WHERE NM_Value IS NOT NULL AND NM_Value <> NM_Orig AND NOT EXISTS a correlated row for the same ID that has a non-NULL NM_Value different from the NM_Value in question.
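
As a rough sketch of that description, the query below builds the derived table with CROSS APPLY (VALUES ...) instead of UNPIVOT. This is a swapped-in technique: UNPIVOT drops NULL values, so it would not produce the NULL rows shown in the table above. The CTE name NM and the aliases A, B, L and V are made up for illustration. The table and column names come from the question.

```sql
-- Sketch only: the CTE name NM and the aliases A, B, L, V are illustrative.
;WITH NM AS (
    SELECT L.ID,
           L.LAST_NM_ORIGINAL AS NM_Orig,
           V.NM_Number,
           V.NM_Value
    FROM LAST_NM AS L
    CROSS APPLY (VALUES (1, L.LAST_NM_1),
                        (2, L.LAST_NM_2),
                        (3, L.LAST_NM_3),
                        (4, L.LAST_NM_4)) AS V(NM_Number, NM_Value)
)
SELECT DISTINCT A.ID, A.NM_Orig, A.NM_Value
FROM NM AS A
WHERE A.NM_Value IS NOT NULL
  AND A.NM_Value <> A.NM_Orig
  AND NOT EXISTS (
        -- Another value for the same ID that differs from this one.
        -- A NULL never satisfies <>, so NULL rows are ignored here.
        SELECT 1
        FROM NM AS B
        WHERE B.ID = A.ID
          AND B.NM_Value <> A.NM_Value
      );
```

Against the sample data in the question this should return IDs 1, 2, 3 and 6, one row each, with the single new name in NM_Value.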

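Answer 4: (score: 1)

Your original query does have some redundant predicates. For example, any row that satisfies LAST_NM_1 = LAST_NM_2 must necessarily have both LAST_NM_1 NOT NULL and LAST_NM_2 NOT NULL.

But by using VALUES to get the 4 columns in tabular form, you can make it quite concise.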

SELECT *
FROM   LAST_NM
WHERE  EXISTS (SELECT *
               FROM   (VALUES(LAST_NM_1),
                             (LAST_NM_2),
                             (LAST_NM_3),
                             (LAST_NM_4)) V(LAST_NM_N)
               HAVING MAX(LAST_NM_N) = MIN(LAST_NM_N) /*exactly one distinct NOT NULL value among the 4 columns*/
                      AND MAX(LAST_NM_N) <> LAST_NM_ORIGINAL)
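
The HAVING test relies on two behaviours: aggregates such as MIN and MAX skip NULLs, and HAVING without GROUP BY treats all the rows as a single group. A small standalone check, using literal values copied from rows 1 and 4 of the question's sample data (the aliases V, v, min_v and max_v are made up for illustration):

```sql
-- Row 1 of the sample: HARRIS, NULL, HARRIS, NULL
SELECT MIN(v) AS min_v, MAX(v) AS max_v
FROM (VALUES ('HARRIS'), (NULL), ('HARRIS'), (NULL)) AS V(v);
-- min_v and max_v are both HARRIS, so MAX = MIN holds

-- Row 4 of the sample: BETA, BETA, GEORGE, NULL
SELECT MIN(v) AS min_v, MAX(v) AS max_v
FROM (VALUES ('BETA'), ('BETA'), ('GEORGE'), (NULL)) AS V(v);
-- min_v is BETA and max_v is GEORGE, so MAX = MIN fails
```

When all four columns are NULL, both aggregates return NULL, the comparison is not true, and the row is excluded.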

Answer 5: (score: 1)

Here is another way.

SELECT * FROM
LAST_NM
WHERE
COALESCE(LAST_NM_1, LAST_NM_2, LAST_NM_3, LAST_NM_4) = COALESCE(LAST_NM_2, LAST_NM_3, LAST_NM_4,LAST_NM_1)
AND COALESCE(LAST_NM_2, LAST_NM_3, LAST_NM_4,LAST_NM_1) = COALESCE(LAST_NM_3, LAST_NM_4,LAST_NM_1,LAST_NM_2)
AND COALESCE(LAST_NM_3, LAST_NM_4,LAST_NM_1,LAST_NM_2) = COALESCE(LAST_NM_4,LAST_NM_1,LAST_NM_2,LAST_NM_3)
AND (LAST_NM_ORIGINAL <> COALESCE(LAST_NM_1, LAST_NM_2, LAST_NM_3, LAST_NM_4))