Finding duplicates across multiple columns

Time: 2013-11-11 16:29:10

Tags: sql duplicates

I have a tricky SQL problem. Let me give you an example:

ID1   Name     Name2   Name3   Name4

100   Albert   Kevin   Jon     Alex
101   Albert   Jon     Kevin   Alex
102   Albert   Georg   Alex    Babera
103   Albert   Stefany

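Let's say ID1 gives me a project ID and Name is the main person (Albert). Name2 through Name4 are the subgroup of people who work with Albert. Now I want to count the matches between these subgroups. First, I want to find the exact matches, for example between 100 and 101. Second, is it possible to count how many names match, like a match between 101 and 100?

Thanks in advance.

1 answer:

Answer 0: (score: 0)

I know it's long and not bulletproof, but it works.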

WITH source_t AS
(
        SELECT 100 id, 'Albert' name, 'Kevin'  name2, 'Jon' name3, 'Alex' name4 FROM DUAL UNION ALL
        SELECT 101,    'Albert',      'Jon',          'Kevin',     'Alex'       FROM DUAL UNION ALL
        SELECT 102,    'Albert',      'Georg',        'Alex',      'Babera'     FROM DUAL UNION ALL
        SELECT 103,    'Albert',      'Stefany',      NULL,         NULL        FROM DUAL
)
, tab_1 AS
(
        -- Unpivot: one row per (id, name, subgroup member); the member column keeps the alias name2
        SELECT id, name, name2 FROM source_t UNION ALL
        SELECT id, name, name3 FROM source_t UNION ALL
        SELECT id, name, name4 FROM source_t
)
, tab_2 AS
(
        -- Number each project's members alphabetically (Oracle sorts NULLs last in ascending order)
        SELECT  id
        ,       name
        ,       name2
        ,       ROW_NUMBER() OVER (PARTITION BY id, name ORDER BY name2) AS r_number
        FROM    tab_1
)
, tab_3 AS
(
        -- Pivot back, so every project lists its members in the same sorted order
        SELECT  id
        ,       name
        ,       MAX(CASE WHEN r_number = 1 THEN name2 END) AS name2
        ,       MAX(CASE WHEN r_number = 2 THEN name2 END) AS name3
        ,       MAX(CASE WHEN r_number = 3 THEN name2 END) AS name4
        FROM    tab_2
        GROUP   BY
                id
        ,       name
)
SELECT  tab_3.id
,       tab_3.name
,       tab_3.name2
,       tab_3.name3
,       tab_3.name4
,       tab_4.n_count
FROM    tab_3
LEFT    JOIN
(
        SELECT  name
        ,       name2
        ,       name3
        ,       name4
        ,       COUNT(1) AS n_count
        FROM    tab_3
        GROUP   BY
                name
        ,       name2
        ,       name3
        ,       name4
)  tab_4
-- NVL lets empty slots compare equal; a real name of 'NULL' would collide with this sentinel
ON      tab_3.name  = tab_4.name
AND     NVL(tab_3.name2, 'NULL') = NVL(tab_4.name2, 'NULL')
AND     NVL(tab_3.name3, 'NULL') = NVL(tab_4.name3, 'NULL')
AND     NVL(tab_3.name4, 'NULL') = NVL(tab_4.name4, 'NULL')
;
/*
102 Albert  Alex    Babera  Georg   1
103 Albert  Stefany NULL    NULL    1
101 Albert  Alex    Jon     Kevin   2
100 Albert  Alex    Jon     Kevin   2
*/
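
The query above only covers the exact matches. For the second part of the question (how many names two projects have in common), something along these lines might work. It is only a sketch against the same sample data: the CTE name members and the column name coworker are made up for this example, and it assumes a name appears at most once per project in any way that matters (UNION removes repeats).

```sql
WITH source_t AS
(
        SELECT 100 id, 'Albert' name, 'Kevin'  name2, 'Jon' name3, 'Alex' name4 FROM DUAL UNION ALL
        SELECT 101,    'Albert',      'Jon',          'Kevin',     'Alex'       FROM DUAL UNION ALL
        SELECT 102,    'Albert',      'Georg',        'Alex',      'Babera'     FROM DUAL UNION ALL
        SELECT 103,    'Albert',      'Stefany',      NULL,         NULL        FROM DUAL
)
, members AS
(
        -- Hypothetical helper: one row per (project, subgroup member), empty slots dropped.
        -- UNION (not UNION ALL) removes a name repeated within the same project.
        SELECT id, name, name2 AS coworker FROM source_t WHERE name2 IS NOT NULL UNION
        SELECT id, name, name3             FROM source_t WHERE name3 IS NOT NULL UNION
        SELECT id, name, name4             FROM source_t WHERE name4 IS NOT NULL
)
SELECT  a.id     AS id_a
,       b.id     AS id_b
,       COUNT(*) AS n_common
FROM    members a
JOIN    members b
ON      a.name     = b.name       -- same main person
AND     a.coworker = b.coworker   -- same subgroup member
AND     a.id       < b.id         -- report each pair of projects once
GROUP   BY
        a.id
,       b.id
ORDER   BY
        a.id
,       b.id
;
/*
100 101 3
100 102 1
101 102 1
*/
```

Pairs with nothing in common (every pair involving 103 here) do not show up at all, because the inner join drops them. A pair is an exact match when n_common equals the number of members in both projects, as for 100 and 101.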