PARTITION BY Name, Id to compare and detect problems

Date: 2015-07-28 06:35:49

Tags: sql sql-server sql-server-2008 tsql group-by

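Imagine there are 3 companies. We join the tables by Name, because not every employee has provided a PersonalNo, and StringId exists only for specialists, so it can't be used for joining either. The same employee can work in multiple companies.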

Problem

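The problem is that there can be different employees with the same name (same first and last name, or, for example, only a first name provided).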

What do I need?

Return 1 when there is any problem with the data, and 0 if it is correct.

Rules for detecting problems

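  1. When there are multiple rows with the same Name (2 or more), all of them have the same PersonalNo, and not all of them have a StringId (see Peter), return 1 (wrong).
  2. When there are multiple rows with the same Name (2 or more) and one PersonalNo is NULL (see John), but they all have the same StringId, it should return 0 (correct). This means one of the companies did not provide the PersonalNo.
  3. When there are multiple rows with the same Name (2 or more), all PersonalNo values are equal and all StringId values are equal (see Lisa), it should return 0 (correct).
  4. When there are multiple rows with the same Name (2 or more) and several different PersonalNo values are provided, it should work like this: here there are two different people called Jennifer, one with PersonalNo 4805250141 and one with PersonalNo 4920225088. The Jennifer with PersonalNo NULL has the same StringId as the Jennifer with PersonalNo 4920225088, so it should return 0 (correct), and the Jennifer with PersonalNo 4805250141 should not be selected, because she has no StringId and there is only 1 row with that PersonalNo.
  5. If there is only one row and no StringId is provided, it should not appear in the select at all.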
    Sample data:

    Company     Name        PersonalNo   StringId 
    Comp1       Peter       3850342515    85426 -------------------------------------------------------------------
    Comp2       Peter       3850342515    ''    -- If have the same PersonalNo and there is no StringId - 1 (wrong)
    Comp1       John        NULL          12345 ------------------------------------------------------------------
    Comp2       John        3952525252    12345 -- If have the same StringId and 1 PersonalNo is NULL - 0 (correct)
    Comp1       Lisa        4951212581    52124 ----------------------------------------------------------------
    Comp3       Lisa        4951212581    52124 -- If PersonalNo are equal and StringId are equal - 0 (correct)
    Comp1       Jennifer    4805250141    ''    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Comp1       Jennifer    4920225088    55443 -- If have 2 different PersonalNo and NULL PersonalNo, but where PersonalNo is NULL 
    Comp3       Jennifer    NULL          55443 -- Have the same StringId with other row where is provided PersonalNo it should be 0 (correct), with different PersonalNo where is no StringId shouldn't appear at all.
    Comp1       Ralph       3961212256    ''    -- Shouldn't appear in select list, because only 1 row with this PersonalNo and there is no StringID
    

    Desired output:

    Peter     1
    John      0
    Lisa      0
    Jennifer  0
    

    My query:

    LEFT JOIN (
        SELECT Name,
            (
                SELECT CASE WHEN MIN(PersonalNo) <> MAX(d.PersonalNo)
                             and MIN(CASE WHEN StringId IS NULL THEN '0' ELSE StringId END) <> MAX(CASE WHEN d.StringId IS NULL THEN '0' ELSE d.StringId END) -- this is wrong
                             and MIN(PersonalNo) <> ''
                             and MIN(PersonalNo) IS NOT NULL
                             and MAX(rn) > 1
                        THEN 1 ELSE 0 END AS CheckPersonalNo
                FROM (
                    SELECT Name, PersonalNo, [StringId],
                           ROW_NUMBER() OVER (PARTITION BY Name, PersonalNo ORDER BY Name) rn
                    FROM TableEmp e1
                    WHERE Condition = 1 and e1.Name = d.Name
                ) sub2
                GROUP BY Name
            ) CheckPersonalNo
        FROM [TableEmp] d
        WHERE Condition = 1
        GROUP BY Name
    ) f ON f.Name = x.Name

    The problem with the query is that I can only GROUP BY Name; I can't add PersonalNo to the GROUP BY clause, so I have to use aggregates in the select list. But now it only compares the MIN and MAX values, so when there are more than 2 rows with the same Name it doesn't work as expected.

    I need something like PARTITION BY Fullname, PersonalNo to compare the values. Right now it compares values within the same Name (regardless of PersonalNo).

    Any ideas? If you have any questions, please ask and I'll do my best to explain.

    UPDATE 1

    If there are 2 entries with different PersonalNo but the same StringId, it should return 1 (wrong). For example, with this data:

    Company     Name    PersonalNo   StringId 
    Comp1       Anna    4805250141    88552    -- different PersonalNo and the same StringId for both should go as 1 (wrong)
    Comp1       Anna    4920225088    88552 
    

    it currently returns:

    Anna    0
    Anna    0

    but it should return 1 (wrong).

    UPDATE 2

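    After the UNION update, it returns 1 for Anna (the Identifier column shows StringId: 55443). In this case, one entry has a PersonalNo and the other's is blank, but both have the same (equal) StringId, so the data is correct and it should be 0.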

1 answer:

Answer (score: 2)

I hope I understood your requirements.

There may be other ways to do this, but if it were me, I would probably use a temp table for the intermediate work.

--select data into a temp table that can be modified
select
    *
    into #cleaned
from 
    table1


--apply personal numbers based on other records with matching string id
--you could take note of the records you are doing this to for data clean up
update c
    set c.personalNo = s.personalNo
from #cleaned as c
    inner join table1 as s
        on c.name = s.name
        and c.stringID = s.stringID
        and c.personalNo is null
        and s.personalNo is not null

--find all records with non matching string ids
select 
    name
    ,PersonalNo
    ,count(*) as numIDs
    into #issues
from(
    select
        name
        ,PersonalNo
        ,stringID
    from 
        #cleaned
    group by
        name
        ,PersonalNo
        ,stringID
    ) as i
group by
    name
    ,PersonalNo
having 
    count(*) > 1

--select data for viewing.
select
    distinct
    s.name
    ,case
        when i.name is not null then 1
        else 0
    end as issue
from
    #cleaned as s
    left outer join #issues as i
        on s.name = i.name
        and s.personalNo = i.personalNo
order by issue desc

SQLFiddle:http://sqlfiddle.com/#!3/f4aab/7
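To see the temp-table approach run end to end, here is a minimal sketch that replays it on the question's sample rows. It uses SQLite from Python rather than SQL Server, so it makes a few assumptions. `SELECT ... INTO #cleaned` becomes `CREATE TEMP TABLE ... AS SELECT`. The `UPDATE ... FROM` join becomes a correlated subquery, with `MAX` used so the borrowed PersonalNo is deterministic (SQL Server's `UPDATE ... FROM` picks an arbitrary row when several match). The names `ROWS` and `find_issues` are hypothetical.

```python
import sqlite3

# Sample data from the question; '' marks a missing StringId, None a missing PersonalNo.
ROWS = [
    ("Comp1", "Peter", "3850342515", "85426"),
    ("Comp2", "Peter", "3850342515", ""),
    ("Comp1", "John", None, "12345"),
    ("Comp2", "John", "3952525252", "12345"),
    ("Comp1", "Lisa", "4951212581", "52124"),
    ("Comp3", "Lisa", "4951212581", "52124"),
    ("Comp1", "Jennifer", "4805250141", ""),
    ("Comp1", "Jennifer", "4920225088", "55443"),
    ("Comp3", "Jennifer", None, "55443"),
    ("Comp1", "Ralph", "3961212256", ""),
]


def find_issues(conn):
    """Hypothetical helper: copy, fill missing PersonalNo, flag (Name, PersonalNo) pairs."""
    # SQLite stand-in for: select * into #cleaned from table1
    conn.execute("CREATE TEMP TABLE cleaned AS SELECT * FROM table1")
    # Borrow a PersonalNo from a row with the same Name and StringId.
    conn.execute("""
        UPDATE cleaned
        SET PersonalNo = (SELECT MAX(s.PersonalNo) FROM table1 AS s
                          WHERE s.Name = cleaned.Name
                            AND s.StringId = cleaned.StringId
                            AND s.PersonalNo IS NOT NULL)
        WHERE PersonalNo IS NULL
    """)
    # A (Name, PersonalNo) pair seen with more than one distinct StringId is an issue.
    conn.execute("""
        CREATE TEMP TABLE issues AS
        SELECT Name, PersonalNo, COUNT(*) AS numIDs
        FROM (SELECT Name, PersonalNo, StringId FROM cleaned
              GROUP BY Name, PersonalNo, StringId) AS i
        GROUP BY Name, PersonalNo
        HAVING COUNT(*) > 1
    """)
    return conn.execute("""
        SELECT DISTINCT s.Name,
               CASE WHEN i.Name IS NOT NULL THEN 1 ELSE 0 END AS issue
        FROM cleaned AS s
        LEFT OUTER JOIN issues AS i
          ON s.Name = i.Name AND s.PersonalNo = i.PersonalNo
        ORDER BY issue DESC, s.Name
    """).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (Company TEXT, Name TEXT, PersonalNo TEXT, StringId TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", ROWS)
result = find_issues(conn)
print(result)  # [('Peter', 1), ('Jennifer', 0), ('John', 0), ('Lisa', 0), ('Ralph', 0)]
```

With this data, Ralph comes back with 0. Rule 5 in the question says a single row without a StringId should not appear at all, so that case is not covered by this query as written.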

Sorry if there are bugs in here, but I'm sure you'll get the idea. It's not rocket science, just a different approach.

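Edit: I just noticed you're interested in rows without a string ID, except that if such a row is the only row, it isn't an issue. I changed the first select (into #cleaned) to take all rows.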

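Edit: without temp tables. Now that you know what it's doing, here is the same thing without any temp tables. Warning, though: this update fills in the missing personalNo in the source table itself.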

update c
    set c.personalNo = s.personalNo
from table1 as c
    inner join table1 as s
        on c.name = s.name
        and c.stringID = s.stringID
        and c.personalNo is null
        and s.personalNo is not null


select
    distinct
    s.name
    ,case
        when i.name is not null then 1
        else 0
    end as issue
from
    table1 as s
    left outer join (
                select 
                    name
                    ,PersonalNo
                    ,count(*) as numIDs
                from(
                    select
                        name
                        ,PersonalNo
                        ,stringID
                    from 
                        table1
                    group by
                        name
                        ,PersonalNo
                        ,stringID
                    ) as i
                group by
                    name
                    ,PersonalNo
                having 
                    count(*) > 1
        )
        as i
        on s.name = i.name
        and s.personalNo = i.personalNo
order by issue desc

SQLFiddle:http://sqlfiddle.com/#!3/f4aab/8
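Because this version writes the borrowed personalNo straight into the source table, one way to try it safely is to run the update inside a transaction, inspect the result, and roll it back. In SQL Server that would be `BEGIN TRANSACTION` ... `ROLLBACK TRANSACTION`. Below is a minimal sketch of the same idea in SQLite from Python, using only John's two sample rows. It assumes Python's default (legacy) sqlite3 transaction handling, which opens a transaction implicitly before an `UPDATE`. The helper `johns_comp1_personal_no` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (Company TEXT, Name TEXT, PersonalNo TEXT, StringId TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [("Comp1", "John", None, "12345"), ("Comp2", "John", "3952525252", "12345")],
)
conn.commit()  # persist the sample rows before experimenting


def johns_comp1_personal_no():
    """Hypothetical helper: read the PersonalNo of John's Comp1 row."""
    return conn.execute(
        "SELECT PersonalNo FROM table1 WHERE Name = 'John' AND Company = 'Comp1'"
    ).fetchone()[0]


# sqlite3 implicitly opens a transaction before this UPDATE, so it is not committed yet.
conn.execute("""
    UPDATE table1
    SET PersonalNo = (SELECT MAX(s.PersonalNo) FROM table1 AS s
                      WHERE s.Name = table1.Name
                        AND s.StringId = table1.StringId
                        AND s.PersonalNo IS NOT NULL)
    WHERE PersonalNo IS NULL
""")
during = johns_comp1_personal_no()  # filled-in value, visible inside the transaction

conn.rollback()  # undo the update; table1 is back to its original state
after = johns_comp1_personal_no()

print(during, after)  # 3952525252 None
```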

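PARTITIONING: I'm not sure how I'd use partitioning here, since all you want to know is whether there are multiple rows. I'd use partitioning for more complex tabulation, or if I were ranking results to update data based on judgment calls with more complex rules... but anyway, here is a partition crowbarred in :D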

Select
    name
    ,personalNo
    ,case
        when numstrings > 1 then 1
        else 0 end as issue
from
    (select
        name
        ,personalNo
        ,row_number() over (partition by 
                                    name
                                    ,personalNo 
                                order by 
                                    name
                                    ,personalNo
                                    ,stringID
                                    ) as numstrings
    from
        #cleaned
    group by
        name
        ,personalNo
        ,stringid) as d
order by
    issue desc

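Note: this uses the #cleaned table described above, because I think the key thing that makes this case difficult is that the personalNo is sometimes missing.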
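The only question asked per (Name, PersonalNo) is "is there more than one StringId?". A plain GROUP BY with `COUNT(DISTINCT StringId)` answers that directly, with one flag per pair instead of one per row, and without ROW_NUMBER. SQL Server does not allow DISTINCT inside a windowed aggregate, so `COUNT(DISTINCT ...) OVER (...)` is not an option there. Below is a minimal sketch in SQLite from Python. It assumes the rows have already been cleaned as in #cleaned; the `CLEANED` list is hypothetical, filled in by hand from the question's sample data.

```python
import sqlite3

# Hypothetical #cleaned contents: the question's rows after the NULL PersonalNo values were filled.
CLEANED = [
    ("Comp1", "Peter", "3850342515", "85426"),
    ("Comp2", "Peter", "3850342515", ""),
    ("Comp1", "John", "3952525252", "12345"),
    ("Comp2", "John", "3952525252", "12345"),
    ("Comp1", "Lisa", "4951212581", "52124"),
    ("Comp3", "Lisa", "4951212581", "52124"),
    ("Comp1", "Jennifer", "4805250141", ""),
    ("Comp1", "Jennifer", "4920225088", "55443"),
    ("Comp3", "Jennifer", "4920225088", "55443"),
    ("Comp1", "Ralph", "3961212256", ""),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cleaned (Company TEXT, Name TEXT, PersonalNo TEXT, StringId TEXT)")
conn.executemany("INSERT INTO cleaned VALUES (?, ?, ?, ?)", CLEANED)

# One row per (Name, PersonalNo), flagged when it carries more than one distinct StringId.
# COUNT(DISTINCT ...) skips NULLs but counts '' as a value, which is how the sample
# data marks a missing StringId.
result = conn.execute("""
    SELECT Name, PersonalNo,
           CASE WHEN COUNT(DISTINCT StringId) > 1 THEN 1 ELSE 0 END AS issue
    FROM cleaned
    GROUP BY Name, PersonalNo
    ORDER BY issue DESC, Name, PersonalNo
""").fetchall()
for row in result:
    print(row)
```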

No temp tables, no updates

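Building on the above, it's obviously possible to do this without any temp tables or updating anything; it's just a matter of readability/maintainability and whether it's actually faster. This also handles string IDs with multiple personalNo assignments more robustly: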

select
    distinct
    s.name
    ,case
        when i.name is not null then 1
        else 0
    end as issue
from
    table1 as s
    left outer join (
                select 
                    name
                    ,PersonalNo
                    ,count(*) as numIDs
                from(
                    select distinct
                        a.name
                        ,coalesce(a.PersonalNo,b.PersonalNo) as PersonalNo
                        ,a.stringID
                    from 
                        table1 as a
                            left outer join table1 as b
                                on a.name = b.name
                                and a.stringid=b.stringid
                                -- only borrow a personalNo for rows that lack one
                                -- (a.personalNo != b.personalNo is never true when a.personalNo is NULL)
                                and a.personalNo is null
                                and b.personalNo Is Not Null
                    ) as i
                group by
                    name
                    ,PersonalNo
                having 
                    count(*) > 1
        )
        as i
        on s.name = i.name
        and s.personalNo = i.personalNo
order by issue desc

SQLFiddle:http://sqlfiddle.com/#!3/f4aab/9
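Here is a minimal sketch of the read-only variant run on the question's sample data, in SQLite from Python rather than SQL Server. The self-join borrows a PersonalNo only for rows whose own PersonalNo is NULL. Any comparison with NULL, such as `a.PersonalNo != b.PersonalNo`, evaluates to UNKNOWN, so it cannot be used to detect the missing value. `SELECT DISTINCT` then merges each filled-in row with the row it copied from, so John is not counted twice. `FILLED` is a hypothetical name for the derived query.

```python
import sqlite3

# Sample data from the question; '' marks a missing StringId, None a missing PersonalNo.
ROWS = [
    ("Comp1", "Peter", "3850342515", "85426"),
    ("Comp2", "Peter", "3850342515", ""),
    ("Comp1", "John", None, "12345"),
    ("Comp2", "John", "3952525252", "12345"),
    ("Comp1", "Lisa", "4951212581", "52124"),
    ("Comp3", "Lisa", "4951212581", "52124"),
    ("Comp1", "Jennifer", "4805250141", ""),
    ("Comp1", "Jennifer", "4920225088", "55443"),
    ("Comp3", "Jennifer", None, "55443"),
    ("Comp1", "Ralph", "3961212256", ""),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (Company TEXT, Name TEXT, PersonalNo TEXT, StringId TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", ROWS)

# Hypothetical derived query: rows with a NULL PersonalNo borrow one from a row with the
# same Name and StringId; DISTINCT collapses duplicates created by the fill.
FILLED = """
    SELECT DISTINCT a.Name,
           COALESCE(a.PersonalNo, b.PersonalNo) AS PersonalNo,
           a.StringId
    FROM table1 AS a
    LEFT OUTER JOIN table1 AS b
      ON a.Name = b.Name
     AND a.StringId = b.StringId
     AND a.PersonalNo IS NULL
     AND b.PersonalNo IS NOT NULL
"""

# Read-only: nothing is written back to table1.
result = conn.execute(f"""
    SELECT DISTINCT s.Name,
           CASE WHEN i.Name IS NOT NULL THEN 1 ELSE 0 END AS issue
    FROM table1 AS s
    LEFT OUTER JOIN (
        SELECT Name, PersonalNo
        FROM ({FILLED}) AS f
        GROUP BY Name, PersonalNo
        HAVING COUNT(*) > 1          -- one PersonalNo, several StringIds
    ) AS i
      ON s.Name = i.Name AND s.PersonalNo = i.PersonalNo
    ORDER BY issue DESC, s.Name
""").fetchall()

john = conn.execute(f"SELECT PersonalNo FROM ({FILLED}) AS f WHERE Name = 'John'").fetchall()
print(result)  # [('Peter', 1), ('Jennifer', 0), ('John', 0), ('Lisa', 0), ('Ralph', 0)]
print(john)    # [('3952525252',)]
```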

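Edit: looking for inconsistent personal numbers. This uses a temp table, but you can swap it out as in the previous example. Note that the structure differs slightly from the one you originally proposed, because this is closer to how I would approach the task, but there is enough code here for you to rework it any way you like.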

--select data into a temp table that can be modified
select
    *
    into #cleaned
from 
    table1


--apply personal numbers based on other records with matching string id
--you could take note of the records you are doing this to for data clean up
update c
    set c.personalNo = s.personalNo
from #cleaned as c
    inner join table1 as s
        on c.name = s.name
        and c.stringID = s.stringID
        and c.personalNo is null
        and s.personalNo is not null


Select
    IssueType
     ,Name
     ,Identifier
from 
    (
        --find all records with non matching PersonalNos
        select 
            name
            ,cast('StringID: ' + stringID as nvarchar(400)) as Identifier
            ,cast('Inconsistent  PersonalNo' as nvarchar(400)) as issueType
        from(
            select
                name
                ,PersonalNo
                ,stringID
            from 
                #cleaned
            group by
                name
                ,PersonalNo
                ,stringID
            ) as i
        group by
            name
            ,StringId
        having 
            count(*) > 1

    UNION    
        --find all records with non matching string ids

        select 
            name
            ,'PersonalNo: ' + PersonalNo
            ,cast('Inconsistent String ID' as nvarchar(400)) as issueType
        from(
            select
                name
                ,PersonalNo
                ,stringID
            from 
                #cleaned
            group by
                name
                ,PersonalNo
                ,stringID
            ) as i
        group by
            name
            ,PersonalNo
        having 
            count(*) > 1
    ) as a

SQLFiddle:http://sqlfiddle.com/#!3/e9da2/18
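To check what the UNION report produces, here is a minimal sketch run on the question's sample rows plus the two Anna rows from Update 1. It uses SQLite from Python, which means three assumptions: SQLite's `||` replaces T-SQL's `+` for string concatenation, a correlated-subquery UPDATE replaces `UPDATE ... FROM`, and the SQL string `COMBOS` is a hypothetical helper.

```python
import sqlite3

# Question's sample rows plus the two Anna rows from Update 1.
ROWS = [
    ("Comp1", "Peter", "3850342515", "85426"),
    ("Comp2", "Peter", "3850342515", ""),
    ("Comp1", "John", None, "12345"),
    ("Comp2", "John", "3952525252", "12345"),
    ("Comp1", "Lisa", "4951212581", "52124"),
    ("Comp3", "Lisa", "4951212581", "52124"),
    ("Comp1", "Jennifer", "4805250141", ""),
    ("Comp1", "Jennifer", "4920225088", "55443"),
    ("Comp3", "Jennifer", None, "55443"),
    ("Comp1", "Ralph", "3961212256", ""),
    ("Comp1", "Anna", "4805250141", "88552"),
    ("Comp1", "Anna", "4920225088", "88552"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (Company TEXT, Name TEXT, PersonalNo TEXT, StringId TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", ROWS)
conn.execute("CREATE TEMP TABLE cleaned AS SELECT * FROM table1")
conn.execute("""
    UPDATE cleaned
    SET PersonalNo = (SELECT MAX(s.PersonalNo) FROM table1 AS s
                      WHERE s.Name = cleaned.Name
                        AND s.StringId = cleaned.StringId
                        AND s.PersonalNo IS NOT NULL)
    WHERE PersonalNo IS NULL
""")

# Hypothetical helper: distinct (Name, PersonalNo, StringId) combinations used by both branches.
COMBOS = "SELECT Name, PersonalNo, StringId FROM cleaned GROUP BY Name, PersonalNo, StringId"

report = conn.execute(f"""
    SELECT IssueType, Name, Identifier
    FROM (
        -- one StringId shared by several PersonalNo values
        SELECT Name, 'StringID: ' || StringId AS Identifier,
               'Inconsistent PersonalNo' AS IssueType
        FROM ({COMBOS}) AS i
        GROUP BY Name, StringId
        HAVING COUNT(*) > 1
        UNION
        -- one PersonalNo seen with several StringId values
        SELECT Name, 'PersonalNo: ' || PersonalNo,
               'Inconsistent String ID'
        FROM ({COMBOS}) AS i
        GROUP BY Name, PersonalNo
        HAVING COUNT(*) > 1
    ) AS a
    ORDER BY Name
""").fetchall()
for row in report:
    print(row)
```

With this data, the report has two rows: Anna is flagged for StringID 88552 and Peter for PersonalNo 3850342515.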

Update: you also want to accept an empty-string personalNo. Another new requirement: treat an empty string in personalNo the same way as NULL.
--select data into a temp table that can be modified
select
    *
    into #cleaned
from 
    table1

--apply personal numbers based on other records with matching string id
--you could take note of the records you are doing this to for data clean up
update c
    set c.personalNo = s.personalNo
from #cleaned as c
    inner join table1 as s
        on c.name = s.name
        and c.stringID = s.stringID
        and  (c.personalNo IS NULL OR c.personalNo ='')
        and s.personalNo is not null
        and s.personalNo != ''


Select
     IssueType
     ,Name
     ,Identifier
from 
    (
        --find all records with non matching PersonalNos
        select 
            name
            ,cast('StringID: ' + stringID as nvarchar(400)) as Identifier
            ,cast('Inconsistent  PersonalNo' as nvarchar(400)) as issueType
        from(
            select
                name
                ,PersonalNo
                ,stringID
            from 
                #cleaned
            group by
                name
                ,PersonalNo
                ,stringID
            ) as i
        group by
            name
            ,StringId
        having 
            count(*) > 1

  UNION    
        --find all records with non matching string ids
        select 
            name
            ,'PersonalNo: ' + PersonalNo
            ,cast('Inconsistent String ID' as nvarchar(400)) as issueType
        from(
            select
                name
                ,PersonalNo
                ,stringID
            from 
                #cleaned
            group by
                name
                ,PersonalNo
                ,stringID
            ) as i
        group by
            name
            ,PersonalNo
        having 
            count(*) > 1
    ) as a

SQLFiddle:http://sqlfiddle.com/#!3/412127/8
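Instead of spelling out `(x IS NULL OR x = '')` for each column, `NULLIF(x, '')` turns an empty string into NULL, so a single `IS NULL` / `IS NOT NULL` test covers both cases. NULLIF exists in both SQL Server and SQLite. Below is a minimal sketch in SQLite from Python with two hypothetical rows (name Mark, PersonalNo 1234567890) shaped like the Update 2 situation: same StringId, one PersonalNo filled in and one blank. Unlike the inner-join update above, this correlated-subquery form also turns a blank PersonalNo that has no donor row into NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (Company TEXT, Name TEXT, PersonalNo TEXT, StringId TEXT)")
# Hypothetical rows: same StringId, one PersonalNo provided and one left blank.
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", [
    ("Comp1", "Mark", "1234567890", "55443"),
    ("Comp2", "Mark", "", "55443"),
])
conn.execute("CREATE TEMP TABLE cleaned AS SELECT * FROM table1")

# NULLIF(x, '') yields NULL for '' and x otherwise, so blank and NULL are handled alike.
# A blank PersonalNo with no donor row ends up as NULL (the subquery returns NULL).
conn.execute("""
    UPDATE cleaned
    SET PersonalNo = (SELECT MAX(s.PersonalNo) FROM table1 AS s
                      WHERE s.Name = cleaned.Name
                        AND s.StringId = cleaned.StringId
                        AND NULLIF(s.PersonalNo, '') IS NOT NULL)
    WHERE NULLIF(PersonalNo, '') IS NULL
""")
filled = conn.execute("SELECT PersonalNo FROM cleaned ORDER BY Company").fetchall()

# "Inconsistent PersonalNo" check: a StringId seen with more than one PersonalNo.
issues = conn.execute("""
    SELECT Name, 'StringID: ' || StringId
    FROM (SELECT Name, PersonalNo, StringId FROM cleaned
          GROUP BY Name, PersonalNo, StringId) AS i
    GROUP BY Name, StringId
    HAVING COUNT(*) > 1
""").fetchall()
print(filled)  # [('1234567890',), ('1234567890',)]
print(issues)  # []
```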