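Duplicate entries based on two tables

Time: 2019-03-16 00:45:33

Tags: sql sql-server duplicates

I have two tables with multiple columns, and I want to find which names in table 1 have the same father in table 2. I tried this: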

SELECT
    d0.DateOfBirth,
    d.Id,d.Reg,
    d.Name, 
    D0.Id, 
    D0.FatherId,
    d1.Reg as Father_reg, 
    D1.Name as Fathers_Name, 
    D0.MotherId,
    d2.Reg as Mother_Reg, 
    D2.Name as Mothers_Name
FROM 
    dbo.Dogs d 
    LEFT JOIN dbo.Litters D0 ON D0.Id = d.LitterId
    LEFT JOIN dbo.Dogs D1 on D0.FatherId=D1.ID
    LEFT JOIN dbo.Dogs D2 on D0.MotherId=D2.ID
WHERE 
    d.Name IN (
        SELECT d.Name 
        FROM dbo.Dogs D 
        LEFT JOIN dbo.Litters D0 ON D0.Id = d.LitterId 
        GROUP BY d.Name  
        HAVING COUNT(*) > 1
    )
ORDER BY
    d.Name, 
    d0.DateOfBirth

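This gives me all the duplicated names, but I want only the duplicated names that also share the same father.

So even if "Frank" appears four times in the table, but only two of those entries have a father named "Ian", only those two entries should be listed. The problem I have is that the names are in table dbo.Dogs, while the link between the offspring ID and the father ID is in table dbo.Litters. When I try to select on both, I end up doing the count in a subquery, where only one selected column is allowed. English is not my native language, so I hope this makes sense ;)

I set up a fiddle to view the data here

What I would like to see is this: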

DateOfBirth          Id  Reg           Name  Id    FatherId  Father_reg    Fathers_Name  MotherId  Mother_Reg    Mothers_Name
-------------------  --  ------------  ----  ----  --------  ------------  ------------  --------  ------------  ------------
01/04/2012 00:00:00  3   NO34567/2012  Fido  9000  2         NO12345/2010  king          1         NO23456/2009  Queen
01/04/2012 00:00:00  6   NO34567/2012  Fido  9000  2         NO12345/2010  king          1         NO23456/2009  Queen

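That is, the first rows in the fiddle, where the father's name is the same, with the rows filtered out where the father's name occurs only once.

Solution in the fiddle: here

2 answers:

Answer 0: (score: 1)

Here is your sample data: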

SELECT * FROM dogs d LEFT JOIN litters l ON d.LitterId = l.id
ID | Reg          | Name   | LitterID |   ID | Dateofbirth         | FatherID | motherID
:- | :----------- | :----- | :------- | ---: | :------------------ | -------: | -------:
3  | NO34567/2012 | Fido   | 9000     | 9000 | 01/04/2012 00:00:00 |        2 |        1
4  | NO34568/2012 | Fido   | 6000     | 6000 | 01/06/2014 00:00:00 |        9 |        8
5  | NO34569/2012 | Fido   | 5000     | 5000 | 01/05/2013 00:00:00 |        7 |        8
6  | NO34567/2012 | Fido   | 9000     | 9000 | 01/04/2012 00:00:00 |        2 |        1
2  | NO12345/2010 | king   | 8000     | null | null                |     null |     null
1  | NO23456/2009 | Queen  | 7000     | null | null                |     null |     null
7  | NO12346/2010 | God    | 8000     | null | null                |     null |     null
8  | NO23457/2009 | Godess | 7000     | null | null                |     null |     null
9  | NO12346/2010 | Devil  | 8000     | null | null                |     null |     null

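I understand that you are looking for dogs that share both the same name and the same father. In SQL Server, a simple solution is to use the window function COUNT(...) OVER(...) to compute, for each record, how many such duplicates exist.

Consider: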

SELECT * FROM (
    SELECT 
        d.ID, 
        d.Reg, 
        d.Name, 
        d.LitterID, 
        l.Dateofbirth, 
        l.FatherID, 
        l.MotherID, 
        COUNT(*) OVER(PARTITION BY d.Name, l.FatherId) cnt
    FROM dogs d 
    LEFT JOIN litters l ON d.LitterId = l.ID
) x WHERE cnt > 1

Yields:

ID | Reg          | Name | LitterID | Dateofbirth         | FatherID | motherID | cnt
:- | :----------- | :--- | :------- | :------------------ | -------: | -------: | --:
3  | NO34567/2012 | Fido | 9000     | 01/04/2012 00:00:00 |        2 |        1 |   2
6  | NO34567/2012 | Fido | 9000     | 01/04/2012 00:00:00 |        2 |        1 |   2
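
The counting step can also be tried outside SQL Server. The following is a minimal sketch, assuming Python's built-in sqlite3 module linked against SQLite 3.25 or later (the first release with window functions). SQLite only stands in for SQL Server here, the dates are stored as plain text, and the rows are copied from the sample data above.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE litters (ID INTEGER PRIMARY KEY, Dateofbirth TEXT, FatherID INTEGER, MotherID INTEGER);
CREATE TABLE dogs (ID INTEGER PRIMARY KEY, Reg TEXT, Name TEXT, LitterID INTEGER);

INSERT INTO litters VALUES
    (9000, '01/04/2012 00:00:00', 2, 1),
    (6000, '01/06/2014 00:00:00', 9, 8),
    (5000, '01/05/2013 00:00:00', 7, 8);

INSERT INTO dogs VALUES
    (3, 'NO34567/2012', 'Fido',   9000),
    (4, 'NO34568/2012', 'Fido',   6000),
    (5, 'NO34569/2012', 'Fido',   5000),
    (6, 'NO34567/2012', 'Fido',   9000),
    (2, 'NO12345/2010', 'king',   8000),
    (1, 'NO23456/2009', 'Queen',  7000),
    (7, 'NO12346/2010', 'God',    8000),
    (8, 'NO23457/2009', 'Godess', 7000),
    (9, 'NO12346/2010', 'Devil',  8000);
""")

# Count, for every dog, how many dogs share its (Name, FatherID) pair,
# then keep only the dogs whose pair occurs more than once.
rows = conn.execute("""
SELECT x.ID, x.Name, x.FatherID, x.cnt
FROM (
    SELECT d.ID, d.Name, l.FatherID,
           COUNT(*) OVER (PARTITION BY d.Name, l.FatherID) AS cnt
    FROM dogs d
    LEFT JOIN litters l ON d.LitterID = l.ID
) x
WHERE x.cnt > 1
ORDER BY x.ID
""").fetchall()

print(rows)  # [(3, 'Fido', 2, 2), (6, 'Fido', 2, 2)]
```

The dogs without a matching litter (king, Queen, God, Godess, Devil) get a NULL FatherID from the LEFT JOIN. Each of them has a distinct name, so each lands in a partition of size 1 and is filtered out.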

Now all that is left to do is to add a couple of joins back to the dogs table to retrieve the parents' names:

SELECT
    x.DateOfBirth,
    x.ID,
    x.Reg,
    x.Name,
    x.FatherID,
    d_father.Reg FatherReg,
    d_father.Name FatherName,
    x.MotherID,
    d_mother.Reg MotherReg,
    d_mother.Name MotherName
FROM 
    (
        SELECT 
            d.ID, 
            d.Reg, 
            d.Name, 
            d.LitterID, 
            l.Dateofbirth, 
            l.FatherID, 
            l.MotherID, 
            COUNT(*) OVER(PARTITION BY d.Name, l.FatherId) cnt
        FROM dogs d 
        LEFT JOIN litters l ON d.LitterId = l.ID
    ) x 
    INNER JOIN dogs d_mother ON d_mother.ID = x.MotherID
    INNER JOIN dogs d_father ON d_father.ID = x.FatherID
WHERE x.cnt > 1

Result:

DateOfBirth         | ID | Reg          | Name | FatherID | FatherReg    | FatherName | MotherID | MotherReg    | MotherName
:------------------ | :- | :----------- | :--- | -------: | :----------- | :--------- | -------: | :----------- | :---------
01/04/2012 00:00:00 | 3  | NO34567/2012 | Fido |        2 | NO12345/2010 | king       |        1 | NO23456/2009 | Queen     
01/04/2012 00:00:00 | 6  | NO34567/2012 | Fido |        2 | NO12345/2010 | king       |        1 | NO23456/2009 | Queen     

Demo on DB Fiddle

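Answer 1: (score: 0)

Could you use the litter ID instead of the name in the IN clause? Fido is repeated several times, and it looks like your joins are based on the litter ID. If you do that, you get the expected output.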

Select  d0.DateOfBirth,d.Id,d.Reg,d.Name, D0.Id , D0.FatherId,d1.Reg as Father_reg, D1.Name as Fathers_Name, D0.MotherId,d2.Reg as Mother_Reg, D2.Name as Mothers_Name
from dbo.Dogs d 
     join dbo.Litters D0 on D0.Id = d.LitterId
     join dbo.Dogs D1 on D0.FatherId=D1.ID
     join dbo.Dogs D2 on D0.MotherId=D2.ID
where d.LitterId in (select d.LitterId from dbo.Dogs D left join dbo.Litters D0 on D0.Id = d.LitterId Group by d.LitterId  having COUNT(*) > 1)
order by d.Name, d0.DateOfBirth

Output:

DateOfBirth          Id  Reg           Name  Id    FatherId  Father_reg    Fathers_Name  MotherId  Mother_Reg    Mothers_Name
01/04/2012 00:00:00  3   NO34567/2012  Fido  9000  2         NO12345/2010  king          1         NO23456/2009  Queen
01/04/2012 00:00:00  6   NO34567/2012  Fido  9000  2         NO12345/2010  king          1         NO23456/2009  Queen
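
One point to keep in mind: grouping on the litter ID alone does not look at the name, so it matches the sample data only because the two dogs in litter 9000 are both called Fido. The sketch below illustrates the difference. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server, and it adds one hypothetical dog, "Rex" (ID 10), which is not part of the original sample data. The second query, which groups on the name and father together, is a possible variant and not the query from this answer.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE litters (ID INTEGER PRIMARY KEY, Dateofbirth TEXT, FatherID INTEGER, MotherID INTEGER);
CREATE TABLE dogs (ID INTEGER PRIMARY KEY, Reg TEXT, Name TEXT, LitterID INTEGER);

INSERT INTO litters VALUES
    (9000, '01/04/2012 00:00:00', 2, 1),
    (6000, '01/06/2014 00:00:00', 9, 8),
    (5000, '01/05/2013 00:00:00', 7, 8);

INSERT INTO dogs VALUES
    (3, 'NO34567/2012', 'Fido',   9000),
    (4, 'NO34568/2012', 'Fido',   6000),
    (5, 'NO34569/2012', 'Fido',   5000),
    (6, 'NO34567/2012', 'Fido',   9000),
    (2, 'NO12345/2010', 'king',   8000),
    (1, 'NO23456/2009', 'Queen',  7000),
    (7, 'NO12346/2010', 'God',    8000),
    (8, 'NO23457/2009', 'Godess', 7000),
    (9, 'NO12346/2010', 'Devil',  8000);

-- Hypothetical extra row, NOT in the original sample data:
-- a differently named dog in the same litter as the two Fidos.
INSERT INTO dogs VALUES (10, 'NO34570/2012', 'Rex', 9000);
""")

# Filter on litters that contain more than one dog (the idea used in this answer).
by_litter = [r[0] for r in conn.execute("""
SELECT d.ID
FROM dogs d
JOIN litters l ON l.ID = d.LitterID
WHERE d.LitterID IN (
    SELECT LitterID FROM dogs GROUP BY LitterID HAVING COUNT(*) > 1
)
ORDER BY d.ID
""")]

# Variant: filter on (Name, FatherID) pairs that occur more than once.
by_name_and_father = [r[0] for r in conn.execute("""
SELECT d.ID
FROM dogs d
JOIN litters l ON l.ID = d.LitterID
JOIN (
    SELECT d2.Name, l2.FatherID
    FROM dogs d2
    JOIN litters l2 ON l2.ID = d2.LitterID
    GROUP BY d2.Name, l2.FatherID
    HAVING COUNT(*) > 1
) dup ON dup.Name = d.Name AND dup.FatherID = l.FatherID
ORDER BY d.ID
""")]

print(by_litter)           # [3, 6, 10]
print(by_name_and_father)  # [3, 6]
```

With the extra row, the litter-based filter also returns Rex, while the grouping on name and father still returns only the two Fidos. The litter-based filter would likewise miss two same-named dogs that share a father but come from different litters.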