SQL Duplicate Rows Multiple Joins

Time: 2015-02-24 21:56:43

Tags: sql duplicates left-join

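I'm pretty much a novice when it comes to SQL, so any help is appreciated. I have a large dataset that I'm filtering for a hospital. I'm pulling data from 6 different tables, and one of those tables has duplicate rows for each visit. I only want to pull one row per visit (it doesn't matter which row gets pulled). I know I need to use a DISTINCT or GROUP BY clause, but my syntax must be wrong.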

    SELECT
         ADV.[VisitID]          AS  VisitID
        ,ADV.[Name]             AS  Name
        ,ADV.[UnitNumber]       AS  UnitNumber
        ,CONVERT(DATE,ADV.[BirthDateTime])                          AS  BirthDate
        ,ADV.[ReasonForVisit]   AS  ReasonForVisit
        ,ADV.[AccountNumber]    AS  AccountNumber
        ,DATEDIFF(day, ADV.ServiceDateTime, DIS.DischargeDateTime)  AS LOS
        ,ADV.[HomePhone]        AS  PhoneNumber
        ,ADV.[ServiceDateTime]  AS  ServiceDateTime
        ,ADV.[Status]           AS  'Status'
        ,PRV.[PrimaryCareID]    AS  PCP
        ,LAB.[TestMnemonic]     AS  Test
        ,LAB.[ResultRW]         AS  Result
        ,LAB.[AbnormalFlag]     AS  AbnormalFlag
        ,LAB.[ResultDateTime]   AS  ResultDateTime
        ,DIS.[Diagnosis]        AS  DischargeDiagnosis
        ,DIS.[ErDiagnosis]      AS  ERDiagnosis
        ,DCP.[TextLine]         AS  ProblemList

    FROM          Visits                                      ADV
        LEFT JOIN Tests                                       LAB             ON ( LAB.VisitID  = ADV.VisitID AND
                                                                                   LAB.SourceID = ADV.SourceID )
        LEFT JOIN Discharge                                   DIS             ON ( DIS.VisitID  = LAB.VisitID AND
                                                                                   DIS.SourceID = LAB.SourceID )
        LEFT JOIN Providers                                   PRV             ON ( PRV.VisitID  = DIS.VisitID AND
                                                                                   PRV.SourceID = DIS.SourceID )
        LEFT JOIN ProblemListVisits                           EPS             ON ( EPS.VisitID  = PRV.VisitID AND
                                                                                   EPS.SourceID = PRV.SourceID )
        LEFT JOIN ProblemList                                 DCP             ON ( DCP.PatientID = EPS.PatientID AND
                                                                                   DCP.SourceID  = EPS.SourceID )

    WHERE ( DCP.[TextLine]       LIKE '%Diabetes%'          OR
            DCP.[TextLine]       LIKE '%Diabetic%'          OR
            DCP.[TextLine]       LIKE '%DM2%'               OR
            DCP.[TextLine]       LIKE '%DKA%'               OR
            DCP.[TextLine]       LIKE '%Hyperglycemia%'     OR
            DCP.[TextLine]       LIKE '%Hypoglycemia%'    ) AND
          ( LAB.[TestMnemonic] = 'GLU'                      OR
            LAB.[TestMnemonic] LIKE '%HA1C'               ) AND
            ADV.[Status]      != 'DIS CLI'

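So this works, but when a doctor goes into a patient's problem list and makes a change, the system re-files the entire list, repopulating the ProblemList table. So for 1 visit I may get 4 duplicate entries because of ProblemList, when I only want one. It doesn't matter which one.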

I tried following other questions and nesting another SELECT statement, but I just keep getting syntax errors.

Here is what the duplicate rows look like:

    1111111111  SMITH,JOHN  1111    1/1/1901    CHEST PAIN  1111    2   111-111-1111    1/1/1901 12:15  DIS IN  DOEJO   GLU 120  H  1/2/1901 6:35   NULL    CHEST PAIN  Diabetes type 2, controlled
    1111111111  SMITH,JOHN  1111    1/1/1901    CHEST PAIN  1111    2   111-111-1111    1/1/1901 12:15  DIS IN  DOEJO   GLU 120  H  1/2/1901 6:35   NULL    CHEST PAIN  Diabetes type 2, controlled
    1111111111  SMITH,JOHN  1111    1/1/1901    CHEST PAIN  1111    2   111-111-1111    1/1/1901 12:15  DIS IN  DOEJO   GLU 120  H  1/2/1901 6:35   NULL    CHEST PAIN  Diabetes type 2, controlled
    1111111111  SMITH,JOHN  1111    1/1/1901    CHEST PAIN  1111    2   111-111-1111    1/1/1901 12:15  DIS IN  DOEJO   GLU 120  H  1/2/1901 6:35   NULL    CHEST PAIN  Diabetes type 2, controlled

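Ultimately, the 'Diabetes type 2, controlled' problem-list entry is what causes the duplicates. If I remove the ProblemListVisits and ProblemList tables from the query, I get only one row of data.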
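One way to confirm that re-filed problem-list lines are the source of the extra rows is to count identical lines per patient. The sketch below is only an illustration: ProblemListSample is a hypothetical scratch table standing in for ProblemList (the PatientID, SourceID and TextLine columns come from the query above), and its rows are made up. It should run on SQL Server 2016+ (for DROP TABLE IF EXISTS) and on SQLite.

```sql
-- Hypothetical scratch table standing in for ProblemList.
DROP TABLE IF EXISTS ProblemListSample;
CREATE TABLE ProblemListSample (
    PatientID INT,
    SourceID  INT,
    TextLine  VARCHAR(200)
);

-- Simulate a problem list that was re-filed several times:
-- the same line appears once per re-file.
INSERT INTO ProblemListSample (PatientID, SourceID, TextLine) VALUES
    (1, 1, 'Diabetes type 2, controlled'),
    (1, 1, 'Diabetes type 2, controlled'),
    (1, 1, 'Diabetes type 2, controlled'),
    (1, 1, 'Diabetes type 2, controlled'),
    (2, 1, 'Hypoglycemia');

-- List every (patient, line) pair that occurs more than once;
-- each extra copy multiplies the joined result by one more row.
SELECT PatientID, SourceID, TextLine, COUNT(*) AS Copies
FROM ProblemListSample
GROUP BY PatientID, SourceID, TextLine
HAVING COUNT(*) > 1;
```

On this sample data only patient 1's line is reported, with Copies = 4, matching the four identical rows shown above.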

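The most important thing is to get all of the unique test results, but not all of the duplicate entries from the problem list (I just want to know what type of diabetes they have, ONCE).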

Thanks!

3 Answers:

Answer 0 (score: 1)

A DISTINCT clause should do the trick, but if it doesn't, you can change

    LEFT JOIN ProblemList   DCP             ON ( DCP.PatientID = EPS.PatientID AND
                                                 DCP.SourceID  = EPS.SourceID )

to

    OUTER APPLY ( SELECT TOP 1 PL.[TextLine]
                  FROM   ProblemList PL
                  WHERE  PL.PatientID = EPS.PatientID
                    AND  PL.SourceID  = EPS.SourceID ) DCP
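One thing to watch with TOP 1 here: it picks a row before the outer WHERE applies the LIKE filters on TextLine, so if a patient also has a non-diabetes line, TOP 1 can land on that line and the visit is then filtered out entirely. Moving the LIKE conditions inside the APPLY avoids this. The sketch below shows the same "filter first, then keep one row per patient" idea using ROW_NUMBER() instead of OUTER APPLY (a swapped-in technique, not the answer's code), on a hypothetical scratch table ProblemListSample with made-up rows. It should run on SQL Server 2016+ and SQLite 3.25+.

```sql
-- Hypothetical scratch table standing in for ProblemList.
DROP TABLE IF EXISTS ProblemListSample;
CREATE TABLE ProblemListSample (
    PatientID INT,
    SourceID  INT,
    TextLine  VARCHAR(200)
);

-- Patient 1 has a non-diabetes line that sorts first, plus duplicated
-- diabetes lines; patient 2 has no diabetes-related line at all.
INSERT INTO ProblemListSample (PatientID, SourceID, TextLine) VALUES
    (1, 1, 'Asthma'),
    (1, 1, 'Diabetes type 2, controlled'),
    (1, 1, 'Diabetes type 2, controlled'),
    (2, 1, 'Asthma');

-- Filter FIRST, then number the surviving lines per patient.
-- Picking "any one row" before filtering could land on 'Asthma'.
WITH DiabetesLines AS (
    SELECT PatientID, SourceID, TextLine,
           ROW_NUMBER() OVER (PARTITION BY PatientID, SourceID
                              ORDER BY TextLine) AS rn
    FROM ProblemListSample
    WHERE TextLine LIKE '%Diabetes%'
       OR TextLine LIKE '%Diabetic%'
       OR TextLine LIKE '%DM2%'
       OR TextLine LIKE '%DKA%'
       OR TextLine LIKE '%Hyperglycemia%'
       OR TextLine LIKE '%Hypoglycemia%'
)
-- Keep exactly one matching line per patient.
SELECT PatientID, SourceID, TextLine
FROM DiabetesLines
WHERE rn = 1;
```

On this data the result is a single row for patient 1 with 'Diabetes type 2, controlled'. In the real query the CTE would be defined over ProblemList and joined in place of DCP on PatientID and SourceID, with rn = 1 added to the ON clause.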

Answer 1 (score: 1)

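As an alternative to DISTINCT, which I think is the quickest way to accomplish this, you could also move each table that produces multiple rows into a subquery that does a GROUP BY on the values you need for the JOINs and the SELECT list.

There are two benefits here:

  1. You get more control over the output of these more granular tables, and

  2. by limiting what they let through with a WHERE clause inside the subquery, you reduce the overhead on the JOINs, and with it I/O and CPU usage.

Code: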

    SELECT
             ADV.[VisitID]          AS  VisitID
            ,ADV.[Name]             AS  Name
            ,ADV.[UnitNumber]       AS  UnitNumber
            ,CONVERT(DATE,ADV.[BirthDateTime])                          AS  BirthDate
            ,ADV.[ReasonForVisit]   AS  ReasonForVisit
            ,ADV.[AccountNumber]    AS  AccountNumber
            ,DATEDIFF(day, ADV.ServiceDateTime, DIS.DischargeDateTime)  AS LOS
            ,ADV.[HomePhone]        AS  PhoneNumber
            ,ADV.[ServiceDateTime]  AS  ServiceDateTime
            ,ADV.[Status]           AS  'Status'
            ,PRV.[PrimaryCareID]    AS  PCP
            ,LAB.[TestMnemonic]     AS  Test
            ,LAB.[ResultRW]         AS  Result
            ,LAB.[AbnormalFlag]     AS  AbnormalFlag
            ,LAB.[ResultDateTime]   AS  ResultDateTime
            ,DIS.[Diagnosis]        AS  DischargeDiagnosis
            ,DIS.[ErDiagnosis]      AS  ERDiagnosis
            ,DCP.[TextLine]         AS  ProblemList
    
    
    FROM          Visits                                      ADV
        LEFT JOIN Tests                                       LAB             ON ( LAB.VisitID  = ADV.VisitID AND
                                                                                   LAB.SourceID = ADV.SourceID )
        LEFT JOIN Discharge                                   DIS             ON ( DIS.VisitID  = LAB.VisitID AND
                                                                                   DIS.SourceID = LAB.SourceID )
        LEFT JOIN Providers                                   PRV             ON ( PRV.VisitID  = DIS.VisitID AND
                                                                                   PRV.SourceID = DIS.SourceID )
        LEFT JOIN 
            (
                SELECT 
                    VisitID, 
                    SourceID, 
                    PatientID
                FROM ProblemListVisits 
                GROUP BY 
                    VisitID, 
                    SourceID, 
                    PatientID
            )                                                 EPS             ON ( EPS.VisitID  = PRV.VisitID AND
                                                                                   EPS.SourceID = PRV.SourceID )                                                                                                 
        LEFT JOIN 
            (
                SELECT 
                    PatientID, 
                    SourceID, 
                    TextLine 
                FROM ProblemList 
                WHERE 
                    [TextLine]       LIKE '%Diabetes%'          OR 
                    [TextLine]       LIKE '%Diabetic%'          OR
                    [TextLine]       LIKE '%DM2%'               OR
                    [TextLine]       LIKE '%DKA%'               OR
                    [TextLine]       LIKE '%Hyperglycemia%'     OR
                    [TextLine]       LIKE '%Hypoglycemia%' 
                GROUP BY 
                    PatientID, 
                    SourceID, 
                    TextLine 
            )                                                  DCP             ON ( DCP.PatientID = EPS.PatientID AND
                                                                                   DCP.SourceID  = EPS.SourceID )
    
    
    WHERE ( LAB.[TestMnemonic] = 'GLU'                      OR
            LAB.[TestMnemonic] LIKE '%HA1C'               ) AND
            ADV.[Status]      != 'DIS CLI'                  AND
            DCP.[TextLine]    IS NOT NULL  -- keep only visits with a matching problem-list line
    

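If you still get multiples, that indicates [TextLine] has more than one value per PatientID/SourceID combination in your ProblemList table. At that point, you can remove it from the GROUP BY clause and apply some kind of aggregate to that field in the subquery, such as MAX([TextLine]). But I suspect that after using DISTINCT or this subquery approach, you won't have duplicates.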
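Here is a minimal sketch of that MAX([TextLine]) variant, run against a hypothetical scratch table ProblemListSample with made-up rows (it should work on SQL Server 2016+ and SQLite). Note that MAX simply keeps the alphabetically last matching line, which seems acceptable here only because the question says any one row will do.

```sql
-- Hypothetical scratch table standing in for ProblemList.
DROP TABLE IF EXISTS ProblemListSample;
CREATE TABLE ProblemListSample (
    PatientID INT,
    SourceID  INT,
    TextLine  VARCHAR(200)
);

-- Patient 1 has two DIFFERENT matching lines, so grouping on
-- TextLine would still return two rows for that patient.
INSERT INTO ProblemListSample (PatientID, SourceID, TextLine) VALUES
    (1, 1, 'Diabetes type 2, controlled'),
    (1, 1, 'Diabetes type 2, controlled'),
    (1, 1, 'Hyperglycemia'),
    (2, 1, 'DKA');

-- TextLine is dropped from GROUP BY and aggregated instead,
-- giving exactly one row per PatientID/SourceID.
SELECT PatientID,
       SourceID,
       MAX(TextLine) AS TextLine
FROM ProblemListSample
WHERE TextLine LIKE '%Diabetes%'
   OR TextLine LIKE '%Diabetic%'
   OR TextLine LIKE '%DM2%'
   OR TextLine LIKE '%DKA%'
   OR TextLine LIKE '%Hyperglycemia%'
   OR TextLine LIKE '%Hypoglycemia%'
GROUP BY PatientID, SourceID;
```

This returns two rows: patient 1 with 'Hyperglycemia' and patient 2 with 'DKA'.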

Answer 2 (score: 0)

Try adding DISTINCT after SELECT, like this:

    SELECT DISTINCT
         ADV.[VisitID]          AS  VisitID
        ,ADV.[Name]             AS  Name
        ...
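A caveat that is easy to see on sample data: DISTINCT only collapses rows that are identical in every selected column. It handles exact copies like the four rows in the question, but if a re-filed problem-list line differs even slightly, the visit still appears more than once. The table JoinedSample and its rows below are hypothetical, loosely modeled on the question's output, and the snippet should run on SQL Server 2016+ and SQLite.

```sql
-- Hypothetical table imitating a few columns of the joined result.
DROP TABLE IF EXISTS JoinedSample;
CREATE TABLE JoinedSample (
    VisitID     VARCHAR(20),
    Test        VARCHAR(10),
    Result      INT,
    ProblemList VARCHAR(200)
);

-- Three identical copies from re-filing, plus one row whose
-- ProblemList text differs.
INSERT INTO JoinedSample (VisitID, Test, Result, ProblemList) VALUES
    ('1111111111', 'GLU', 120, 'Diabetes type 2, controlled'),
    ('1111111111', 'GLU', 120, 'Diabetes type 2, controlled'),
    ('1111111111', 'GLU', 120, 'Diabetes type 2, controlled'),
    ('1111111111', 'GLU', 120, 'Diabetes type 2');

-- DISTINCT removes only rows that match in EVERY selected column,
-- so this returns two rows for the same visit, not one.
SELECT DISTINCT VisitID, Test, Result, ProblemList
FROM JoinedSample;
```

When the lines can differ like this, an aggregate such as MAX([TextLine]) or a TOP 1 per patient, as in the other answers, is what reduces the visit to a single row.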