Incorrect counts on left outer joined tables

Date: 2018-04-16 12:50:48

Tags: sql sql-server-2014

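I have the following query, in which I get a count of USRID from each left outer joined table. The count for the PS_HS_AUD table is off by 1 record, and the count for the PS_HS_PRE table is off by 1 (so the total is off by 2).

I believe the counts are off because the USRID exists in both the PS_HS_AUD table and another table named PS_HS_ANN, AND the USRID has 2 rows in PS_HS_ANN (each row with a unique exam date). In the query below I added criteria to pick the MAX EXAM_DT, hoping that would give the correct totals, but I get the same (incorrect) results as before I added the MAX exam date criteria to the WHERE clause.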

Current SQL:

SELECT 'ZTOTAL', '',  COUNT(G.USRID), COUNT(H.USRID), COUNT( J.USRID), 
 COUNT(M.USRID), COUNT(P.USRID), COUNT(S.USRID), COUNT(V.USRID), 
 COUNT(Y.USRID) 
FROM PS_JOB F
LEFT OUTER JOIN  PS_HS_ANN G ON  F.USRID = G.USRID AND G.EMPL_RCD = 
 F.EMPL_RCD 
LEFT OUTER JOIN  PS_HS_ANT H ON  F.USRID = H.USRID AND H.EMPL_RCD = 
 F.EMPL_RCD
LEFT OUTER JOIN  PS_HS_AUD J ON  F.USRID = J.USRID AND J.EMPL_RCD = 
 F.EMPL_RCD  
LEFT OUTER JOIN  PS_HS_DOT M ON  F.USRID = M.USRID AND M.EMPL_RCD = 
 F.EMPL_RCD  
LEFT OUTER JOIN  PS_HS_HAZ P ON  F.USRID = P.USRID AND P.EMPL_RCD = 
 F.EMPL_RCD  
LEFT OUTER JOIN  PS_HS_PRE S ON  F.USRID = S.USRID AND S.EMPL_RCD = 
 F.EMPL_RCD  
LEFT OUTER JOIN  PS_HS_RES V ON  F.USRID = V.USRID AND V.EMPL_RCD = 
 F.EMPL_RCD  
LEFT OUTER JOIN  PS_HS_ASB Y ON  F.USRID = Y.USRID AND Y.EMPL_RCD = 
 F.EMPL_RCD

WHERE ( ( F.EFFDT = 
    (SELECT MAX(F_ED.EFFDT) FROM PS_JOB F_ED 
    WHERE F.USRID = F_ED.USRID 
      AND F.EMPL_RCD = F_ED.EMPL_RCD 
      AND F_ED.EFFDT <= SUBSTRING(CONVERT(CHAR,GETDATE(),121), 1, 10)) 
 AND F.EFFSEQ = 
    (SELECT MAX(F_ES.EFFSEQ) FROM PS_JOB F_ES 
    WHERE F.USRID = F_ES.USRID 
      AND F.EMPL_RCD = F_ES.EMPL_RCD 
      AND F.EFFDT = F_ES.EFFDT) )
 AND (G.EXAM_DT = (SELECT MAX(GG.EXAM_DT) FROM PS_HS_ANN GG
                  WHERE GG.USRID = G.USRID
                   AND GG.EMPL_RCD = G.EMPL_RCD
                   AND GG.EXAM_DT = G.EXAM_DT)  
     OR H.EXAM_DT = (SELECT MAX(HH.EXAM_DT) FROM PS_HS_ANT HH
                     WHERE HH.USRID = H.USRID
                      AND HH.EMPL_RCD = H.EMPL_RCD
                      AND HH.EXAM_DT = H.EXAM_DT) 
     OR J.EXAM_DT = (SELECT MAX(JJ.EXAM_DT) FROM PS_HS_AUD JJ
                     WHERE JJ.USRID = J.USRID
                      AND JJ.EMPL_RCD = J.EMPL_RCD
                      AND JJ.EXAM_DT = J.EXAM_DT)  
     OR M.EXAM_DT = (SELECT MAX(MM.EXAM_DT) FROM PS_GHS_HS_DOT MM
                     WHERE MM.USRID = M.USRID
                      AND MM.EMPL_RCD = M.EMPL_RCD
                      AND MM.EXAM_DT = M.EXAM_DT)
     OR P.EXAM_DT = (SELECT MAX(PP.EXAM_DT) FROM PS_GHS_HS_HAZMAT PP
                     WHERE PP.USRID = P.USRID
                      AND PP.EMPL_RCD = P.EMPL_RCD
                      AND PP.EXAM_DT = P.EXAM_DT)
     OR S.EXAM_DT = (SELECT MAX(SS.EXAM_DT) FROM PS_HS_PRE SS
                     WHERE SS.USRID = S.USRID
                      AND SS.EMPL_RCD = S.EMPL_RCD
                      AND SS.EXAM_DT = S.EXAM_DT)
     OR V.EXAM_DT = (SELECT MAX(VV.EXAM_DT) FROM PS_GH_RESP_FIT VV
                     WHERE VV.USRID = V.USRID
                      AND VV.EMPL_RCD = V.EMPL_RCD
                      AND VV.EXAM_DT = V.EXAM_DT)
     OR Y.EXAM_DT = (SELECT MAX(YY.EXAM_DT) FROM PS_HS_ASB YY
                     WHERE YY.USRID = Y.USRID
                      AND YY.EMPL_RCD = Y.EMPL_RCD
                      AND YY.EXAM_DT = Y.EXAM_DT) ))

Query results: (image not included)

Column 5 above (J.USRID) shows a count of 5 records, even though a query against the PS_HS_AUD J table shows that it holds only 4 records (see the table below):

PS_HS_AUD: (image not included)

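If I query the PS_HS_ANN table, you can see that USRID SD3925 (which also has a record in PS_HS_AUD) has 2 rows in that table. I believe this is what causes the extra row to be counted for PS_HS_AUD, because if I comment out the join to PS_HS_ANN, the count correctly shows 4 records.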

PS_HS_ANN: (image not included)

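The same problem occurs with the PS_HS_PRE table (a duplicate caused by the same USRID). What else can I use to prevent this from happening? There will definitely be cases where a USRID exists in multiple rows in each table. Thanks!

Update 4/16/18: Does anyone have any other ideas on how I can get this to work?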

1 Answer:

Answer 0: (score: 0)

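The right way to solve this problem is to aggregate before the join, so that each joined table contributes at most one row per key. Given the complexity of the query, that may be difficult to implement.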
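As a rough illustration of aggregating before the join, here is a minimal sketch on a made-up miniature of the schema. The table variables @JOB, @ANN and @AUD and all of their rows are hypothetical stand-ins for PS_JOB, PS_HS_ANN and PS_HS_AUD, and the sketch assumes only the latest exam per USRID and EMPL_RCD should be counted.

```sql
-- Hypothetical miniature of the tables in the question (names and data are made up).
DECLARE @JOB TABLE (USRID varchar(10), EMPL_RCD int);
DECLARE @ANN TABLE (USRID varchar(10), EMPL_RCD int, EXAM_DT date);
DECLARE @AUD TABLE (USRID varchar(10), EMPL_RCD int, EXAM_DT date);

INSERT INTO @JOB VALUES ('SD3925', 0), ('AB0001', 0);
INSERT INTO @ANN VALUES ('SD3925', 0, '2017-01-10'), ('SD3925', 0, '2018-01-12');
INSERT INTO @AUD VALUES ('SD3925', 0, '2018-02-01'), ('AB0001', 0, '2018-03-01');

-- Naive join: the two ANN rows for SD3925 each pair with its single AUD row,
-- so that AUD row is counted twice (AUD_CNT = 3 although @AUD holds 2 rows).
SELECT COUNT(G.USRID) AS ANN_CNT, COUNT(J.USRID) AS AUD_CNT
FROM @JOB F
LEFT OUTER JOIN @ANN G ON F.USRID = G.USRID AND F.EMPL_RCD = G.EMPL_RCD
LEFT OUTER JOIN @AUD J ON F.USRID = J.USRID AND F.EMPL_RCD = J.EMPL_RCD;

-- Pre-aggregated join: each derived table has at most one row per
-- (USRID, EMPL_RCD), so no join can multiply rows (ANN_CNT = 1, AUD_CNT = 2).
SELECT COUNT(G.USRID) AS ANN_CNT, COUNT(J.USRID) AS AUD_CNT
FROM @JOB F
LEFT OUTER JOIN (SELECT USRID, EMPL_RCD, MAX(EXAM_DT) AS EXAM_DT
                 FROM @ANN
                 GROUP BY USRID, EMPL_RCD) G
       ON F.USRID = G.USRID AND F.EMPL_RCD = G.EMPL_RCD
LEFT OUTER JOIN (SELECT USRID, EMPL_RCD, MAX(EXAM_DT) AS EXAM_DT
                 FROM @AUD
                 GROUP BY USRID, EMPL_RCD) J
       ON F.USRID = J.USRID AND F.EMPL_RCD = J.EMPL_RCD;
```

Applied to the real query, each of the eight joined tables would get its own derived table of this shape. Note that this counts one row per USRID and EMPL_RCD pair in each table, which may or may not be the number you want if you intended to count every exam row.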

The quick-and-dirty way is to use count(distinct):

SELECT 'ZTOTAL', '', 
        COUNT(DISTINCT G.USRID), COUNT(DISTINCT H.USRID), COUNT(DISTINCT J.USRID), 
        COUNT(DISTINCT M.USRID), COUNT(DISTINCT P.USRID), COUNT(DISTINCT S.USRID), COUNT(DISTINCT V.USRID), 
        COUNT(DISTINCT Y.USRID) 
. . . 

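This is less than ideal, because each COUNT(DISTINCT) incurs overhead that pre-aggregating each table would eliminate. In addition, the intermediate result set can become very large, which also hurts performance.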
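A side note on the MAX(EXAM_DT) criteria in the question, offered as a likely explanation and not a verified diagnosis: each subquery is correlated on EXAM_DT itself (for example GG.EXAM_DT = G.EXAM_DT), so it compares every row only with its own date and every row passes. A minimal sketch of the alternative is below, with the self-comparison dropped and the latest-row condition moved into the ON clause. The table variables and data are hypothetical stand-ins, not the real PeopleSoft tables.

```sql
-- Hypothetical stand-ins for PS_JOB, PS_HS_ANN and PS_HS_AUD (made-up data).
DECLARE @JOB TABLE (USRID varchar(10), EMPL_RCD int);
DECLARE @ANN TABLE (USRID varchar(10), EMPL_RCD int, EXAM_DT date);
DECLARE @AUD TABLE (USRID varchar(10), EMPL_RCD int, EXAM_DT date);

INSERT INTO @JOB VALUES ('SD3925', 0), ('AB0001', 0);
INSERT INTO @ANN VALUES ('SD3925', 0, '2017-01-10'), ('SD3925', 0, '2018-01-12');
INSERT INTO @AUD VALUES ('SD3925', 0, '2018-02-01'), ('AB0001', 0, '2018-03-01');

-- Only the row with the latest EXAM_DT per (USRID, EMPL_RCD) satisfies the ON
-- clause, so SD3925 matches one ANN row instead of two (ANN_CNT = 1, AUD_CNT = 2).
SELECT COUNT(G.USRID) AS ANN_CNT, COUNT(J.USRID) AS AUD_CNT
FROM @JOB F
LEFT OUTER JOIN @ANN G
       ON  G.USRID = F.USRID
       AND G.EMPL_RCD = F.EMPL_RCD
       AND G.EXAM_DT = (SELECT MAX(GG.EXAM_DT) FROM @ANN GG
                        WHERE GG.USRID = G.USRID
                          AND GG.EMPL_RCD = G.EMPL_RCD)
LEFT OUTER JOIN @AUD J
       ON  J.USRID = F.USRID
       AND J.EMPL_RCD = F.EMPL_RCD
       AND J.EXAM_DT = (SELECT MAX(JJ.EXAM_DT) FROM @AUD JJ
                        WHERE JJ.USRID = J.USRID
                          AND JJ.EMPL_RCD = J.EMPL_RCD);
```

Keeping the condition in the ON clause, not the WHERE clause, preserves the outer join for users with no exam rows. If two rows share the same latest EXAM_DT for one user, both still match, so this only helps when exam dates are unique per USRID and EMPL_RCD, as the question states they are.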