为什么我对连接表的统计不正确?

时间:2018-04-17 12:40:48

标签: sql tsql count left-join sql-server-2014

我有以下表格:

STUDYA (别名J)

LOGINID   RCD_NUM         DATE         TYPE
 8745       0         04/15/2018      PRELIM
 6548       0         08/19/2017      PRELIM
 7445       0         10/02/2017      PRELIM
 2867       0         03/19/2018      PRELIM

试用(别名G)

LOGINID   RCD_NUM         DATE         TYPE
 8745       0          02/15/2017     ROLLUP
 7445       0          07/09/2016     ROLLUP
 2867       0          05/17/2017     ROLLUP
 2867       0          05/28/2017     ROLLUP
 5249       0          06/20/2017     ROLLUP
 1335       0          09/29/2017     ROLLUP
 9238       0          12/03/2017     ROLLUP

SPRINT (Alias H)

 LOGINID   RCD_NUM         DATE        TYPE
  5521       0          01/10/2018     SPRNT
  8745       1          04/04/2018     SPRNT
  3487       0          11/14/2017     SPRNT
  6627       0          05/05/2018     SPRNT

另一个名为 LOGIN 的表,我正在与LOGINID匹配。

我在SQL Server 2014上运行以下查询:

SELECT 'COUNTS', COUNT(G.LOGINID), COUNT(H.LOGINID), COUNT(J.LOGINID)
 FROM LOGIN F 
LEFT OUTER JOIN  TRIALS G ON  F.LOGINID= G.LOGINID AND G.RCD_NUM = 
 F.RCD_NUM 
LEFT OUTER JOIN  SPRINT H ON  F.LOGINID= H.LOGINID AND H.RCD_NUM = 
 F.RCD_NUM
LEFT OUTER JOIN  STUDYA J ON  F.LOGINID= J.LOGINID AND J.RCD_NUM = 
 F.RCD_NUM  

WHERE ( ( F.EFFDT = 
(SELECT MAX(F_ED.EFFDT) FROM PS_JOB F_ED 
WHERE F.LOGINID = F_ED.LOGINID 
  AND F.RCD_NUM = F_ED.RCD_NUM 
  AND F_ED.EFFDT <= SUBSTRING(CONVERT(CHAR,GETDATE(),121), 1, 10)) 
AND F.EFFSEQ = 
(SELECT MAX(F_ES.EFFSEQ) FROM PS_JOB F_ES 
 WHERE F.LOGINID = F_ES.LOGINID 
  AND F.RCD_NUM = F_ES.RCD_NUM 
  AND F.EFFDT = F_ES.EFFDT) )

以下是上述查询的结果:

(No column name)   (No column name)   (No column name)   (No column name) 
  COUNTS                  9                  5                  5

你可以看到最后一栏(COUNT(J.LOGINID))正在显示 5 记录,当在实际表格中(上面的查询)时实际上只有 4 记录。我认为这是因为LOGINID 2867在TRIALS表中有2行,在STUDYA中有1行。

我已经研究了这个,推荐似乎是在加入之前进行聚合。尽管如此,我还是努力做到最好。任何帮助表示赞赏!

2 个答案:

答案 0 :(得分:2)

根据您的数据和用例,您可以从您引用的方法中获得性能优势:聚合然后加入......

SELECT
 'COUNTS',
 COUNT(G.LOGINID) distinct_g, SUM(G.ROW_COUNT) row_count_g,
 COUNT(H.LOGINID) distinct_h, SUM(H.ROW_COUNT) row_count_h,
 COUNT(J.LOGINID) distinct_j, SUM(J.ROW_COUNT) row_count_j
FROM
 LOGIN F 
LEFT OUTER JOIN
(
 SELECT LOGINID, RCD_NUM, COUNT(*) AS ROW_COUNT FROM TRIALS GROUP BY LOGINID, RCD_NUM
)
 G
  ON  F.LOGINID = G.LOGINID
  AND F.RCD_NUM = G.RCD_NUM 
LEFT OUTER JOIN
(
 SELECT LOGINID, RCD_NUM, COUNT(*) AS ROW_COUNT FROM SPRINT GROUP BY LOGINID, RCD_NUM
)
 H
  ON  F.LOGINID = H.LOGINID
  AND F.RCD_NUM = H.RCD_NUM 
LEFT OUTER JOIN
(
 SELECT LOGINID, RCD_NUM, COUNT(*) AS ROW_COUNT FROM STUDYA GROUP BY LOGINID, RCD_NUM
)
 J
  ON  F.LOGINID = J.LOGINID
  AND F.RCD_NUM = J.RCD_NUM 

WHERE ( ( F.EFFDT = 
(SELECT MAX(F_ED.EFFDT) FROM PS_JOB F_ED 
WHERE F.LOGINID = F_ED.LOGINID 
  AND F.RCD_NUM = F_ED.RCD_NUM 
  AND F_ED.EFFDT <= SUBSTRING(CONVERT(CHAR,GETDATE(),121), 1, 10)) 
AND F.EFFSEQ = 
(SELECT MAX(F_ES.EFFSEQ) FROM PS_JOB F_ES 
 WHERE F.LOGINID = F_ES.LOGINID 
  AND F.RCD_NUM = F_ES.RCD_NUM 
  AND F.EFFDT = F_ES.EFFDT) )

优化器可以使用谓词 - 下推/宏类扩展来确保仅对相关行进行聚合并保留索引的使用。

此模式还允许您获取其他统计信息,例如每个登录/ rcd组合的最小/最大日期。

<强> 编辑:

另一种模式可能包括......

SELECT
 'COUNTS',
 SUM(G.ROW_COUNT) row_count_g,
 SUM(H.ROW_COUNT) row_count_h,
 SUM(J.ROW_COUNT) row_count_j
FROM
 LOGIN F 
OUTER APPLY
(
 SELECT COUNT(*) AS ROW_COUNT FROM TRIALS WHERE LOGINID = F.LOGINID AND RCD_NUM = F.RCD_NUM
)
  G
OUTER APPLY
(
 SELECT COUNT(*) AS ROW_COUNT FROM SPRINT WHERE LOGINID = F.LOGINID AND RCD_NUM = F.RCD_NUM
)
  H
OUTER APPLY
(
 SELECT COUNT(*) AS ROW_COUNT FROM STUDYA WHERE LOGINID = F.LOGINID AND RCD_NUM = F.RCD_NUM
)
  J

WHERE ( ( F.EFFDT = 
(SELECT MAX(F_ED.EFFDT) FROM PS_JOB F_ED 
WHERE F.LOGINID = F_ED.LOGINID 
  AND F.RCD_NUM = F_ED.RCD_NUM 
  AND F_ED.EFFDT <= SUBSTRING(CONVERT(CHAR,GETDATE(),121), 1, 10)) 
AND F.EFFSEQ = 
(SELECT MAX(F_ES.EFFSEQ) FROM PS_JOB F_ES 
 WHERE F.LOGINID = F_ES.LOGINID 
  AND F.RCD_NUM = F_ES.RCD_NUM 
  AND F.EFFDT = F_ES.EFFDT) )

答案 1 :(得分:1)

您需要获得不同的计数

 SELECT 'COUNTS', COUNT(distinct G.LOGINID), COUNT(distinct 
 H.LOGINID), COUNT(distinct J.LOGINID) ..