Counting DISTINCT in a FULL OUTER JOIN

Date: 2019-11-06 22:37:49

Tags: sql postgresql amazon-redshift

I'm sure there is a simple solution that my pea brain can't grasp right now.

I'm using the following query with a FULL OUTER JOIN, and I want to count the DISTINCT memberid values:

SELECT a.year,
       COUNT(DISTINCT a.memberid) AS members
FROM (SELECT DISTINCT YEAR,
             memberid
      FROM (SELECT EXTRACT(YEAR FROM created_at) AS YEAR,
                   EXTRACT(MONTH FROM created_at) AS MONTH,
                   member_id AS memberid,
                   COUNT(DISTINCT field1) AS field1
            FROM table1            
            GROUP BY YEAR,
                     MONTH,
                     member_id
            ORDER BY YEAR,
                     MONTH,
                     field1 DESC)) a
  FULL OUTER JOIN (SELECT DISTINCT YEAR,
                          memberid
                   FROM (SELECT EXTRACT(YEAR FROM created) AS YEAR,
                                EXTRACT(MONTH FROM created) AS MONTH,
                                memberid,
                                COUNT(field2) AS field2
                         FROM table2                        
                         GROUP BY YEAR,
                                  MONTH,
                                  memberid
                         ORDER BY YEAR,
                                  MONTH,
                                   field2 DESC)) b
               ON a.year = b.year
              AND a.memberid = b.memberid
GROUP BY a.year
ORDER BY a.year

The query runs without errors, but I'm fairly sure the results are not what I expect.

I get the following results:

2014    26834
2015    58573
2016    178378
2017    233291
2018    297404
2019    281088

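Let's call the queries on either side of the FULL OUTER JOIN the Left query and the Right query. When I run the Right query on its own, aggregated by year, and count the distinct memberid values, I get the following results: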

2013    3915
2014    59025
2015    115514
2016    176528
2017    216675
2018    301007
2019    311141

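As you can see, the distinct count from the Right query alone is higher than the result of the full FULL OUTER JOIN query for several years (2014, 2015, 2018 and 2019), and 2013 is missing from the joined result entirely. That clearly makes no sense, since the joined count should never be lower than either side on its own.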

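In the final result, I want a COUNT DISTINCT of all memberid values (i.e. the memberids that appear in the Left query and the memberids that appear in the Right query), without counting any memberid twice, aggregated by year.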

I know the solution must be simple. Any help would be greatly appreciated.

1 answer:

Answer 0 (score: 3)

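You are only counting a.memberid (and grouping by a.year), which means anything that exists only on the right side is ignored: in a FULL OUTER JOIN, rows with no match on the left have NULL in every a.* column, so COUNT(DISTINCT a.memberid) skips them and GROUP BY a.year lumps them into a single NULL year.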
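As an illustration (not part of the original answer), the join itself can be kept if year and memberid are taken from whichever side is non-NULL with COALESCE. This is a sketch that assumes, as the outer query already does, that only the distinct (year, memberid) pairs matter, so the per-month subqueries are collapsed into plain SELECT DISTINCT; the table and column names are the ones from the question.

```sql
-- Sketch: keep the FULL OUTER JOIN, but read year and memberid from
-- whichever side is present, so right-only members are counted too.
SELECT COALESCE(a.year, b.year) AS year,
       COUNT(DISTINCT COALESCE(a.memberid, b.memberid)) AS members
FROM (SELECT DISTINCT EXTRACT(YEAR FROM created_at) AS year,
             member_id AS memberid
      FROM table1) a
  FULL OUTER JOIN (SELECT DISTINCT EXTRACT(YEAR FROM created) AS year,
                          memberid
                   FROM table2) b
    ON a.year = b.year
   AND a.memberid = b.memberid
-- Group on the merged year, not a.year, so right-only rows get a real year
GROUP BY COALESCE(a.year, b.year)
ORDER BY 1;
```

A member present on both sides in the same year produces one joined row with the same memberid on each side, so it is counted once; a right-only member now takes its year from b instead of landing in the NULL group.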

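To get what you want, take a UNION of the left and right sides instead of joining them, and then just COUNT(DISTINCT memberid) grouped by year.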
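A minimal sketch of that suggestion, using the table and column names from the question and assuming table1.member_id and table2.memberid have compatible types (UNION requires matching column types):

```sql
-- Sketch: stack the (year, memberid) pairs from both tables, then count
-- distinct members per year. A member seen in both tables in the same
-- year appears only once after the UNION.
SELECT year,
       COUNT(DISTINCT memberid) AS members
FROM (SELECT EXTRACT(YEAR FROM created_at) AS year,
             member_id AS memberid
      FROM table1
      UNION
      SELECT EXTRACT(YEAR FROM created) AS year,
             memberid
      FROM table2) AS u
GROUP BY year
ORDER BY year;
```

Because COUNT(DISTINCT memberid) already removes duplicates within each year, UNION ALL would return the same counts; plain UNION just makes the deduplication explicit. Either way, each year's count should now be at least as large as the count from the Right query alone.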