Question

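I need to write a query that generates a frequency table. I am currently working with an Amazon Redshift database. I have produced a table like the following: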

user_id   user_label     code1  code2  code_3  date    
------   -----------    -----  -----  ------  -------
1        x              a      b      c       01-01
1        x              a      d      c       01-01
1        x              a      b      c       01-02
1        y              a      c      d       01-01
2        x              a      b      d       01-01

and so on.

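The counting rule is: if two rows have the same ID and date, a code that is repeated across them should be counted only once.
For example, for the first two rows, the frequency table should be: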

user_id      user_label   a   b   c   d 
--------     -----------  --  --  --  -- 
1            x            1   1   1   1

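Even though a and c each appear twice, both occurrences fall on the same date, so each should be counted only once. I need to do this for every unique combination of user_id + user_label.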

Then, after processing the third row, the frequency table should be:

user_id      user_label   a   b   c   d 
--------     -----------  --  --  --  -- 
1            x            2   2   2   1

Because the third row has a different date, the counts for a, b, and c should each increase by 1.

Finally, for the sample table given above, the expected result is:

user_id      user_label   a   b   c   d 
--------     -----------  --  --  --  -- 
1            x            2   2   2   1
1            y            1   0   1   1
2            x            1   1   0   1

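I know I should include what I have tried so far, but I honestly do not know where to start. This is not a homework question; any hints or suggestions would be much appreciated.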

Answer 1

You seem to want a conditional count(distinct):

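-- For each code, count the distinct dates on which it appears in any code column.
-- The case expression yields NULL when the code is absent, and count() ignores NULLs.
-- The third column is written code_3 to match the table header in the question.
select user_id, user_label,
       count(distinct case when 'a' in (code1, code2, code_3) then date end) as a,
       count(distinct case when 'b' in (code1, code2, code_3) then date end) as b,
       count(distinct case when 'c' in (code1, code2, code_3) then date end) as c,
       count(distinct case when 'd' in (code1, code2, code_3) then date end) as d
from t
group by user_id, user_label;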
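
To sanity-check the conditional count(distinct) idea against the five sample rows from the question, here is a minimal sketch using Python's built-in sqlite3 module. It rests on a few assumptions: SQLite stands in for Redshift (the query uses only standard SQL, but this is not a Redshift test), the table is called t as in the query above, the third code column is named code_3 as in the question's table header, dates are stored as plain text, and the helper name frequency_table is made up for illustration. An order by is added only to make the output order predictable.

```python
import sqlite3

# Sample rows from the question:
# (user_id, user_label, code1, code2, code_3, date)
ROWS = [
    (1, "x", "a", "b", "c", "01-01"),
    (1, "x", "a", "d", "c", "01-01"),
    (1, "x", "a", "b", "c", "01-02"),
    (1, "y", "a", "c", "d", "01-01"),
    (2, "x", "a", "b", "d", "01-01"),
]

# Conditional count(distinct): for each code, count the distinct dates on
# which that code appears in any of the three code columns.
QUERY = """
select user_id, user_label,
       count(distinct case when 'a' in (code1, code2, code_3) then date end) as a,
       count(distinct case when 'b' in (code1, code2, code_3) then date end) as b,
       count(distinct case when 'c' in (code1, code2, code_3) then date end) as c,
       count(distinct case when 'd' in (code1, code2, code_3) then date end) as d
from t
group by user_id, user_label
order by user_id, user_label
"""


def frequency_table(rows):
    """Load rows into an in-memory SQLite table t and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table t (user_id integer, user_label text, "
        "code1 text, code2 text, code_3 text, date text)"
    )
    conn.executemany("insert into t values (?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in frequency_table(ROWS):
        print(row)
```

On the sample rows this yields (1, 'x', 2, 2, 2, 1), (1, 'y', 1, 0, 1, 1) and (2, 'x', 1, 1, 0, 1). The (1, y) row follows from its codes being a, c and d, so b is 0 and d is 1. The list of codes is hard-coded in the query, so a new code value would need its own count(distinct ...) column.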
