Help me write a query

Time: 2018-05-17 10:39:49

Tags: sql amazon-redshift

I am new to the IT industry, so I would be glad to get some help writing a query (in Redshift).

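I am trying to find out 1) how many users joined our service in February, 2) which of those users came back after joining and, if they did, on which day (i.e. day 1, day 2, day 3, and so on), and 3) how many reads/impressions each user generated on each day they visited.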

The tables and relevant columns I have been provided are as follows:
   I) users:             id      created_at 
                        4,578,001  2018-05-16 00:02
                        4,578,002  2018-05-16 00:02
                        4,578,007  2018-05-16 00:10

   II) user_content_action_by_traffic_source:  id         created_at    action_type 
                                            4,428,830  2018-05-16 00:00   read 
                                            3,154,331  2018-05-16 00:00   read
                                              714,795  2018-05-16 00:00 impression 

where the column created_at in I) shows when the account was created,
and the column created_at in II) shows when an action was recorded (e.g. a user reads a post).
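For part 1 of the question, a minimal sketch could look like the following. It assumes that users.created_at is stored in UTC (the two-argument form of CONVERT_TIMEZONE treats its input as UTC) and that "February" means February 2018 in Seoul time. The alias new_users_in_feb is just an illustrative name.

```sql
-- Sketch: count accounts created in February 2018, Seoul time.
-- Assumption: users.created_at is stored in UTC.
SELECT COUNT(*) AS new_users_in_feb
FROM users u
WHERE CONVERT_TIMEZONE('Asia/Seoul', u.created_at) >= '2018-02-01'
  AND CONVERT_TIMEZONE('Asia/Seoul', u.created_at) <  '2018-03-01';  -- half-open range covers all of Feb
```

The half-open range (>= first day, < first day of the next month) avoids having to hard-code the last day of the month.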

  - Example query using the above tables:
     SELECT u.created_at,
            u.id,
            SUM(CASE WHEN s.action_type = 'read' THEN 1 ELSE 0 END) AS read,
            SUM(CASE WHEN s.action_type = 'impression' THEN 1 ELSE 0 END) AS impression
     FROM users u INNER JOIN user_content_action_by_traffic_source s ON u.id = s.user_id
     WHERE u.created_at >= CURRENT_DATE - INTERVAL '2 days'
     GROUP BY 1, 2
     ORDER BY 1
     LIMIT 10
  - Example output:    created_at        id       read      impression  
                   2018-05-16 00:00   4,577,999    2           38
                   2018-05-16 00:01   4,578,000    1           77
                   2018-05-16 00:02   4,578,001    2           48

The following code is what I originally wrote (I know it is not very good):

-- t1: February sign-ups with the number of distinct days (Seoul time)
-- on which each user was active in the 4 weeks after joining
WITH t1 AS (SELECT convert_timezone('Asia/Seoul', u.created_at) AS created_at,
                   u.id AS new_user,
                   COUNT(DISTINCT date_trunc('day', convert_timezone('Asia/Seoul', action.created_at))) AS num_of_days_visited
            FROM users u JOIN user_content_action_by_traffic_source action ON u.id = action.user_id
            WHERE DATE(u.created_at) >= '2018-02-01'
                  AND DATE(u.created_at) <= '2018-02-28'
                  AND action.created_at >= u.created_at
                  AND action.created_at <= u.created_at + INTERVAL '4 weeks'
            GROUP BY 1, 2
            ORDER BY 1, 2)

-- reads and impressions per user per visit day
SELECT DATE(t1.created_at),
       t1.new_user,
       DATE(convert_timezone('Asia/Seoul', action.created_at)) AS date_visited,
       SUM(CASE WHEN action.action_type = 'read' THEN 1 ELSE 0 END) AS Read,
       SUM(CASE WHEN action.action_type = 'impression' THEN 1 ELSE 0 END) AS Imp
FROM t1 JOIN user_content_action_by_traffic_source action ON t1.new_user = action.user_id
WHERE convert_timezone('Asia/Seoul', action.created_at) >= t1.created_at
      AND convert_timezone('Asia/Seoul', action.created_at) <= t1.created_at + INTERVAL '4 weeks'
      AND action.content_type = 'post'
      AND t1.num_of_days_visited = 2
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3

Please note that I am located in Seoul, Korea, while the time zone of the recorded dates is in the U.S.
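One thing that may matter here: the two-argument form convert_timezone('Asia/Seoul', ts) treats ts as UTC. If the timestamps are really stored in a U.S. local time zone, the three-argument form with an explicit source zone might be closer to what is needed. The sketch below uses 'America/Los_Angeles' purely as a placeholder assumption, since the actual source zone is not stated anywhere above.

```sql
-- Sketch: CONVERT_TIMEZONE(source_zone, target_zone, timestamp).
-- 'America/Los_Angeles' is a placeholder; replace it with the zone
-- the data is actually recorded in (or keep the two-argument form if it is UTC).
SELECT action.created_at AS recorded_at,
       CONVERT_TIMEZONE('America/Los_Angeles', 'Asia/Seoul', action.created_at) AS seoul_time
FROM user_content_action_by_traffic_source action
LIMIT 10;
```

Comparing recorded_at and seoul_time for a few known events would be a quick way to check which source zone is correct.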

The result of my query is as follows:

        date     new_user   date_visited  read  imp
   1 2018-02-01   4432986    2018-02-02    2     8 
   2 2018-02-01   4432987    2018-02-02    5     49
   3 2018-02-01   4432987    2018-02-26    1     0
   4 2018-02-01   4432992    2018-02-02    6    169
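For part 2 (day 1, day 2, day 3, ...), the gap between the join date and each visit date could be turned into a day number with DATEDIFF. This is only a sketch under the same assumptions as my query (timestamps stored in UTC, activity limited to 4 weeks after joining). The names date_joined, day_number, reads and imps are made up for illustration.

```sql
-- Sketch: one row per user per visit day, with the day offset from sign-up.
-- Assumption: both created_at columns are stored in UTC.
SELECT DATE(CONVERT_TIMEZONE('Asia/Seoul', u.created_at)) AS date_joined,
       u.id AS new_user,
       DATE(CONVERT_TIMEZONE('Asia/Seoul', a.created_at)) AS date_visited,
       -- DATEDIFF(day, ...) counts the calendar-day boundaries crossed
       DATEDIFF(day,
                CONVERT_TIMEZONE('Asia/Seoul', u.created_at),
                CONVERT_TIMEZONE('Asia/Seoul', a.created_at)) AS day_number,
       SUM(CASE WHEN a.action_type = 'read' THEN 1 ELSE 0 END) AS reads,
       SUM(CASE WHEN a.action_type = 'impression' THEN 1 ELSE 0 END) AS imps
FROM users u
JOIN user_content_action_by_traffic_source a ON u.id = a.user_id
WHERE CONVERT_TIMEZONE('Asia/Seoul', u.created_at) >= '2018-02-01'
  AND CONVERT_TIMEZONE('Asia/Seoul', u.created_at) <  '2018-03-01'
  AND a.created_at >= u.created_at
  AND a.created_at <= u.created_at + INTERVAL '4 weeks'
GROUP BY 1, 2, 3, 4
ORDER BY 1, 2, 3;
```

With this, a user who joined on 2018-02-01 and visited on 2018-02-02 and 2018-02-26 would get day_number 1 and 25 for those two rows.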

My ideal result would be:

        date     new_user   date_visited  read  imp  date_visited_2   read2  imp2
   1 2018-02-01   4432986    2018-02-02    2     8 
   2 2018-02-01   4432987    2018-02-02    5     49    2018-02-26       1     0 
   3 2018-02-01   4432992    2018-02-02    6    169
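One possible way to get this wide layout is to number each user's visit days with ROW_NUMBER() and then pivot with conditional aggregation. The following is a hedged sketch, not a tested solution: it assumes timestamps are stored in UTC, drops the num_of_days_visited = 2 filter from my query, and uses illustrative names (daily, ranked, visit_no, read_1, imp_1, ...).

```sql
-- Sketch: pivot the first two visit days of each user into columns.
WITH daily AS (
    -- one row per user per visit day (Seoul time)
    SELECT DATE(CONVERT_TIMEZONE('Asia/Seoul', u.created_at)) AS date_joined,
           u.id AS new_user,
           DATE(CONVERT_TIMEZONE('Asia/Seoul', a.created_at)) AS date_visited,
           SUM(CASE WHEN a.action_type = 'read' THEN 1 ELSE 0 END) AS reads,
           SUM(CASE WHEN a.action_type = 'impression' THEN 1 ELSE 0 END) AS imps
    FROM users u
    JOIN user_content_action_by_traffic_source a ON u.id = a.user_id
    WHERE CONVERT_TIMEZONE('Asia/Seoul', u.created_at) >= '2018-02-01'
      AND CONVERT_TIMEZONE('Asia/Seoul', u.created_at) <  '2018-03-01'
      AND a.created_at >= u.created_at
      AND a.created_at <= u.created_at + INTERVAL '4 weeks'
      AND a.content_type = 'post'
    GROUP BY 1, 2, 3
),
ranked AS (
    -- visit_no = 1 for the earliest visit day, 2 for the next, ...
    SELECT date_joined, new_user, date_visited, reads, imps,
           ROW_NUMBER() OVER (PARTITION BY new_user ORDER BY date_visited) AS visit_no
    FROM daily
)
SELECT date_joined,
       new_user,
       MAX(CASE WHEN visit_no = 1 THEN date_visited END) AS date_visited_1,
       MAX(CASE WHEN visit_no = 1 THEN reads END)        AS read_1,
       MAX(CASE WHEN visit_no = 1 THEN imps END)         AS imp_1,
       MAX(CASE WHEN visit_no = 2 THEN date_visited END) AS date_visited_2,
       MAX(CASE WHEN visit_no = 2 THEN reads END)        AS read_2,
       MAX(CASE WHEN visit_no = 2 THEN imps END)         AS imp_2
FROM ranked
GROUP BY 1, 2
ORDER BY 1, 2;
```

Users with only one visit day would get NULL in the *_2 columns. Because a SQL result has a fixed set of columns, every additional visit day to display needs its own group of CASE expressions.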

Please leave a comment if my description of the code or the sample data is missing any information.

Thanks

0 answers:

No answers yet