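I'm new to the IT industry, so I'd really appreciate some help writing a query (in Redshift).
I'm trying to find out 1) how many users joined our service in February, 2) which of those users came back after joining and, if they did, when (i.e. Day 1, Day 2, Day 3, ... etc.), and 3) how many reads/impressions each user got on each day they visited.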
The tables and relevant columns I have been provided are as follows:
I) users
id         created_at
4,578,001  2018-05-16 00:02
4,578,002  2018-05-16 00:02
4,578,007  2018-05-16 00:10
II) user_content_action_by_traffic_source
id         created_at        action_type
4,428,830  2018-05-16 00:00  read
3,154,331  2018-05-16 00:00  read
714,795    2018-05-16 00:00  impression
where created_at in table I) is the time the account was created,
and created_at in table II) is the time the action was recorded (e.g. a user reads a post).
- Example code using the tables above:
SELECT u.created_at,
u.id,
SUM(CASE WHEN s.action_type = 'read' THEN 1 ELSE 0 END) AS read,
SUM(CASE WHEN s.action_type = 'impression' THEN 1 ELSE 0 END) AS impression
FROM users u INNER JOIN user_content_action_by_traffic_source s ON u.id = s.user_id
WHERE u.created_at >= CURRENT_DATE - INTERVAL '2 days'
GROUP BY 1, 2
ORDER BY 1
LIMIT 10
- Example output:
created_at        id         read  impression
2018-05-16 00:00  4,577,999  2     38
2018-05-16 00:01  4,578,000  1     77
2018-05-16 00:02  4,578,001  2     48
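For part 1 on its own, this is a minimal sketch of how I think the February signup count could be written. It assumes users.created_at is stored in a US local time; 'America/New_York' below is only a placeholder for whatever zone the data is actually logged in. The half-open range (>= Feb 1, < Mar 1) avoids missing signups later in the day on Feb 28.

```sql
-- Part 1 sketch: number of users who signed up in February 2018, by Seoul time.
-- ASSUMPTION: users.created_at is a US local time; 'America/New_York' is a placeholder.
SELECT COUNT(DISTINCT u.id) AS feb_signups
FROM users u
WHERE convert_timezone('America/New_York', 'Asia/Seoul', u.created_at) >= '2018-02-01'
  AND convert_timezone('America/New_York', 'Asia/Seoul', u.created_at) <  '2018-03-01';
```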
The code below is what I originally wrote (I know it isn't very good):
WITH t1 AS (
-- one row per February signup: signup time (Seoul) and number of distinct Seoul days with any action
SELECT convert_timezone('Asia/Seoul', u.created_at) AS created_at,
u.id AS new_user,
COUNT(DISTINCT date_trunc('day', convert_timezone('Asia/Seoul', action.created_at))) AS num_of_days_visited
FROM users u JOIN user_content_action_by_traffic_source action ON u.id = action.user_id
-- filter February on the Seoul date so it matches the dates shown in the output
WHERE DATE(convert_timezone('Asia/Seoul', u.created_at)) >= '2018-02-01'
AND DATE(convert_timezone('Asia/Seoul', u.created_at)) <= '2018-02-28'
-- only actions within 4 weeks after signup
AND action.created_at >= u.created_at
AND action.created_at <= u.created_at + INTERVAL '4 weeks'
GROUP BY 1, 2)
-- one row per user and visit day: reads and impressions on posts
SELECT DATE(t1.created_at),
t1.new_user,
DATE(convert_timezone('Asia/Seoul', action.created_at)) AS date_visited,
SUM(CASE WHEN action.action_type = 'read' THEN 1 ELSE 0 END) AS Read,
SUM(CASE WHEN action.action_type = 'impression' THEN 1 ELSE 0 END) AS Imp
FROM t1 JOIN user_content_action_by_traffic_source action ON t1.new_user = action.user_id
WHERE convert_timezone('Asia/Seoul', action.created_at) >= t1.created_at
AND convert_timezone('Asia/Seoul', action.created_at) <= t1.created_at + INTERVAL '4 weeks'
AND action.content_type = 'post'
AND t1.num_of_days_visited = 2
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3
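For part 2 ("Day 1, Day 2, ..."), a sketch of how each visit day could be labelled with its day number after signup using Redshift's DATEDIFF. It keeps the two-argument convert_timezone() and the 4-week window from my query above (so the stored timestamps are treated as UTC); the column names read_count, imp_count, and day_number are my own, not from the tables.

```sql
-- Part 2 sketch: label each visit day with its day number after signup
-- (day_number 0 = signup day, 1 = the next day, ...), using Seoul dates.
-- Two-argument convert_timezone() treats stored timestamps as UTC, as in the query above.
WITH new_users AS (
    SELECT u.id AS user_id,
           convert_timezone('Asia/Seoul', u.created_at) AS signup_ts
    FROM users u
    WHERE convert_timezone('Asia/Seoul', u.created_at) >= '2018-02-01'
      AND convert_timezone('Asia/Seoul', u.created_at) <  '2018-03-01'
),
visits AS (
    -- one row per user and Seoul visit date, with read/impression counts on posts
    SELECT n.user_id,
           DATE(n.signup_ts) AS signup_date,
           DATE(convert_timezone('Asia/Seoul', a.created_at)) AS date_visited,
           SUM(CASE WHEN a.action_type = 'read' THEN 1 ELSE 0 END) AS read_count,
           SUM(CASE WHEN a.action_type = 'impression' THEN 1 ELSE 0 END) AS imp_count
    FROM new_users n
    JOIN user_content_action_by_traffic_source a ON a.user_id = n.user_id
    WHERE convert_timezone('Asia/Seoul', a.created_at) >= n.signup_ts
      AND convert_timezone('Asia/Seoul', a.created_at) <= n.signup_ts + INTERVAL '4 weeks'
      AND a.content_type = 'post'
    GROUP BY 1, 2, 3
)
SELECT user_id,
       signup_date,
       date_visited,
       DATEDIFF(day, signup_date, date_visited) AS day_number,
       read_count,
       imp_count
FROM visits
ORDER BY signup_date, user_id, date_visited;
```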
Please note that I am in Seoul, Korea, while the timestamps are recorded in a US time zone.
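Because of that, the two-argument convert_timezone() I used may not be right: in Redshift that form assumes the input timestamp is UTC. A small sketch of the difference; 'America/New_York' is only a placeholder for the actual US zone of the data, which I don't know for sure.

```sql
-- Two-argument form: the input timestamp is assumed to be UTC.
SELECT convert_timezone('Asia/Seoul', '2018-02-01 00:00:00'::timestamp);
-- 2018-02-01 09:00:00 (UTC+9)

-- Three-argument form: the source zone is stated explicitly.
-- ASSUMPTION: 'America/New_York' is a placeholder for the zone the data is logged in.
SELECT convert_timezone('America/New_York', 'Asia/Seoul', '2018-02-01 00:00:00'::timestamp);
-- 2018-02-01 14:00:00 (EST is UTC-5 in February)
```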
The result is as follows:
   date        new_user  date_visited  read  imp
1  2018-02-01  4432986   2018-02-02    2     8
2  2018-02-01  4432987   2018-02-02    5     49
3  2018-02-01  4432987   2018-02-26    1     0
4  2018-02-01  4432992   2018-02-02    6     169
My ideal result would be:
   date        new_user  date_visited  read  imp  date_visited_2  read2  imp2
1  2018-02-01  4432986   2018-02-02    2     8
2  2018-02-01  4432987   2018-02-02    5     49   2018-02-26      1      0
3  2018-02-01  4432992   2018-02-02    6     169
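To get from one row per visit day to one row per user, one common pattern is to number each user's visit days with ROW_NUMBER() and then pivot with conditional aggregation (MAX(CASE ...)). A sketch under these assumptions: the "daily" CTE roughly recomputes my per-day result above, the num_of_days_visited = 2 filter is left out, and the column names (signup_date, read1, imp1, visit_no, ...) are my own. Users with only one visit day get NULLs in the *_2 columns, like the blanks in the ideal result.

```sql
-- Pivot sketch: one row per user, first and second visit day side by side.
WITH daily AS (
    -- one row per (user, Seoul visit date), similar to the query above
    SELECT DATE(convert_timezone('Asia/Seoul', u.created_at)) AS signup_date,
           u.id AS new_user,
           DATE(convert_timezone('Asia/Seoul', a.created_at)) AS date_visited,
           SUM(CASE WHEN a.action_type = 'read' THEN 1 ELSE 0 END) AS read_count,
           SUM(CASE WHEN a.action_type = 'impression' THEN 1 ELSE 0 END) AS imp_count
    FROM users u
    JOIN user_content_action_by_traffic_source a ON a.user_id = u.id
    WHERE convert_timezone('Asia/Seoul', u.created_at) >= '2018-02-01'
      AND convert_timezone('Asia/Seoul', u.created_at) <  '2018-03-01'
      AND a.created_at >= u.created_at
      AND a.created_at <= u.created_at + INTERVAL '4 weeks'
      AND a.content_type = 'post'
    GROUP BY 1, 2, 3
),
numbered AS (
    -- visit_no = 1 for each user's earliest visit day, 2 for the next, ...
    SELECT d.*,
           ROW_NUMBER() OVER (PARTITION BY d.new_user ORDER BY d.date_visited) AS visit_no
    FROM daily d
)
SELECT signup_date,
       new_user,
       MAX(CASE WHEN visit_no = 1 THEN date_visited END) AS date_visited,
       MAX(CASE WHEN visit_no = 1 THEN read_count END)   AS read1,
       MAX(CASE WHEN visit_no = 1 THEN imp_count END)    AS imp1,
       MAX(CASE WHEN visit_no = 2 THEN date_visited END) AS date_visited_2,
       MAX(CASE WHEN visit_no = 2 THEN read_count END)   AS read2,
       MAX(CASE WHEN visit_no = 2 THEN imp_count END)    AS imp2
FROM numbered
GROUP BY signup_date, new_user
ORDER BY signup_date, new_user;
```

More visit days can be added by repeating the three MAX(CASE ...) lines with visit_no = 3, 4, and so on.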
If my description of the code or sample data is missing any information, please leave a comment.
Thanks