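So I have a website with news articles, and I'm trying to calculate 4 user types per month. The user types are:
1. New users: users who registered in the current month (their first article view) and viewed an article in the current month.
2. Retained users: new users from the previous month, or users who viewed an article in both the previous month and the current month.
3. Churned users: new or retained users from the previous month who did not view an article in the current month; also churned users from the previous month.
4. Resurrected users: churned users from the previous month who viewed an article in the current month.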
**User Table A - Unique User Article Views**
- Current month = 2019-04-01 00:00:00 UTC
| user_id | viewed_at |
|---------|-------------------------|
| 4 | 2019-04-01 00:00:00 UTC |
| 3 | 2019-04-01 00:00:00 UTC |
| 2 | 2019-04-01 00:00:00 UTC |
| 1 | 2019-03-01 00:00:00 UTC |
| 3 | 2019-03-01 00:00:00 UTC |
| 2 | 2019-02-01 00:00:00 UTC |
| 1 | 2019-02-01 00:00:00 UTC |
| 1 | 2019-01-01 00:00:00 UTC |
The table above outlines the following user types:
2019-01-01
* User 1: New
2019-02-01
* User 1: Retained
* User 2: New
2019-03-01
* User 1: Retained
* User 2: Churned
* User 3: New
2019-04-01
* User 1: Churned
* User 2: Resurrected
* User 3: Retained
* User 4: New
The table I want counts the distinct user_id values for each user type in each month.
| month_viewed_at | ut_new | ut_retained | ut_churned | ut_resurrected |
|-------------------------|--------|-------------|------------|----------------|
| 2019-04-01 00:00:00 UTC | 1 | 1 | 1 | 1 |
| 2019-03-01 00:00:00 UTC | 1 | 1 | 1 | 0 |
| 2019-02-01 00:00:00 UTC | 1 | 1 | 0 | 0 |
| 2019-01-01 00:00:00 UTC | 1 | 0 | 0 | 0 |
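Answer 0 (score: 1)
"I'm just not sure where to start"
I hoped you had read all my comments and tried something yourself, but since I don't see any updates, I guess you are still stuck here, so here we go...
Below is for BigQuery Standard SQL and should give you direction: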
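    #standardSQL
    WITH temp1 AS (
      -- One row per view: display month, sequential month number (pos),
      -- and the month number of the user's first view (first_pos).
      SELECT user_id,
        FORMAT_DATE('%Y-%m', DATE(viewed_at)) AS month_viewed_at,
        DATE_DIFF(DATE(viewed_at), '2000-01-01', MONTH) AS pos,
        DATE_DIFF(DATE(MIN(viewed_at) OVER(PARTITION BY user_id)), '2000-01-01', MONTH) AS first_pos
      FROM `project.dataset.table`
    ), temp2 AS (
      -- New user: this row's month is the month of the user's first view.
      SELECT *, pos = first_pos AS new_user
      FROM temp1
    ), temp3 AS (
      -- Retained user: the same user has a row in the immediately preceding month.
      -- The window frame holds only the row whose pos equals this row's pos - 1.
      SELECT *, LAST_VALUE(new_user) OVER(win) OR pos - 1 = LAST_VALUE(pos) OVER(win) AS retained_user
      FROM temp2
      WINDOW win AS (PARTITION BY user_id ORDER BY pos RANGE BETWEEN 1 PRECEDING AND 1 PRECEDING)
    )
    SELECT month_viewed_at,
      COUNTIF(new_user) AS new_users,
      COUNTIF(retained_user) AS retained_users
    FROM temp3
    GROUP BY month_viewed_at
    -- ORDER BY month_viewed_at DESC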
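If applied to your sample data, the result is:

| Row | month_viewed_at | new_users | retained_users |
|-----|-----------------|-----------|----------------|
| 1 | 2019-04 | 1 | 1 |
| 2 | 2019-03 | 1 | 1 |
| 3 | 2019-02 | 1 | 1 |
| 4 | 2019-01 | 1 | 0 |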
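In temp1, we prepare the data: viewed_at is formatted the way it should appear in the output, and it is also converted into a sequential month number counted from an arbitrary reference date (2000-01-01), so that the analytic functions can use RANGE instead of ROWS.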
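To make the month-number idea concrete, here is a minimal sketch in plain Python (not BigQuery). The helper name `month_index` is hypothetical, and it only mimics what DATE_DIFF(..., MONTH) does for first-of-month dates like those in the sample. Consecutive calendar months map to consecutive integers, so "the previous month" is always pos - 1, even across a year boundary.

```python
from datetime import date

# Arbitrary reference date, same value the query uses.
REFERENCE = date(2000, 1, 1)

def month_index(d: date, reference: date = REFERENCE) -> int:
    """Count calendar months between the reference date and d."""
    return (d.year - reference.year) * 12 + (d.month - reference.month)

# December 2018 and January 2019 are one apart, despite the year change.
dec_2018 = month_index(date(2018, 12, 1))
jan_2019 = month_index(date(2019, 1, 1))
print(dec_2018, jan_2019, jan_2019 - dec_2018)
```

With such an index, a frame like RANGE BETWEEN 1 PRECEDING AND 1 PRECEDING addresses exactly the prior calendar month, whereas ROWS would address the user's prior row regardless of how many months it lies in the past.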
In temp2, we simply identify new users, and in temp3, the retained users.
I think this could be a good start, so I'm leaving the rest to you.
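As a way to check the remaining two types before writing more SQL, the sketch below works through all four definitions in plain Python on the sample rows. It is not part of the query above, and the names `month_pos`, `month_label` and `count_user_types` are hypothetical. It rests on two assumptions: a user who has viewed before but has no view in a given month counts as churned for that month (including users who were already churned or resurrected the month before), and a user who returns after one or more missed months counts as resurrected.

```python
from datetime import date

# Sample rows (user_id, viewed_at) copied from User Table A in the question.
VIEWS = [
    (4, date(2019, 4, 1)), (3, date(2019, 4, 1)), (2, date(2019, 4, 1)),
    (1, date(2019, 3, 1)), (3, date(2019, 3, 1)),
    (2, date(2019, 2, 1)), (1, date(2019, 2, 1)),
    (1, date(2019, 1, 1)),
]

def month_pos(d):
    """Sequential month number counted from 2000-01, like pos in the query."""
    return (d.year - 2000) * 12 + (d.month - 1)

def month_label(pos):
    """Inverse of month_pos, formatted as YYYY-MM."""
    return f"{2000 + pos // 12}-{pos % 12 + 1:02d}"

def count_user_types(views):
    # user_id -> set of month numbers in which the user viewed an article
    seen = {}
    for user_id, viewed_at in views:
        seen.setdefault(user_id, set()).add(month_pos(viewed_at))

    first_month = min(min(months) for months in seen.values())
    last_month = max(max(months) for months in seen.values())

    counts = {}
    for pos in range(first_month, last_month + 1):
        row = {"new": 0, "retained": 0, "churned": 0, "resurrected": 0}
        for months in seen.values():
            first_pos = min(months)
            if pos < first_pos:
                continue                 # user has not appeared yet
            if pos not in months:
                row["churned"] += 1      # known user, no view this month (assumption)
            elif pos == first_pos:
                row["new"] += 1          # first ever view is this month
            elif pos - 1 in months:
                row["retained"] += 1     # viewed last month and this month
            else:
                row["resurrected"] += 1  # back after at least one missed month
        counts[month_label(pos)] = row
    return counts

counts = count_user_types(VIEWS)
for month in sorted(counts, reverse=True):
    print(month, counts[month])
```

The point that carries over to SQL is that churned users have no row in the months they miss, so a query would likely need to generate one row per user per month (from the user's first month onward) before the same conditions can be counted with COUNTIF.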