I wrote a retention query that gives me cohort retention data in the following format:
cohort_join_day | repeat_day | repeat_users
11/15/16 | 0 | 10000
11/15/16 | 1 | 6000
11/15/16 | 2 | 3000
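repeat_day 0 is the total cohort size for that day.
I'm trying to skip the Excel step and add a fourth column with the daily repeat percentage, like this: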
cohort_join_day | repeat_day | repeat_users | repeat_percentage
11/15/16 | 0 | 10000 | 100%
11/15/16 | 1 | 6000 | 60%
11/15/16 | 2 | 3000 | 30%
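The per-row calculation should be very simple, e.g.: Day 1 repeat percentage of the cohort = (Day 1 repeat_users of the cohort) / (Day 0 repeat_users of the cohort)
(Day 0 repeat_users of the cohort) is the total size of the cohort.
What is the best way to achieve this?
Here is the daily retention query I wrote: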
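Written as a formula for cohort c on day d, using the question's column names (the notation is just for illustration):

```latex
\text{repeat\_percentage}(c, d) = 100 \times \frac{\text{repeat\_users}(c, d)}{\text{repeat\_users}(c, 0)},
\qquad \text{e.g. } 100 \times \frac{6000}{10000} = 60\%
```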
SELECT
    to_char(cohort_join_day, 'YYYY-MM-DD') AS cohort_join_day,
    EXTRACT(DAY FROM (current_day - cohort_join_day)) AS repeat_day,
    COUNT(DISTINCT unique_id) AS repeat_users
FROM
(
    SELECT
        auu.unique_id,
        date_trunc('day', auu.ds) AS current_day,
        date_trunc('day', fsb.ds) AS cohort_join_day
    FROM rust.a_unique_users AS auu
    JOIN mobile.first_seen_byos AS fsb
        ON fsb.unique_id = auu.unique_id
    WHERE
        auu.os_type = 'iphone_native_app'
        AND fsb.ds >= '2016-11-01'
) AS uniques_by_day
WHERE
    cohort_join_day <= current_day
GROUP BY
    cohort_join_day,
    repeat_day;
Answer 0 (score: 2)
WITH boo AS (
    SELECT *
    FROM foo  -- your query goes here
), base AS (
    SELECT "repeat_users"
    FROM boo
    WHERE "repeat_day" = 0
)
SELECT boo.cohort_join_day,
       boo.repeat_day,
       boo.repeat_users,
       100 * ((boo.repeat_users * 1.0) / base.repeat_users) AS repeat_percentage
FROM boo
CROSS JOIN base;
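A minimal runnable sketch of this answer, assuming Python's standard sqlite3 module as a stand-in for the asker's database and a hypothetical table foo loaded with the question's sample rows. Because base has no cohort filter, this version assumes the data holds a single cohort, as the sample does; with several cohorts the cross join would pair every row with every day-0 size.

```python
import sqlite3

# Assumption: SQLite stands in for the real database; `foo` is a hypothetical
# table holding the question's sample output (a single cohort).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE foo (cohort_join_day TEXT, repeat_day INTEGER, repeat_users INTEGER)"
)
conn.executemany(
    "INSERT INTO foo VALUES (?, ?, ?)",
    [("2016-11-15", 0, 10000), ("2016-11-15", 1, 6000), ("2016-11-15", 2, 3000)],
)

# Same shape as the answer: base holds the day-0 size and is cross-joined onto every row.
query = """
WITH boo AS (
    SELECT * FROM foo
), base AS (
    SELECT repeat_users FROM boo WHERE repeat_day = 0
)
SELECT boo.cohort_join_day,
       boo.repeat_day,
       boo.repeat_users,
       100 * ((boo.repeat_users * 1.0) / base.repeat_users) AS repeat_percentage
FROM boo
CROSS JOIN base
ORDER BY boo.repeat_day
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```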
Answer 1 (score: 1)
SELECT
    *
    ,(repeat_users * 100.0) /
        MAX(CASE WHEN repeat_day = 0 THEN repeat_users END) OVER () AS repeat_percentage
FROM
    Table
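Conditional aggregation and window functions make this easier.
If you are trying to do this calculation for each cohort_join_day, then PARTITION the window function by cohort_join_day: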
SELECT
    *
    ,(repeat_users * 100.0) /
        MAX(CASE WHEN repeat_day = 0 THEN repeat_users END)
            OVER (PARTITION BY cohort_join_day) AS repeat_percentage
FROM
    Table
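MAX(column) OVER () gives you the MAX value of column across the entire dataset.
MAX(column) OVER (PARTITION BY column2) gives you the MAX value of column for each value of column2. You can think of PARTITION BY as similar to GROUP BY, except that the rows are not collapsed: each row keeps its own columns and gets its partition's result alongside them.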
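To see the difference, here is a sketch (assuming SQLite 3.25+ via Python's sqlite3 for window-function support) that computes the percentage both ways. The table name retention and the second cohort, 2016-11-16, are hypothetical illustrative values, not data from the question.

```python
import sqlite3

# Assumption: hypothetical table `retention`; the 2016-11-16 cohort is made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE retention (cohort_join_day TEXT, repeat_day INTEGER, repeat_users INTEGER)"
)
conn.executemany(
    "INSERT INTO retention VALUES (?, ?, ?)",
    [
        ("2016-11-15", 0, 10000), ("2016-11-15", 1, 6000), ("2016-11-15", 2, 3000),
        ("2016-11-16", 0, 8000), ("2016-11-16", 1, 4000),
    ],
)

# Same percentage against the whole table's day-0 MAX and against each cohort's own day 0.
query = """
SELECT cohort_join_day,
       repeat_day,
       (repeat_users * 100.0) /
           MAX(CASE WHEN repeat_day = 0 THEN repeat_users END) OVER () AS pct_whole_table,
       (repeat_users * 100.0) /
           MAX(CASE WHEN repeat_day = 0 THEN repeat_users END)
               OVER (PARTITION BY cohort_join_day) AS pct_per_cohort
FROM retention
ORDER BY cohort_join_day, repeat_day
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The second cohort's day-1 row shows why the partition matters: 40% against the overall day-0 maximum (10000) versus 50% against its own day-0 size (8000).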
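Replacing column with a CASE expression lets you do conditional aggregation. For example, when you only want repeat_users where repeat_day = 0, the CASE expression returns that value for the one day-0 row in each partition and NULL for every other row. MAX ignores NULLs, so it returns exactly that one value per partition.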
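A small sketch of that behaviour, again assuming Python's sqlite3 and a hypothetical retention table holding the question's single sample cohort: first the CASE expression row by row, then the same expression inside MAX.

```python
import sqlite3

# Assumption: hypothetical table `retention` with the question's sample cohort.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE retention (cohort_join_day TEXT, repeat_day INTEGER, repeat_users INTEGER)"
)
conn.executemany(
    "INSERT INTO retention VALUES (?, ?, ?)",
    [("2016-11-15", 0, 10000), ("2016-11-15", 1, 6000), ("2016-11-15", 2, 3000)],
)

# The CASE keeps repeat_users only on the day-0 row; every other row becomes NULL (None).
per_row = conn.execute(
    "SELECT repeat_day, CASE WHEN repeat_day = 0 THEN repeat_users END "
    "FROM retention ORDER BY repeat_day"
).fetchall()
print(per_row)  # [(0, 10000), (1, None), (2, None)]

# MAX skips the NULLs, so aggregating the CASE leaves just the cohort size.
(cohort_size,) = conn.execute(
    "SELECT MAX(CASE WHEN repeat_day = 0 THEN repeat_users END) FROM retention"
).fetchone()
print(cohort_size)  # 10000
```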
So if you want to do the same thing in a plain query without window functions, you can use a correlated subquery:
SELECT
    t.*
    ,(t.repeat_users * 100.0) / (SELECT t2.repeat_users
                                 FROM Table t2
                                 WHERE t.cohort_join_day = t2.cohort_join_day
                                   AND t2.repeat_day = 0) AS repeat_percentage
FROM
    Table t
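A runnable sketch of the correlated-subquery version, assuming Python's sqlite3 and a hypothetical retention table; the 2016-11-16 cohort is made-up sample data added so the per-cohort lookup is visible. This form needs no window-function support.

```python
import sqlite3

# Assumption: hypothetical table `retention`; the 2016-11-16 cohort is illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE retention (cohort_join_day TEXT, repeat_day INTEGER, repeat_users INTEGER)"
)
conn.executemany(
    "INSERT INTO retention VALUES (?, ?, ?)",
    [
        ("2016-11-15", 0, 10000), ("2016-11-15", 1, 6000), ("2016-11-15", 2, 3000),
        ("2016-11-16", 0, 8000), ("2016-11-16", 1, 4000),
    ],
)

# For each outer row, the subquery looks up the day-0 size of the same cohort.
query = """
SELECT t.cohort_join_day,
       t.repeat_day,
       (t.repeat_users * 100.0) / (SELECT t2.repeat_users
                                   FROM retention t2
                                   WHERE t.cohort_join_day = t2.cohort_join_day
                                     AND t2.repeat_day = 0) AS repeat_percentage
FROM retention t
ORDER BY t.cohort_join_day, t.repeat_day
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```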
If you have multiple cohort days and want to use Juan Carlos's CTE and CROSS JOIN approach, you can do this:
WITH cte AS (
    SELECT
        cohort_join_day
        ,repeat_users
    FROM
        Table
    WHERE
        repeat_day = 0
)
SELECT
    t.*
    ,(t.repeat_users * 100.0) / c.repeat_users AS repeat_percentage
FROM
    Table t
    CROSS JOIN cte c
WHERE
    t.cohort_join_day = c.cohort_join_day
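A sketch of the CTE version run against the same kind of data, assuming Python's sqlite3 (3.8.3+ for WITH) and a hypothetical retention table whose second cohort is illustrative only.

```python
import sqlite3

# Assumption: hypothetical table `retention`; the 2016-11-16 cohort is made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE retention (cohort_join_day TEXT, repeat_day INTEGER, repeat_users INTEGER)"
)
conn.executemany(
    "INSERT INTO retention VALUES (?, ?, ?)",
    [
        ("2016-11-15", 0, 10000), ("2016-11-15", 1, 6000), ("2016-11-15", 2, 3000),
        ("2016-11-16", 0, 8000), ("2016-11-16", 1, 4000),
    ],
)

# cte holds one day-0 size per cohort; the WHERE keeps only matching cohort pairs.
query = """
WITH cte AS (
    SELECT cohort_join_day, repeat_users
    FROM retention
    WHERE repeat_day = 0
)
SELECT t.cohort_join_day,
       t.repeat_day,
       (t.repeat_users * 100.0) / c.repeat_users AS repeat_percentage
FROM retention t
CROSS JOIN cte c
WHERE t.cohort_join_day = c.cohort_join_day
ORDER BY t.cohort_join_day, t.repeat_day
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

A CROSS JOIN filtered by that WHERE condition returns the same rows as INNER JOIN cte c ON t.cohort_join_day = c.cohort_join_day, which some readers may find clearer.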
If you want a running total, try something like:
SUM(column) OVER (PARTITION BY column2 ORDER BY column3)
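Purely to illustrate that syntax with the question's columns, the sketch below maps column to repeat_users, column2 to cohort_join_day and column3 to repeat_day (an assumed mapping), using Python's sqlite3 (3.25+) and a hypothetical retention table with an illustrative second cohort.

```python
import sqlite3

# Assumption: hypothetical table `retention`; the 2016-11-16 cohort is made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE retention (cohort_join_day TEXT, repeat_day INTEGER, repeat_users INTEGER)"
)
conn.executemany(
    "INSERT INTO retention VALUES (?, ?, ?)",
    [
        ("2016-11-15", 0, 10000), ("2016-11-15", 1, 6000), ("2016-11-15", 2, 3000),
        ("2016-11-16", 0, 8000), ("2016-11-16", 1, 4000),
    ],
)

# ORDER BY inside OVER makes SUM accumulate from the partition's first row to the current one.
query = """
SELECT cohort_join_day,
       repeat_day,
       repeat_users,
       SUM(repeat_users) OVER (PARTITION BY cohort_join_day ORDER BY repeat_day) AS running_total
FROM retention
ORDER BY cohort_join_day, repeat_day
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The total restarts at each new cohort_join_day because of the PARTITION BY.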
Definitely get familiar with window functions; these days they are a lifesaver.