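Calculating percentages for repeat-rate cohorts

Asked: 2016-11-30 23:15:34

Tags: sql postgresql

I wrote a repeat-rate query that returns cohort repeat-rate data in the following format: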

cohort_join_day | repeat_day | repeat_users
11/15/16        |      0     | 10000
11/15/16        |      1     | 6000
11/15/16        |      2     | 3000

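repeat_day 0 represents the total cohort size for that day.

I'm trying to skip the Excel step and add a fourth column with the daily repeat-rate percentage, like this: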

cohort_join_day | repeat_day | repeat_users | repeat_percentage
11/15/16        |      0     | 10000        | 100%
11/15/16        |      1     | 6000         |  60%
11/15/16        |      2     | 3000         |  30%

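The calculation for each row should be very simple, e.g.: repeat rate for the day-1 cohort on day 1 = (repeat users for the day-1 cohort on day 1) / (repeat users for the day-1 cohort on day 0)

(repeat users for the day-1 cohort on day 0) is the total size of the cohort.

What is the best way to achieve this?

Here is the daily repeat-rate query I wrote: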

SELECT
  to_char(cohort_join_day, 'YYYY-MM-DD')            AS cohort_join_day,
  EXTRACT(DAY FROM (current_day - cohort_join_day)) AS repeat_day,
  COUNT(DISTINCT unique_id)                         AS repeat_users
FROM
  (
    SELECT
      auu.unique_id,
      date_trunc('day', auu.ds) AS current_day,
      date_trunc('day', fsb.ds) AS cohort_join_day
    FROM rust.a_unique_users AS auu
      JOIN mobile.first_seen_byos AS fsb
        ON fsb.unique_id = auu.unique_id
    WHERE
      auu.os_type = 'iphone_native_app'
      AND fsb.ds >= '2016-11-01'
  ) AS uniques_by_day
WHERE
  cohort_join_day <= current_day
GROUP BY
  cohort_join_day,
  repeat_day;

2 Answers:

Answer 0 (score: 2)

SQL DEMO

WITH boo AS (
    SELECT *
    FROM foo     -- your query goes here
), base AS (
    -- cohort size: the repeat_day = 0 row
    SELECT "repeat_users"
    FROM boo
    WHERE "repeat_day" = 0
)

SELECT boo.cohort_join_day,
       boo.repeat_day,
       boo.repeat_users,
       100 * ((boo.repeat_users * 1.0) / base.repeat_users) AS repeat_percentage
FROM boo
CROSS JOIN base   -- assumes a single cohort, so base has exactly one row


Answer 1 (score: 1)

SELECT
    *
    ,(repeat_users * 100.0) /
       MAX(CASE WHEN repeat_day = 0 THEN repeat_users END) OVER () as repeat_percentage
FROM
     Table

Conditional aggregation and window functions make this easier.

If you are trying to do this calculation for each day, then PARTITION the window function by cohort_join_day:

SELECT
    *
    ,(repeat_users * 100.0) /
       MAX(CASE WHEN repeat_day = 0 THEN repeat_users END) OVER (PARTITION BY cohort_join_day) as repeat_percentage
FROM
    Table

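MAX(column) OVER () simply gives you the MAX value of column across the entire data set.

MAX(column) OVER (PARTITION BY column2) gives you the MAX value of column among the rows that share the same column2 value. You can think of PARTITION BY as similar to GROUP BY.

Replacing column with a case expression lets you do conditional aggregation. So, for example, when you only want repeat_users when repeat_day = 0, the case expression returns just one non-null value per partition; MAX picks that value and ignores the others because they are null.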
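To make the conditional aggregation concrete, here is a minimal PostgreSQL sketch. The cohort_data CTE, the cohort_size column, and the 2016-11-16 rows are hypothetical stand-ins for the output of the repeat-rate query; only the 2016-11-15 figures come from the question.

```sql
-- Hypothetical sample rows standing in for the output of the repeat-rate query.
WITH cohort_data (cohort_join_day, repeat_day, repeat_users) AS (
    VALUES
        (DATE '2016-11-15', 0, 10000),
        (DATE '2016-11-15', 1,  6000),
        (DATE '2016-11-15', 2,  3000),
        (DATE '2016-11-16', 0,  8000),   -- made-up second cohort
        (DATE '2016-11-16', 1,  4000)
)
SELECT
    cohort_join_day
    ,repeat_day
    ,repeat_users
    -- The CASE is NULL on every row except repeat_day = 0, and MAX skips NULLs,
    -- so each partition sees exactly one candidate value: the cohort size.
    ,MAX(CASE WHEN repeat_day = 0 THEN repeat_users END)
        OVER (PARTITION BY cohort_join_day) AS cohort_size
    -- Multiplying by 100.0 forces numeric (not integer) division.
    ,ROUND(repeat_users * 100.0
        / MAX(CASE WHEN repeat_day = 0 THEN repeat_users END)
            OVER (PARTITION BY cohort_join_day), 1) AS repeat_percentage
FROM cohort_data
ORDER BY cohort_join_day, repeat_day;
```

With these rows, the 2016-11-15 cohort should come out at 100.0, 60.0 and 30.0 percent, matching the table in the question.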

So if you wanted to do the same thing in a straight query without window functions, you could do something like this:

SELECT
    t.*
    ,(t.repeat_users * 100.0) / (SELECT t2.repeat_users
             FROM
                Table t2
             WHERE
                t.cohort_join_day = t2.cohort_join_day
                AND t2.repeat_day = 0)     as repeat_percentage
FROM
    Table t

If you have multiple cohort days involved, here is how you could adapt Juan Carlos's approach:

WITH cte AS (
    SELECT
       cohort_join_day
       ,repeat_users
    FROM
       Table
    WHERE
       repeat_day = 0
)

SELECT
    t.*
    ,(t.repeat_users * 100.0) / c.repeat_users as repeat_percentage
FROM
    Table t
    CROSS JOIN cte c
WHERE
    t.cohort_join_day = c.cohort_join_day
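A CROSS JOIN filtered by an equality in WHERE behaves like an inner join, so the same idea can also be written with an explicit JOIN ... ON. This is only a sketch: cohort_data, cohort_size and total_users are hypothetical names, and the 2016-11-16 rows are made up for illustration.

```sql
-- Hypothetical sample rows standing in for the output of the repeat-rate query.
WITH cohort_data (cohort_join_day, repeat_day, repeat_users) AS (
    VALUES
        (DATE '2016-11-15', 0, 10000),
        (DATE '2016-11-15', 1,  6000),
        (DATE '2016-11-15', 2,  3000),
        (DATE '2016-11-16', 0,  8000),   -- made-up second cohort
        (DATE '2016-11-16', 1,  4000)
),
cohort_size AS (
    -- One row per cohort: the repeat_day = 0 count is the cohort's total size.
    SELECT cohort_join_day, repeat_users AS total_users
    FROM cohort_data
    WHERE repeat_day = 0
)
SELECT
    d.cohort_join_day
    ,d.repeat_day
    ,d.repeat_users
    ,(d.repeat_users * 100.0) / c.total_users AS repeat_percentage
FROM
    cohort_data d
    JOIN cohort_size c
        ON c.cohort_join_day = d.cohort_join_day
ORDER BY d.cohort_join_day, d.repeat_day;
```

Note that a cohort with no repeat_day = 0 row would drop out of the result with an inner join; a LEFT JOIN would keep it with a NULL percentage.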

If you want a running total, try something like

SUM(column) OVER (PARTITION BY column2 ORDER BY column3)
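Applied to the cohort data, that pattern might look like the following sketch, where cohort_data and running_total are hypothetical names and the rows are the sample figures from the question:

```sql
-- Hypothetical sample rows standing in for the output of the repeat-rate query.
WITH cohort_data (cohort_join_day, repeat_day, repeat_users) AS (
    VALUES
        (DATE '2016-11-15', 0, 10000),
        (DATE '2016-11-15', 1,  6000),
        (DATE '2016-11-15', 2,  3000)
)
SELECT
    cohort_join_day
    ,repeat_day
    ,repeat_users
    -- With ORDER BY in the window, SUM accumulates from the first row of the
    -- partition up to the current row (rows tied on repeat_day are summed together).
    ,SUM(repeat_users)
        OVER (PARTITION BY cohort_join_day ORDER BY repeat_day) AS running_total
FROM cohort_data
ORDER BY cohort_join_day, repeat_day;
```

For these three rows the running total should be 10000, 16000 and 19000.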

Definitely get familiar with window functions; these days they are lifesavers.