Joining unique user IDs with the first-time user IDs of the previous month

Date: 2019-07-10 20:19:49

Tags: firebase google-bigquery

I want to identify the users who had a "first_open" event in month a (here: January) and came back in month b (here: February) with a "user_engagement" event.

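My idea:
1. Create a table with all users who had a "first_open" event.
2. Create a table with all users who had a "user_engagement" event.
3. Join the two tables on the user ID.
4. Count the users who had the "first_open" event in month a and show up again in month b, and count all users with a "first_open" event in January.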
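The four steps can be tried out locally before running them against BigQuery. Below is a minimal sketch using Python's built-in sqlite3 module as a stand-in. The toy rows, the `event_month` column (playing the role of the `_TABLE_SUFFIX` date range) and the CTE name `users_engaged_next_month` are all hypothetical. Each side is reduced to one row per user with DISTINCT before the join, so every user is counted once.

```python
import sqlite3

# Hypothetical toy rows standing in for the Firebase export:
# (user_pseudo_id, event_name, event_month); event_month replaces _TABLE_SUFFIX.
EVENTS = [
    ("u1", "first_open", 1), ("u1", "user_engagement", 1), ("u1", "user_engagement", 2),
    ("u2", "first_open", 1), ("u2", "user_engagement", 2), ("u2", "user_engagement", 2),
    ("u3", "first_open", 1),
    ("u4", "first_open", 1),
    ("u5", "first_open", 2), ("u5", "user_engagement", 2),
]

QUERY = """
WITH
  users_first_open AS (          -- step 1: January first_open users, one row each
    SELECT DISTINCT user_pseudo_id FROM events
    WHERE event_name = 'first_open' AND event_month = 1
  ),
  users_engaged_next_month AS (  -- step 2: February user_engagement users, one row each
    SELECT DISTINCT user_pseudo_id FROM events
    WHERE event_name = 'user_engagement' AND event_month = 2
  )
SELECT                           -- steps 3 and 4: join on the user ID and count
  (SELECT COUNT(*) FROM users_first_open) AS cohort_size,
  (SELECT COUNT(*) FROM users_first_open
     JOIN users_engaged_next_month USING (user_pseudo_id)) AS retained
"""


def cohort_counts(events):
    """Return (cohort_size, retained) for the January -> February cohort."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE events (user_pseudo_id TEXT, event_name TEXT, event_month INTEGER)")
    con.executemany("INSERT INTO events VALUES (?, ?, ?)", events)
    row = con.execute(QUERY).fetchone()
    con.close()
    return row


cohort_size, retained = cohort_counts(EVENTS)
print(cohort_size, retained, retained / cohort_size)  # prints: 4 2 0.5
```

In this toy data, u5 is active in February but installed in February, so the inner join leaves u5 out of the retained count.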

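With the following query I currently overcount the users in both month a and month b, because I am not counting distinct users for either of the two event types.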


    With
    users_first_open as (select 
    user_pseudo_id,
    EXTRACT (Month FROM(DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)))) AS install_month,
    event_name as firstopen
    FROM
        `table.events_*`
    where _TABLE_SUFFIX BETWEEN '20190101'
        AND '20190108' and event_name = "first_open" and 
        EXTRACT (Month FROM(DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)))) = 1
    ),

    user_enagement_next_month as (select 
    user_pseudo_id,
    EXTRACT (Month FROM(DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)))) AS engagement_month,
    event_name as engagament_next_month
    FROM
        `table.events_*`
    where _TABLE_SUFFIX BETWEEN '20190109'
        AND '20190116' and event_name = "user_engagement"
        and EXTRACT (Month FROM(DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)))) = 1), 

    cohort_raw as(
    select 
    user_pseudo_id,
    install_month,
    engagement_month, 
    case when firstopen = "first_open" then 1 else 0 end as cohort_count_first_open, 
    case when engagament_next_month = "user_engagement" then 1 else 0 end as cohort_count_engagement
    from 
    user_enagement_next_month
    full join 
    users_first_open using (user_pseudo_id))--, 


    select
    sum(case when cohort_count_first_open is not null then 1 else 0 end) as users_first_open,
    (select sum(case when cohort_count_engagement is not null then 1 else 0 end) as u_engagement_open from cohort_raw where cohort_count_first_open = 1) as users_engagement_open
    from cohort_raw
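One detail in the query above is worth checking. `cohort_count_first_open` and `cohort_count_engagement` are CASE expressions that always produce 0 or 1, never NULL. So `case when cohort_count_first_open is not null then 1 else 0 end` is 1 for every row of the full join, including rows that exist only on the engagement side. The same holds for the `cohort_count_engagement is not null` check in the subquery, which therefore counts every first_open row rather than only the returning ones.

The sketch below reproduces the effect with Python's sqlite3 on hypothetical rows. It simulates the full-join output directly instead of running a FULL JOIN.

```python
import sqlite3


def not_null_count():
    """Compare 'CASE result IS NOT NULL' counting with summing the CASE flag itself."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE cohort_raw (firstopen TEXT)")
    # Hypothetical full-join output: one row matched first_open, two rows came only
    # from the engagement side, so their firstopen column is NULL.
    con.executemany("INSERT INTO cohort_raw VALUES (?)", [("first_open",), (None,), (None,)])
    flag = "CASE WHEN firstopen = 'first_open' THEN 1 ELSE 0 END"
    # The CASE result is 0 or 1, never NULL, so IS NOT NULL is true for every row.
    counted_as_written = con.execute(
        f"SELECT SUM(CASE WHEN ({flag}) IS NOT NULL THEN 1 ELSE 0 END) FROM cohort_raw"
    ).fetchone()[0]
    # Summing the 0/1 flag counts only the rows that actually had first_open.
    counted_by_flag = con.execute(f"SELECT SUM({flag}) FROM cohort_raw").fetchone()[0]
    con.close()
    return counted_as_written, counted_by_flag


print(not_null_count())  # prints: (3, 1)
```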

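What I tried next: group table 2 ("user_enagement_next_month") by user ID etc., and for each resulting row add up the "first_open" case and the "engagement" case. In a later version I joined these into the query so that only users whose sum equals 2 are counted: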

    --

    With
    users_first_open as (select 
    user_pseudo_id,
    EXTRACT (Month FROM(DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)))) AS install_month,
    event_name as firstopen
    FROM
        `table.events_*`
    where _TABLE_SUFFIX BETWEEN '20190101'
        AND '20190131' and event_name = "first_open" and 
        EXTRACT (Month FROM(DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)))) = 1
    ),

    user_enagement_next_month as (select 
    user_pseudo_id,
    EXTRACT (Month FROM(DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)))) AS engagement_month,
    event_name as engagament_next_month
    FROM
        `table.events_*`
    where _TABLE_SUFFIX BETWEEN '20190201'
        AND '20190228' and event_name = "session_start"
        and EXTRACT (Month FROM(DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)))) = 2
        group by 1,2,3),

    cohort_raw as(
    select 
    user_pseudo_id,
    install_month,
    engagement_month, 
    case when firstopen = "first_open" then 1 else 0 end as cohort_count_first_open, 
    case when engagament_next_month = "session_start" then 1 else 0 end as cohort_count_engagement
    --case when user_pseudo_id is not null then 1 else 0 end as cohort_count_engagement
    from 
    user_enagement_next_month 
    full join 
    users_first_open using (user_pseudo_id)), 

    cohort_agg as (
    select *, cohort_count_first_open+cohort_count_engagement as cohort_sum
    from cohort_raw
    group by 1,2,3,4,5
    order by 6 desc)

    select
    (select count(*) from users_first_open) as cohort_jan,
    (select Sum(cohort_sum) from cohort_agg where cohort_sum = 2) as ret, 
    sum(case when cohort_count_first_open is not null then 1 else 0 end) as users_first_open,
    (select sum(case when cohort_count_engagement is not null then 1 else 0 end) as u_engagement_open from cohort_raw where cohort_count_first_open = 1) as users_engagement_open
    from cohort_agg 
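In this version, `ret` is `Sum(cohort_sum)` over rows where `cohort_sum = 2`. Every returning user therefore contributes 2 instead of 1, so the value is twice the number of matching rows. Counting the users, for example with `COUNT(DISTINCT user_pseudo_id)`, avoids the doubling.

A minimal sqlite3 check on hypothetical `cohort_agg` rows:

```python
import sqlite3


def returned_users():
    """Contrast SUM(cohort_sum) with a distinct user count for cohort_sum = 2."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE cohort_agg (user_pseudo_id TEXT, cohort_sum INTEGER)")
    # Hypothetical rows: u1 and u2 have both events (1 + 1), u3 only first_open.
    con.executemany(
        "INSERT INTO cohort_agg VALUES (?, ?)", [("u1", 2), ("u2", 2), ("u3", 1)]
    )
    # Summing cohort_sum adds 2 for every matching user ...
    summed = con.execute(
        "SELECT SUM(cohort_sum) FROM cohort_agg WHERE cohort_sum = 2"
    ).fetchone()[0]
    # ... while counting distinct IDs gives the number of returning users.
    counted = con.execute(
        "SELECT COUNT(DISTINCT user_pseudo_id) FROM cohort_agg WHERE cohort_sum = 2"
    ).fetchone()[0]
    con.close()
    return summed, counted


print(returned_users())  # prints: (4, 2)
```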

I expect a retention rate of about 20%. My current output is 54%, so my query is either overcounting or undercounting; I assume my join is not working.
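A common way for a join to inflate counts is fan-out. The CTEs in the first query select raw event rows without DISTINCT, so a user with several February events is matched once per event. The sketch below shows this in sqlite3 with hypothetical tables `first_open` and `engagement`. Counting distinct IDs, or de-duplicating each side before the join, counts that user only once.

```python
import sqlite3

# Hypothetical rows: u1 opened the app in January and had three engagement
# events in February; u2 opened the app but never came back.
FIRST_OPEN = [("u1",), ("u2",)]
ENGAGEMENT = [("u1",), ("u1",), ("u1",)]


def joined_counts():
    """Return (joined row count, distinct joined user count)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE first_open (user_pseudo_id TEXT)")
    con.execute("CREATE TABLE engagement (user_pseudo_id TEXT)")
    con.executemany("INSERT INTO first_open VALUES (?)", FIRST_OPEN)
    con.executemany("INSERT INTO engagement VALUES (?)", ENGAGEMENT)
    # Joining raw event rows: each engagement row of u1 yields its own output row.
    rows = con.execute(
        "SELECT COUNT(*) FROM first_open JOIN engagement USING (user_pseudo_id)"
    ).fetchone()[0]
    # Counting distinct IDs counts each returning user once.
    users = con.execute(
        "SELECT COUNT(DISTINCT first_open.user_pseudo_id) "
        "FROM first_open JOIN engagement USING (user_pseudo_id)"
    ).fetchone()[0]
    con.close()
    return rows, users


print(joined_counts())  # prints: (3, 1)
```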

1 answer:

Answer 0 (score: 0)

Maybe I don't quite understand what you want, but try this:

    with

    users_first_open as (
        select distinct  -- are there duplicates for one user_id?
            user_pseudo_id,

            extract(
                month from
                timestamp_micros(user_first_touch_timestamp)
            ) as install_month
        from
            `table.events_201901*`  -- longer prefixes generally perform better
        where
            _table_suffix between '01' and '31'
            and event_name = 'first_open'
            and extract(
                    month from
                    timestamp_micros(user_first_touch_timestamp)
                ) = 1
    ),

    user_enagement_next_month as (
        -- one row per user with a user_engagement event in February;
        -- no filter on user_first_touch_timestamp here, because requiring
        -- first touch in month 2 would exclude every January install
        select distinct
            user_pseudo_id
        from
            `table.events_201902*`  -- longer prefixes generally perform better
        where
            _table_suffix between '01' and '28'
            and event_name = 'user_engagement'
    )

    select
        ufo.install_month,
        count(*) as first_open_event_users_cnt,
        count(uenm.user_pseudo_id) as user_engagement_event_users_cnt,
        safe_divide(count(uenm.user_pseudo_id), count(*)) as retention_rate
    from
        users_first_open as ufo
        left join user_enagement_next_month as uenm
            on ufo.user_pseudo_id = uenm.user_pseudo_id
    group by
        1
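The left-join pattern can be sanity-checked without querying BigQuery. The sketch below runs the same kind of final SELECT in sqlite3 on hypothetical, already de-duplicated CTE results. The `engagement_month` column is included only to show what grouping by it does. COUNT(*) counts every cohort member, and COUNT(uenm.user_pseudo_id) counts only those with a February match. If the query also groups by `engagement_month`, the cohort is split into a NULL row (users who did not return) and a month-2 row. The two counts then only read as "cohort size vs. retained users" when grouping by `install_month` alone.

```python
import sqlite3


def left_join_counts(group_by_engagement_month):
    """Run the left-join aggregation on hypothetical CTE results."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE ufo (user_pseudo_id TEXT, install_month INTEGER)")
    con.execute("CREATE TABLE uenm (user_pseudo_id TEXT, engagement_month INTEGER)")
    # Hypothetical: five January installs; u1 returns in February, u9 is a
    # February user who did not install in January (ignored by the left join).
    con.executemany("INSERT INTO ufo VALUES (?, 1)", [(u,) for u in ["u1", "u2", "u3", "u4", "u5"]])
    con.executemany("INSERT INTO uenm VALUES (?, 2)", [("u1",), ("u9",)])
    group_cols = (
        "ufo.install_month, uenm.engagement_month"
        if group_by_engagement_month
        else "ufo.install_month"
    )
    rows = con.execute(f"""
        SELECT {group_cols},
               COUNT(*) AS first_open_event_users_cnt,
               COUNT(uenm.user_pseudo_id) AS user_engagement_event_users_cnt
        FROM ufo LEFT JOIN uenm ON ufo.user_pseudo_id = uenm.user_pseudo_id
        GROUP BY {group_cols}
        ORDER BY {group_cols}
    """).fetchall()
    con.close()
    return rows


print(left_join_counts(False))  # prints: [(1, 5, 1)]
print(left_join_counts(True))   # prints: [(1, None, 4, 0), (1, 2, 1, 1)]
```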