I am trying to count how many distinct user_IDs logged in to the website for the first time. Below is the SQL query that counts distinct user_IDs based on the condition that it is their first_hit at the website. But I am not sure whether the same user is counted again when logging in on different days. Each user's login should be counted only once (unique).
The first query that I wrote:
select
--'Jan 2017' as month,
to_char(first_hit_at::date,'dd-mm-yyyy') as date,
count( distinct a.user_id ) as unique_user_logins_in_month
from
stg_marketing.ga_sessions a
where
a.first_hit_at >('2017-01-01 00:00:00.000')
and a.first_hit_at <('2017-02-01 00:00:00.000')
and user_login_state = 'true'
group by 1
order by 1
Second improved query:
select
--'Jan 2017' as month,
to_char(first_hit_at::date,'dd-mm-yyyy') as date,
count( distinct a.user_id ) as unique_user_logins_in_month
from
stg_marketing.ga_sessions a
where
a.first_hit_at >('2017-01-01 00:00:00.000')
and a.first_hit_at <('2017-02-01 00:00:00.000')
and user_login_state = 'true'
and last_hit_at::date > first_hit_at::date
group by 1
order by 1
Result of query one:
date unique_user_logins_in_month
01-01-2017 7008
02-01-2017 11023
03-01-2017 10318
04-01-2017 10091
05-01-2017 8726
Result of query two:
date unique_user_logins_in_month
01-01-2017 97
02-01-2017 96
03-01-2017 62
04-01-2017 61
05-01-2017 69
I am not sure whether both queries are correct or whether the second one is more correct. Thanks
Answer 0 (score: 0)
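"how many distinct and unique user_IDs have logged in for the first time to the website."
A login is a first-time login if the user has never logged in before.
You can find the log entries of such users with the NOT EXISTS operator (this is called an ANTI JOIN):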
select *
from
stg_marketing.ga_sessions a
where
a.first_hit_at >('2017-01-01 00:00:00.000')
and a.first_hit_at <('2017-02-01 00:00:00.000')
and user_login_state = 'true'
AND NOT EXISTS (
select * from stg_marketing.ga_sessions b
where a.user_id = b.user_id
and b.first_hit_at < a.first_hit_at
)
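To see what the anti join keeps, here is a minimal sketch that runs the same NOT EXISTS pattern against a tiny in-memory SQLite table from Python. The sample rows, the unqualified table name ga_sessions, and the reduced column set are assumptions made for illustration only; they are not taken from the real stg_marketing.ga_sessions data.

```python
import sqlite3

# Hypothetical sample data: an in-memory stand-in for stg_marketing.ga_sessions
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table ga_sessions (user_id integer, first_hit_at text, user_login_state text)"
)
conn.executemany(
    "insert into ga_sessions values (?, ?, ?)",
    [
        (1, "2016-12-30 09:00:00", "true"),  # user 1 was already seen in December
        (1, "2017-01-02 10:00:00", "true"),  # so this January session is not a first login
        (2, "2017-01-02 11:00:00", "true"),  # user 2: first session ever
        (2, "2017-01-03 12:00:00", "true"),  # user 2: repeat session, must be dropped
        (3, "2017-01-03 08:00:00", "true"),  # user 3: first session ever
    ],
)

# Same anti-join shape as the query above: keep a session only if the same
# user has no session with an earlier first_hit_at.
first_logins = conn.execute(
    """
    select a.user_id, a.first_hit_at
    from ga_sessions a
    where a.first_hit_at > '2017-01-01 00:00:00'
      and a.first_hit_at < '2017-02-01 00:00:00'
      and a.user_login_state = 'true'
      and not exists (
          select 1 from ga_sessions b
          where b.user_id = a.user_id
            and b.first_hit_at < a.first_hit_at
      )
    order by a.user_id
    """
).fetchall()

print(first_logins)
```

On this sample only one row per user survives, and user 1 disappears entirely because their earliest session falls before January.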
Now simply aggregate the result of the query above:
select
to_char(first_hit_at::date,'dd-mm-yyyy') as date,
count( distinct a.user_id ) as unique_user_logins_in_month
from
stg_marketing.ga_sessions a
where
a.first_hit_at >('2017-01-01 00:00:00.000')
and a.first_hit_at <('2017-02-01 00:00:00.000')
and user_login_state = 'true'
AND NOT EXISTS (
select * from stg_marketing.ga_sessions b
where a.user_id = b.user_id
and b.first_hit_at < a.first_hit_at
)
group by to_char(first_hit_at::date,'dd-mm-yyyy')
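A different formulation, not the one used in this answer, is to compute each user's earliest first_hit_at with MIN and then count users per day. The sketch below tries it on hypothetical sample rows in an in-memory SQLite table from Python; the table layout and data are assumptions, and SQLite's date() stands in for the to_char(...::date) call. Note one difference in behaviour: here only sessions with user_login_state = 'true' are considered when finding the earliest session, whereas the NOT EXISTS subquery above looks at every earlier session of the user.

```python
import sqlite3

# Hypothetical sample data: an in-memory stand-in for stg_marketing.ga_sessions
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table ga_sessions (user_id integer, first_hit_at text, user_login_state text)"
)
conn.executemany(
    "insert into ga_sessions values (?, ?, ?)",
    [
        (1, "2016-12-30 09:00:00", "true"),
        (1, "2017-01-02 10:00:00", "true"),
        (2, "2017-01-02 11:00:00", "true"),
        (2, "2017-01-03 12:00:00", "true"),
        (3, "2017-01-03 08:00:00", "true"),
    ],
)

# Step 1 (inner query): earliest logged-in session per user.
# Step 2 (outer query): keep users whose earliest session is in January 2017
# and count them per calendar day.
daily_first_logins = conn.execute(
    """
    select date(f.first_login) as day, count(*) as first_time_users
    from (
        select user_id, min(first_hit_at) as first_login
        from ga_sessions
        where user_login_state = 'true'
        group by user_id
    ) f
    where f.first_login > '2017-01-01 00:00:00'
      and f.first_login < '2017-02-01 00:00:00'
    group by date(f.first_login)
    order by day
    """
).fetchall()

print(daily_first_logins)
```

Because the inner query returns exactly one row per user, each user can contribute to at most one day, so a plain count(*) is enough and count(distinct ...) is not needed.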