Counting distinct and unique user logins per day in PostgreSQL

Date: 2017-08-04 12:37:53

Tags: postgresql

I am trying to count how many distinct, unique user_IDs have logged in to the website for the first time. The SQL query below counts distinct user_IDs on the condition that it is their first_hit on the website. However, I am not sure whether it handles the same user logging in on different days. Each user's login should be counted only once (unique).

The first query that I wrote:

select
    --'Jan 2017' as month,
    to_char(first_hit_at::date,'dd-mm-yyyy') as date,
    count( distinct a.user_id ) as unique_user_logins_in_month
from
      stg_marketing.ga_sessions a
where
    a.first_hit_at >('2017-01-01 00:00:00.000')
    and a.first_hit_at <('2017-02-01 00:00:00.000')
    and user_login_state = 'true' 
    group by 1
    order by 1

Second improved query:

select
    --'Jan 2017' as month,
    to_char(first_hit_at::date,'dd-mm-yyyy') as date,
    count( distinct a.user_id ) as unique_user_logins_in_month
from
    stg_marketing.ga_sessions a
where
    a.first_hit_at >('2017-01-01 00:00:00.000')
    and a.first_hit_at <('2017-02-01 00:00:00.000')
    and user_login_state = 'true' 
    and last_hit_at::date > first_hit_at::date 
    group by 1
    order by 1

Result of query one:

date         unique_user_logins_in_month
01-01-2017   7008
02-01-2017   11023
03-01-2017   10318
04-01-2017   10091
05-01-2017   8726

Result of query two:

date         unique_user_logins_in_month
01-01-2017   97
02-01-2017   96
03-01-2017   62
04-01-2017   61
05-01-2017   69

I am not sure whether both queries are correct or whether the second one is more correct. Thanks.

1 answer:

Answer 0 (score: 0)

  

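"how many distinct and unique user_IDs have logged in for the first time to the website."

A user has logged in for the first time if they have never logged in before.
You can find the log entries of such users with the NOT EXISTS operator (this is called an anti-join):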

select *
from
      stg_marketing.ga_sessions a
where
    a.first_hit_at >('2017-01-01 00:00:00.000')
    and a.first_hit_at <('2017-02-01 00:00:00.000')
    and user_login_state = 'true' 
    AND NOT EXISTS (
        select * from stg_marketing.ga_sessions b
        where a.user_id = b.user_id
          and b.first_hit_at < a.first_hit_at
    )

Now simply aggregate the result of the above query:

select 
     to_char(first_hit_at::date,'dd-mm-yyyy') as date,
     count( distinct a.user_id ) as unique_user_logins_in_month
from
      stg_marketing.ga_sessions a
where
    a.first_hit_at >('2017-01-01 00:00:00.000')
    and a.first_hit_at <('2017-02-01 00:00:00.000')
    and user_login_state = 'true' 
    AND NOT EXISTS (
        select * from stg_marketing.ga_sessions b
        where a.user_id = b.user_id
          and b.first_hit_at < a.first_hit_at
    )
group by to_char(first_hit_at::date,'dd-mm-yyyy')
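
As a possible alternative to the anti-join, the same idea can be sketched by first computing each user's earliest logged-in session with min() and then counting those per day. This is only a sketch under assumptions: it treats "first login" as the earliest row with user_login_state = 'true' (the NOT EXISTS version above instead rules out any earlier session, logged in or not), it uses a half-open range so a hit at exactly midnight on 1 January is included, and the names first_login, login_date and first_time_logins are made up for illustration.

```sql
-- Sketch: one row per user with their earliest logged-in session,
-- then count how many users had that first login on each day.
-- Assumes user_login_state holds the text 'true', as in the question.
select
    first_login::date as login_date,          -- hypothetical output column
    count(*)          as first_time_logins    -- one row per user, so no DISTINCT needed
from (
    select
        user_id,
        min(first_hit_at) as first_login      -- earliest logged-in hit per user
    from stg_marketing.ga_sessions
    where user_login_state = 'true'
    group by user_id
) f
where first_login >= '2017-01-01'             -- half-open range: January 2017
  and first_login <  '2017-02-01'
group by first_login::date
order by first_login::date;
```

Because the inner query returns exactly one row per user, each user is counted on a single day only. Grouping and ordering on the date value rather than on the 'dd-mm-yyyy' string also keeps the rows in calendar order if the range is later widened beyond one month.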