Calculating repeat users for an online application

Date: 2016-12-23 02:31:29

Tags: sql

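I developed an internal web application that lets people request reports. To measure the product's impact, I plan to calculate repeat users by month and by week.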

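Monthly repeat user: a user who requested reports in month m1 and came back in the following month m2. Someone who requested reports in m1 and then again only in m3 is not counted as a monthly repeat user. The same rule applies to weeks, quarters, and years.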
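To make the definition concrete, here is a minimal sketch in Python that applies it directly to (user_id, action_date) pairs. It is not the asker's code. It is the kind of independent cross-check one might otherwise do in Excel. A user counts as a repeat user in period p exactly when they were also active in period p - 1. The helper names (month_index, week_index, quarter_index, count_repeat_users) are hypothetical. Weeks are assumed to start on Monday.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical helpers: each maps a date to an integer such that
# consecutive periods differ by exactly 1.
def month_index(d: date) -> int:
    return d.year * 12 + (d.month - 1)

def week_index(d: date) -> int:
    # Assumption: weeks start on Monday, like PostgreSQL's date_trunc('week', ...).
    monday = d - timedelta(days=d.weekday())
    return monday.toordinal() // 7  # consecutive Mondays are 7 days apart

def quarter_index(d: date) -> int:
    return d.year * 4 + (d.month - 1) // 3

def count_repeat_users(events, period_index=month_index):
    """events: iterable of (user_id, action_date) pairs.

    Returns {period: (num_users, repeat_users)}. A user is a repeat user in
    period p only if they were also active in period p - 1; activity two or
    more periods earlier (m1 then m3) does not count.
    """
    active = defaultdict(set)  # period -> set of distinct active users
    for user_id, d in events:
        active[period_index(d)].add(user_id)
    stats = {}
    for p in sorted(active):
        previous = active.get(p - 1, set())
        stats[p] = (len(active[p]), len(active[p] & previous))
    return stats

# User 1: January and February (a repeat user in February).
# User 2: January and March (never a monthly repeat user).
events = [(1, date(2016, 1, 5)), (1, date(2016, 2, 9)),
          (2, date(2016, 1, 7)), (2, date(2016, 3, 2))]
stats = count_repeat_users(events)
print(stats[month_index(date(2016, 2, 1))], stats[month_index(date(2016, 3, 1))])
# -> (1, 1) (1, 0)
```

For weekly, quarterly, or yearly figures, pass week_index, quarter_index, or `lambda d: d.year` as period_index.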

The table has many columns, but the key ones are user_id and action_date.

Here is the query I am using:

    WITH t AS (
        SELECT user_id
             , date_trunc('month', action_date) AS month
             , count(*) AS reports
             , lag(date_trunc('month', action_date)) OVER (PARTITION BY user_id
                                                           ORDER BY date_trunc('month', action_date))
                 = date_trunc('month', action_date) - interval '1 month'
               OR NULL AS repeat_transaction
        FROM   a
        WHERE  action_date >= '2016-01-01'::date
     -- AND    action_date <= '2016-12-01'::date  -- time range of interest
        GROUP  BY 1, 2
    )
    SELECT month
         , count(*) AS num_users
         , count(repeat_transaction) AS repeat_users
    FROM   t
    GROUP  BY 1
    ORDER  BY 1;
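If you want to experiment without a PostgreSQL instance, here is a minimal sketch that ports the query above to SQLite through Python's built-in sqlite3 module. It rests on these assumptions:

- SQLite is 3.25 or newer, which is needed for lag().
- date_trunc('month', ...) is replaced by strftime('%Y-%m-01', ...).
- The interval arithmetic is replaced by date(..., '-1 month').
- The GROUP BY and the window function are split into two CTEs, which computes the same result.

The rows are hypothetical. They include several reports per user in one month to show that the GROUP BY makes num_users a count of distinct users.

```python
import sqlite3

# The question's query ported to SQLite (assumes SQLite >= 3.25 for lag()).
QUERY = """
WITH g AS (
    SELECT user_id
         , strftime('%Y-%m-01', action_date) AS month
         , count(*) AS reports                 -- one row per user per month
    FROM   a
    WHERE  action_date >= '2016-01-01'
    GROUP  BY 1, 2
), t AS (
    SELECT month
         , lag(month) OVER (PARTITION BY user_id ORDER BY month)
             = date(month, '-1 month')
           OR NULL AS repeat_transaction       -- 1 if the previous active month is the prior calendar month, else NULL
    FROM   g
)
SELECT month
     , count(*)                  AS num_users
     , count(repeat_transaction) AS repeat_users
FROM   t
GROUP  BY 1
ORDER  BY 1
"""

def monthly_repeat_users(rows):
    """Load (user_id, action_date) rows into an in-memory table `a` and run QUERY."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE a (user_id INTEGER, action_date TEXT)")
    con.executemany("INSERT INTO a VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

# Hypothetical rows: users 1 and 2 each request two reports in one month.
rows = [
    (1, "2016-10-03 09:00:00"), (1, "2016-10-20 14:00:00"), (1, "2016-11-02 10:00:00"),
    (2, "2016-09-15 11:00:00"), (2, "2016-11-20 16:00:00"), (2, "2016-11-25 08:00:00"),
]
print(monthly_repeat_users(rows))
# -> [('2016-09-01', 1, 0), ('2016-10-01', 1, 0), ('2016-11-01', 2, 1)]
```

User 2 is not a repeat user in November, because their previous active month is September rather than October.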

Here is the output of the query:

[Image: query output]

I did a quick manual check in Excel to verify the values. These are the actual numbers:

[Image: actual numbers from the Excel check]

So the query's number for November is clearly off, and I don't know why. Any help would be much appreciated. Thanks!

1 Answer

Answer 0 (score: 0)

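Without sample data there is an element of guesswork in answering this. Without exactly the same data, there is no way to reliably reproduce your problem. Based on a simple test (see below), I would suggest removing the GROUP BY you have in the CTE, although it isn't strictly necessary. I also dropped the CTE, since using one here offers no technical advantage.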

See this SQL Fiddle:

    CREATE TABLE MyTable
        ("user_id" int, "action_date" timestamp)
    ;

    INSERT INTO MyTable
        ("user_id", "action_date")
    VALUES
        (1, '2016-09-05 00:00:00'),
        (3, '2016-09-05 00:00:00'),

        (1, '2016-10-05 00:00:00'), -- repeat
        (2, '2016-10-05 00:00:00'),

        (1, '2016-11-04 00:00:00'), -- repeat
        (2, '2016-11-04 00:00:00'), -- repeat
        (3, '2016-11-04 00:00:00'),
        (4, '2016-11-04 00:00:00'),

        (1, '2016-12-04 00:00:00'), -- repeat
        (2, '2016-12-04 00:00:00')  -- repeat
    ;

Query 1

    SELECT
          month
        , count(*) AS num_users
        , count(repeat_transaction) AS repeat_users
    FROM   (
              SELECT user_id
                 , date_trunc('month', action_date) AS month
                 , lag(date_trunc('month', action_date)) OVER (PARTITION BY  user_id
                                                   ORDER BY date_trunc('month', action_date)) 
                    = date_trunc('month', action_date) - interval '1 month'
                    OR NULL AS repeat_transaction
               FROM   MyTable
               WHERE  action_date >= '2016-01-01'::date
          ) t
    GROUP  BY 1
    ORDER  BY 1

Results

|                       month | num_users | repeat_users |
|-----------------------------|-----------|--------------|
| September, 01 2016 00:00:00 |         2 |            0 |
|   October, 01 2016 00:00:00 |         2 |            1 |
|  November, 01 2016 00:00:00 |         4 |            2 |
|  December, 01 2016 00:00:00 |         2 |            2 |
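To rerun Query 1 locally, here is a minimal sketch that ports it to SQLite via Python's sqlite3 module and runs it against the sample rows above. The assumptions are the same: SQLite 3.25 or newer, with strftime/date standing in for date_trunc/interval. The inner `m` subquery only avoids repeating the strftime(...) expression. The run() helper is hypothetical. Months come back as 'YYYY-MM-01' strings instead of PostgreSQL timestamps.

```python
import sqlite3

# Query 1 ported to SQLite (assumes SQLite >= 3.25 for lag()).
# As in the answer, there is no GROUP BY before the window function.
QUERY_1 = """
SELECT month
     , count(*)                  AS num_users
     , count(repeat_transaction) AS repeat_users
FROM  (
        SELECT user_id
             , month
             , lag(month) OVER (PARTITION BY user_id ORDER BY month)
                 = date(month, '-1 month')
               OR NULL AS repeat_transaction
        FROM  (SELECT user_id, strftime('%Y-%m-01', action_date) AS month
               FROM   MyTable
               WHERE  action_date >= '2016-01-01') m
      ) t
GROUP BY 1
ORDER BY 1
"""

# The sample rows from the answer.
SAMPLE = [
    (1, "2016-09-05 00:00:00"), (3, "2016-09-05 00:00:00"),
    (1, "2016-10-05 00:00:00"), (2, "2016-10-05 00:00:00"),
    (1, "2016-11-04 00:00:00"), (2, "2016-11-04 00:00:00"),
    (3, "2016-11-04 00:00:00"), (4, "2016-11-04 00:00:00"),
    (1, "2016-12-04 00:00:00"), (2, "2016-12-04 00:00:00"),
]

def run(rows):
    """Load rows into an in-memory MyTable and return Query 1's result rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE MyTable (user_id INTEGER, action_date TEXT)")
    con.executemany("INSERT INTO MyTable VALUES (?, ?)", rows)
    result = con.execute(QUERY_1).fetchall()
    con.close()
    return result

for row in run(SAMPLE):
    print(row)
# ('2016-09-01', 2, 0)
# ('2016-10-01', 2, 1)
# ('2016-11-01', 4, 2)
# ('2016-12-01', 2, 2)
```

On this sample each user acts at most once per month, so the grouped and ungrouped versions agree. This is worth checking on real data. Without the GROUP BY, a user with several actions in one month contributes several rows to count(*). For example, adding (1, '2016-11-20 00:00:00') to the sample makes November report num_users = 5 instead of 4.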