Using a subquery and grouping to calculate the average DAU per country

Posted: 2018-01-03 12:34:38

Tags: mysql sql analytics

I'm trying to calculate the average DAU (daily active users) per country over a one-month period. The query's job is to:

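  1. Identify unique users
  2. Find all users who logged in during the last month
  3. Split them into individual days
  4. Group them by country
  5. Calculate the average for each country

So far I have completed steps 1, 2, 3 and 4, but the last step is proving tricky.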
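The steps above can be sketched in plain Python. The sketch first collects the set of distinct users for each (country, calendar day) and then averages those set sizes per country. This is a minimal illustration and not the asker's code. The tuple input format, the function name `average_dau_per_country` and the 30-day window standing in for "one month" are all assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def average_dau_per_country(sessions, now, window=timedelta(days=30)):
    """Hypothetical helper: average daily active users per country.

    sessions: iterable of (user_id, opened, country) tuples, `opened` a datetime.
    The 30-day window is an assumption standing in for "one month".
    """
    start = now - window
    # Steps 1-4: distinct user IDs per (country, calendar day) inside the window
    users_by_country_day = defaultdict(set)
    for user_id, opened, country in sessions:
        if opened > start:
            users_by_country_day[(country, opened.date())].add(user_id)

    # Step 5: per country, sum the daily counts and divide by the days seen
    totals = defaultdict(lambda: [0, 0])  # country -> [sum of daily counts, days]
    for (country, _day), users in users_by_country_day.items():
        totals[country][0] += len(users)
        totals[country][1] += 1
    return {country: s / n for country, (s, n) in sorted(totals.items())}


# Hypothetical sample: user 1 opens the app twice on Dec 20
sample = [
    (1, datetime(2017, 12, 20, 8, 0), "NA"),
    (1, datetime(2017, 12, 20, 9, 0), "NA"),
    (2, datetime(2017, 12, 20, 8, 0), "NA"),
    (1, datetime(2017, 12, 21, 8, 0), "NA"),
    (3, datetime(2017, 12, 20, 8, 0), "DK"),
]
now = datetime(2018, 1, 3)
print(average_dau_per_country(sample, now))  # {'DK': 1.0, 'NA': 1.5}
```

In the sample, NA has 2 distinct users on Dec 20 and 1 on Dec 21, so its average is 1.5. The repeated session of user 1 is counted only once.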

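The query should first evaluate a subquery that counts the active users who opened the app during the last month, grouped by day and country. It should then use the 30 days of data computed in the subquery to calculate each country's average DAU. The result should be a list of countries and their average DAU.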

So far, the query looks like this:

    SELECT Country, AVG(User_ID)
    FROM usersession
    WHERE User_ID IN
        (SELECT count(distinct us.User_ID)
         FROM usersession us
         WHERE Opened > current_timestamp - interval 1 month
         GROUP BY DAY(Opened), Country)
    GROUP BY Country ORDER BY Country;
    

The subquery performs steps 1, 2, 3 and 4, but the outer query around it just doesn't work as intended.

The table looks like this (only a short sample of the relevant columns):

    ID     | UserID | Opened              | Country
    -------+--------+---------------------+--------
    233231 |      1 | 2017-11-20 08:00:00 | NA
    223214 |      2 | 2017-11-20 08:53:00 | DK
    

Expected result (about 230 countries in total):

    Country |  Average  
    ------------------
         NA    150354
         DK     60345
         FI     50242
    

Actual result:

    +---------+--------------+
    | Country | AVG(User_ID) |
    +---------+--------------+
    | NULL    |  804397.7297 |
    |         |  746046.7500 |
    | BR      |  893252.0000 |
    | GB      |  935599.0000 |
    | RU      |  993311.0000 |
    | US      |  735568.0000 |
    +---------+--------------+
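The outer query above cannot produce the expected numbers. Its subquery returns daily user *counts*, but `WHERE User_ID IN (...)` compares those counts with user *IDs*, and `AVG(User_ID)` then averages IDs rather than counts. The following minimal sketch reproduces the effect with Python's built-in `sqlite3` module on a tiny made-up sample. SQLite has no `interval` syntax or `DAY()` function, so the time filter is dropped and `date(Opened)` stands in for the per-day grouping; both are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE usersession (ID INTEGER, User_ID INTEGER, Opened TEXT, Country TEXT)"
)
# Hypothetical sample rows
conn.executemany(
    "INSERT INTO usersession VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2017-12-20 08:00:00", "NA"),
        (2, 2, "2017-12-20 09:00:00", "NA"),
        (3, 7, "2017-12-21 08:00:00", "NA"),
        (4, 9, "2017-12-20 08:00:00", "DK"),
    ],
)

# The inner query yields one distinct-user COUNT per (day, country), not user IDs
inner = """SELECT count(DISTINCT User_ID) FROM usersession
           GROUP BY date(Opened), Country"""
inner_counts = sorted(row[0] for row in conn.execute(inner))
print(inner_counts)  # [1, 1, 2]

# The outer query keeps sessions whose User_ID equals one of those counts,
# then averages the IDs themselves
outer = f"""SELECT Country, AVG(User_ID) FROM usersession
            WHERE User_ID IN ({inner})
            GROUP BY Country ORDER BY Country"""
outer_rows = conn.execute(outer).fetchall()
print(outer_rows)  # [('NA', 1.5)]
```

The daily counts here are 2, 1 and 1. Only the sessions of users 1 and 2 survive the filter, so the "average" is 1.5 (the mean of two user IDs), and DK disappears entirely.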
    

1 answer:

Answer 0 (score: 0)

I think this is what you want:

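    select
        country,
        -- each (country, day) row occurs once in x, so this is the mean DAU over the days present
        sum(number_of_users) / count(distinct activity_date) as daily_average_users
    from
        (
            -- one row per country per calendar day, holding that day's distinct users
            select
               country,
               -- date(), not day(): a one-month window can contain the same day-of-month twice
               date(opened)            as activity_date,
               count(distinct user_id) as number_of_users
            from
               user_session
            where
               opened > current_timestamp - interval 1 month
            group by
               country,
               activity_date
        ) x
    group by
        country
    order by
        country;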

I tested this on MySQL 5.7:

    create table user_session
    (
        id       int,
        user_id  int,
        opened   timestamp,
        country  varchar(2)
    );

    insert into user_session (id, user_id, opened, country) values
        ( 1,  100, '2017-12-20 08:00:00', 'NA'),
        ( 2,  100, '2017-12-20 08:00:00', 'NA'),
        ( 3,  100, '2017-12-20 08:00:00', 'NA'),
        ( 4,  100, '2017-12-21 08:00:00', 'NA'),
        ( 5,  100, '2017-12-22 08:00:00', 'NA'),
        ( 6,  200, '2017-12-20 08:00:00', 'NA'),
        ( 7,  300, '2017-12-21 08:00:00', 'NA'),
        ( 8,  400, '2017-12-20 08:00:00', 'NA'),
        ( 9,  500, '2017-12-20 08:00:00', 'NA'),
        (10,  600, '2017-12-20 08:00:00', 'DK'),
        (11,  600, '2017-12-21 08:00:00', 'DK'),
        (12,  700, '2017-12-20 08:00:00', 'DK'),
        (13,  800, '2017-12-20 08:00:00', 'DK'),
        (14,  800, '2017-12-21 08:00:00', 'DK'),
        (15,  800, '2017-12-21 08:00:00', 'DK'),
        (16,  900, '2017-12-20 08:00:00', 'DK'),
        (17,  900, '2017-12-20 08:00:00', 'DK'),
        (18,  900, '2017-12-22 08:00:00', 'DK'),
        (19,  900, '2017-12-22 08:00:00', 'DK'),
        (20, 1000, '2017-12-22 08:00:00', 'DK');

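Result (the `where` filter keeps these sample rows only if the query runs within a month of their December 2017 dates):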

    +---------+---------------------+
    | country | daily_average_users |
    +---------+---------------------+
    | DK      |              2.6667 |
    | NA      |              2.3333 |
    +---------+---------------------+
    2 rows in set (0.00 sec)
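To check the two-level aggregation without a MySQL server, the following sketch runs it on the answer's sample rows using Python's built-in `sqlite3`. It groups by `date(opened)` so that each calendar day is its own group. The sketch makes three assumptions. SQLite's `datetime(:now, '-1 month')` replaces `current_timestamp - interval 1 month`. A fixed `now` (the question's timestamp) keeps the December rows inside the window. The sum is multiplied by `1.0` because SQLite's `/` does integer division on integers, whereas MySQL's `/` does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_session (id INTEGER, user_id INTEGER, opened TEXT, country TEXT)")

# The answer's sample rows as (user_id, date, country); every session is at 08:00:00
sample = [
    (100, "2017-12-20", "NA"), (100, "2017-12-20", "NA"), (100, "2017-12-20", "NA"),
    (100, "2017-12-21", "NA"), (100, "2017-12-22", "NA"), (200, "2017-12-20", "NA"),
    (300, "2017-12-21", "NA"), (400, "2017-12-20", "NA"), (500, "2017-12-20", "NA"),
    (600, "2017-12-20", "DK"), (600, "2017-12-21", "DK"), (700, "2017-12-20", "DK"),
    (800, "2017-12-20", "DK"), (800, "2017-12-21", "DK"), (800, "2017-12-21", "DK"),
    (900, "2017-12-20", "DK"), (900, "2017-12-20", "DK"), (900, "2017-12-22", "DK"),
    (900, "2017-12-22", "DK"), (1000, "2017-12-22", "DK"),
]
conn.executemany(
    "INSERT INTO user_session VALUES (?, ?, ?, ?)",
    [(i, uid, f"{day} 08:00:00", country) for i, (uid, day, country) in enumerate(sample, start=1)],
)

QUERY = """
SELECT country,
       -- * 1.0 avoids SQLite's integer division
       SUM(number_of_users) * 1.0 / COUNT(DISTINCT activity_date) AS daily_average_users
FROM (
    -- one row per country per calendar day: that day's distinct users (DAU)
    SELECT country,
           date(opened)            AS activity_date,
           COUNT(DISTINCT user_id) AS number_of_users
    FROM user_session
    WHERE opened > datetime(:now, '-1 month')
    GROUP BY country, activity_date
) AS x
GROUP BY country
ORDER BY country
"""
# Fixed "current time" (the question's timestamp) so the December rows are in the window
result = conn.execute(QUERY, {"now": "2018-01-03 12:34:38"}).fetchall()
print([(country, round(avg, 4)) for country, avg in result])  # [('DK', 2.6667), ('NA', 2.3333)]
```

The output matches the MySQL 5.7 result above. DK has 4 + 2 + 2 = 8 user-days over 3 days, and NA has 4 + 2 + 1 = 7.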

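For this to be a proper daily average, every day of the month needs to be represented in the data (otherwise the average is taken only over the days that appear). If that is not the case, we need to calculate the number of days in the period under consideration and divide by that instead.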
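The last paragraph's fix can be sketched as follows: divide each country's total by the number of days in the period instead of the number of days that have data. The sketch assumes a window of 30 complete calendar days before the reference date, in place of MySQL's `interval 1 month`. The helper name `average_dau_over_period` is hypothetical. It runs on Python's `sqlite3` with the distinct users per day taken from the answer's sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_session (user_id INTEGER, opened TEXT, country TEXT)")

# Distinct users per (country, day) from the answer's sample data
active = {
    ("NA", "2017-12-20"): [100, 200, 400, 500],
    ("NA", "2017-12-21"): [100, 300],
    ("NA", "2017-12-22"): [100],
    ("DK", "2017-12-20"): [600, 700, 800, 900],
    ("DK", "2017-12-21"): [600, 800],
    ("DK", "2017-12-22"): [900, 1000],
}
conn.executemany(
    "INSERT INTO user_session VALUES (?, ?, ?)",
    [(uid, f"{day} 08:00:00", country) for (country, day), uids in active.items() for uid in uids],
)


def average_dau_over_period(conn, now, days=30):
    """Hypothetical helper: average DAU per country over `days` complete days.

    Days without any sessions count as 0, because the denominator is `days`
    rather than the number of distinct dates present in the data.
    """
    query = """
    SELECT country, SUM(number_of_users) * 1.0 / :days AS daily_average_users
    FROM (
        SELECT country,
               date(opened)            AS activity_date,
               COUNT(DISTINCT user_id) AS number_of_users
        FROM user_session
        -- window: the `days` whole calendar days before the date of `now`
        WHERE opened >= date(:now, '-' || :days || ' days')
          AND opened <  date(:now)
        GROUP BY country, activity_date
    ) AS x
    GROUP BY country
    ORDER BY country
    """
    return conn.execute(query, {"now": now, "days": days}).fetchall()


result = average_dau_over_period(conn, "2018-01-03 12:34:38")
print([(country, round(avg, 4)) for country, avg in result])  # [('DK', 0.2667), ('NA', 0.2333)]
```

Spread over all 30 days, the 8 DK and 7 NA user-days give averages of about 0.2667 and 0.2333. Counting only the three days that have data gives 2.6667 and 2.3333.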