Hive - over (partition by ...) with a column that is not in the group by

Time: 2018-08-21 20:57:56

Tags: hive aggregate-functions hiveql window-functions

Is it possible to do something like the following:

select
  avg(count(distinct user_id))
    over (partition by some_date) as average_users_per_day
from user_activity
group by user_type

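(Note in particular that the partition by column, some_date, is not among the group by columns.)

What I'm after is: the average number of users per day, broken down by user type.

I know how to do this with a subquery (see below), but I'd like to know whether there is a clean way to do it using only over (partition by ...) together with group by.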


Notes:

From reading this answer, my understanding (please correct me if I'm wrong) is that the following query:

select
  avg(count(distinct a)) over (partition by b)
from foo
group by b

can be equivalently expanded to:

select
  avg(count_distinct_a)
from (
  select
    b,
    count(distinct a) as count_distinct_a
  from foo
  group by b
) t  -- Hive requires an alias on a subquery in the FROM clause
group by b

I can then tweak that a little to get what I want:

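select
  user_type,  -- included so each average is labelled with its user type
  avg(count_distinct_user_id) as average_users_per_day
from (
  select
    user_type,
    count(distinct user_id) as count_distinct_user_id
  from user_activity
  group by user_type, some_date
) t  -- Hive requires an alias on a subquery in the FROM clause
group by user_type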

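(Note that the inner group by user_type, some_date differs from the outer group by user_type.)

I'd like to be able to tell the partition by / group by interaction to use a "sub-group-by" for the window part. Please let me know if my understanding of partition by / group by is completely off.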


Edit: some sample data and the desired output.

Source table:

+---------+-----------+-----------+
| user_id | user_type | some_date |
+---------+-----------+-----------+
| 1       | a         | 1         |
| 1       | a         | 2         |
| 2       | a         | 1         |
| 3       | a         | 2         |
| 3       | a         | 2         |
| 4       | b         | 2         |
| 5       | b         | 1         |
| 5       | b         | 3         |
| 5       | b         | 3         |
| 6       | c         | 1         |
| 7       | c         | 1         |
| 8       | c         | 4         |
| 9       | c         | 2         |
| 9       | c         | 3         |
| 9       | c         | 4         |
+---------+-----------+-----------+

Sample intermediate table (for reasoning):

+-----------+-----------+---------------------+
| user_type | some_date | distinct_user_count |
+-----------+-----------+---------------------+
| a         | 1         | 2                   |
| a         | 2         | 2                   |
| b         | 1         | 1                   |
| b         | 2         | 1                   |
| b         | 3         | 1                   |
| c         | 1         | 2                   |
| c         | 2         | 1                   |
| c         | 3         | 1                   |
| c         | 4         | 2                   |
+-----------+-----------+---------------------+

The SQL for it is: select user_type, some_date, count(distinct user_id) from user_activity group by user_type, some_date

Desired result:

+-----------+---------------------+
| user_type | average_daily_users |
+-----------+---------------------+
| a         | 2                   |
| b         | 1                   |
| c         | 1.5                 |
+-----------+---------------------+
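
As a sanity check on the tables above, here is a minimal sketch that loads the sample rows into an in-memory SQLite database and runs the subquery form of the query. SQLite is only a stand-in for Hive here (an assumption made so the example is runnable anywhere), the helper name run_queries is hypothetical, and the subquery alias t is added because Hive requires one. It does not answer whether a pure over (partition by ...) form exists; it only confirms that the two-level group by produces the intermediate table and the desired result.

```python
import sqlite3

# Sample rows from the source table: (user_id, user_type, some_date).
ROWS = [
    (1, "a", 1), (1, "a", 2), (2, "a", 1), (3, "a", 2), (3, "a", 2),
    (4, "b", 2), (5, "b", 1), (5, "b", 3), (5, "b", 3),
    (6, "c", 1), (7, "c", 1), (8, "c", 4),
    (9, "c", 2), (9, "c", 3), (9, "c", 4),
]

# Inner step: distinct users per (user_type, some_date).
INTERMEDIATE_SQL = """
select user_type, some_date, count(distinct user_id) as distinct_user_count
from user_activity
group by user_type, some_date
order by user_type, some_date
"""

# Outer step: average the daily counts within each user_type.
AVERAGE_SQL = """
select user_type, avg(count_distinct_user_id) as average_daily_users
from (
  select user_type, some_date,
         count(distinct user_id) as count_distinct_user_id
  from user_activity
  group by user_type, some_date
) t
group by user_type
order by user_type
"""


def run_queries():
    """Hypothetical helper: returns (intermediate_rows, average_rows)."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for Hive (assumption)
    conn.execute(
        "create table user_activity "
        "(user_id integer, user_type text, some_date integer)"
    )
    conn.executemany("insert into user_activity values (?, ?, ?)", ROWS)
    intermediate = conn.execute(INTERMEDIATE_SQL).fetchall()
    averages = conn.execute(AVERAGE_SQL).fetchall()
    conn.close()
    return intermediate, averages


if __name__ == "__main__":
    intermediate, averages = run_queries()
    for row in intermediate:
        print(row)
    for row in averages:
        print(row)
```

On the sample rows the averages come out as 2.0 for a, 1.0 for b and 1.5 for c, which matches the desired result table.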

0 answers:

No answers