Is it possible to do the following:
select
avg(count(distinct user_id))
over (partition by some_date) as average_users_per_day
from user_activity
group by user_type
(Specifically, the partition by column some_date is not among the group by columns.)
The idea I'm going for: the average number of users per day, for each user type.
I know how to do this with a subquery (see below), but I'd like to know whether there is a clean way to do it using only over (partition by ...) and group by.
From reading this answer, my understanding (please correct me if I'm wrong) is that the following query:
select
avg(count(distinct a)) over (partition by b)
from foo
group by b
can be equivalently expanded to:
select
avg(count_distinct_a)
from (
select
b,
count(distinct a) as count_distinct_a
from foo
group by b
) as t
group by b
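One way to check this equivalence is to run both forms side by side. The sketch below uses Python's built-in sqlite3 module with a small, hypothetical foo table (the rows are made up for illustration). Because group by b leaves exactly one row per b, each partition by b window contains a single row, so the window avg simply returns that row's count(distinct a). It assumes an SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# Hypothetical foo(a, b) rows, invented only to compare the two query forms.
conn = sqlite3.connect(":memory:")
conn.execute("create table foo (a integer, b text)")
conn.executemany(
    "insert into foo (a, b) values (?, ?)",
    [(1, "x"), (1, "x"), (2, "x"), (3, "y"), (4, "y"), (5, "y")],
)

# Window form: after GROUP BY b, every partition by b holds exactly one row.
window_form = conn.execute(
    """
    select b, avg(count(distinct a)) over (partition by b) as v
    from foo
    group by b
    order by b
    """
).fetchall()

# Expanded form: aggregate per b in a derived table, then average per b.
expanded_form = conn.execute(
    """
    select b, avg(count_distinct_a) as v
    from (
        select b, count(distinct a) as count_distinct_a
        from foo
        group by b
    ) as t
    group by b
    order by b
    """
).fetchall()

print(window_form)    # [('x', 2.0), ('y', 3.0)]
print(expanded_form)  # [('x', 2.0), ('y', 3.0)]
```

Both results agree, which also shows that in this particular form the outer avg is a no-op over a single value.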
I can then tweak it slightly to get what I want:
select
avg(count_distinct_user_id) as average_users_per_day
from (
select
user_type,
count(distinct user_id) as count_distinct_user_id
from user_activity
group by user_type, some_date
) as t
group by user_type
(Note that the inner group by user_type, some_date differs from the outer group by user_type.)
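As a sanity check, the subquery approach can be run against the sample rows given in the edit further down. This is a minimal sketch using Python's built-in sqlite3 module; the derived-table alias t is added because some engines (for example MySQL) require one.

```python
import sqlite3

# Sample rows from the question's source table: (user_id, user_type, some_date).
ROWS = [
    (1, "a", 1), (1, "a", 2), (2, "a", 1), (3, "a", 2), (3, "a", 2),
    (4, "b", 2), (5, "b", 1), (5, "b", 3), (5, "b", 3),
    (6, "c", 1), (7, "c", 1), (8, "c", 4), (9, "c", 2), (9, "c", 3), (9, "c", 4),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table user_activity (user_id integer, user_type text, some_date integer)"
)
conn.executemany("insert into user_activity values (?, ?, ?)", ROWS)

# Inner query: distinct users per (user_type, day); outer query: average per type.
result = conn.execute(
    """
    select user_type, avg(count_distinct_user_id) as average_users_per_day
    from (
        select user_type, count(distinct user_id) as count_distinct_user_id
        from user_activity
        group by user_type, some_date
    ) as t
    group by user_type
    order by user_type
    """
).fetchall()

print(result)  # [('a', 2.0), ('b', 1.0), ('c', 1.5)]
```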
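I'd like to be able to tell the partition by / group by interaction to use a "sub-group-by" for the window part. Please let me know if my understanding of partition by / group by is completely wrong.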
Edit: some sample data and the desired output.
Source table:
+---------+-----------+-----------+
| user_id | user_type | some_date |
+---------+-----------+-----------+
| 1 | a | 1 |
| 1 | a | 2 |
| 2 | a | 1 |
| 3 | a | 2 |
| 3 | a | 2 |
| 4 | b | 2 |
| 5 | b | 1 |
| 5 | b | 3 |
| 5 | b | 3 |
| 6 | c | 1 |
| 7 | c | 1 |
| 8 | c | 4 |
| 9 | c | 2 |
| 9 | c | 3 |
| 9 | c | 4 |
+---------+-----------+-----------+
Example intermediate table (to help with the reasoning):
+-----------+-----------+---------------------+
| user_type | some_date | distinct_user_count |
+-----------+-----------+---------------------+
| a | 1 | 2 |
| a | 2 | 2 |
| b | 1 | 1 |
| b | 2 | 1 |
| b | 3 | 1 |
| c | 1 | 2 |
| c | 2 | 1 |
| c | 3 | 1 |
| c | 4 | 2 |
+-----------+-----------+---------------------+
The SQL for it is:
select user_type, some_date, count(distinct user_id)
from user_activity
group by user_type, some_date
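The intermediate table can be reproduced from the source rows with Python's built-in sqlite3 module; this is a small sketch that loads the sample data into an in-memory database and runs the query above (with an order by added so the rows come back in the same order as the table).

```python
import sqlite3

# Sample rows from the source table: (user_id, user_type, some_date).
ROWS = [
    (1, "a", 1), (1, "a", 2), (2, "a", 1), (3, "a", 2), (3, "a", 2),
    (4, "b", 2), (5, "b", 1), (5, "b", 3), (5, "b", 3),
    (6, "c", 1), (7, "c", 1), (8, "c", 4), (9, "c", 2), (9, "c", 3), (9, "c", 4),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table user_activity (user_id integer, user_type text, some_date integer)"
)
conn.executemany("insert into user_activity values (?, ?, ?)", ROWS)

# One row per (user_type, some_date) with the number of distinct users that day.
intermediate = conn.execute(
    """
    select user_type, some_date, count(distinct user_id) as distinct_user_count
    from user_activity
    group by user_type, some_date
    order by user_type, some_date
    """
).fetchall()

for row in intermediate:
    print(row)
```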
Desired result:
+-----------+---------------------+
| user_type | average_daily_users |
+-----------+---------------------+
| a | 2 |
| b | 1 |
| c | 1.5 |
+-----------+---------------------+
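Window functions are evaluated after group by, so a partition by clause can only refer to grouped columns or aggregates; that is why some_date cannot appear in partition by when only user_type is grouped. One possible subquery-free variant (a sketch, not the only approach) groups by both user_type and some_date, averages the per-day counts with a window partitioned by user_type, and uses select distinct to collapse the repeated rows. It assumes the engine accepts an aggregate as a window function's argument (PostgreSQL does, as does SQLite 3.25+, used here through Python's sqlite3 module).

```python
import sqlite3

# Sample rows from the source table: (user_id, user_type, some_date).
ROWS = [
    (1, "a", 1), (1, "a", 2), (2, "a", 1), (3, "a", 2), (3, "a", 2),
    (4, "b", 2), (5, "b", 1), (5, "b", 3), (5, "b", 3),
    (6, "c", 1), (7, "c", 1), (8, "c", 4), (9, "c", 2), (9, "c", 3), (9, "c", 4),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table user_activity (user_id integer, user_type text, some_date integer)"
)
conn.executemany("insert into user_activity values (?, ?, ?)", ROWS)

# GROUP BY yields one row per (user_type, day); the window then averages those
# per-day counts within each user_type, and DISTINCT keeps one row per type.
averages = conn.execute(
    """
    select distinct
        user_type,
        avg(count(distinct user_id)) over (partition by user_type) as average_daily_users
    from user_activity
    group by user_type, some_date
    order by user_type
    """
).fetchall()

print(averages)  # [('a', 2.0), ('b', 1.0), ('c', 1.5)]
```

The trade-off is that the engine still materializes one row per (user_type, some_date) before distinct removes duplicates, so it does roughly the same work as the subquery version; the difference is mostly one of style.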