I'm trying to filter groups down to only those that contain a participant whose session lasted more than 5 minutes.
My current query:
select
U.session_id,
U.session_date,
U.participant_duration,
U.email
from data.usage U
left outer join
(select
distinct M.session_id
from data.usage M
where email like '%gmail.com%'
and data_date >= '20180101'
and name in
(
select
lower(name)
from data.users
where role like 'Person%'
and isactive = TRUE
and data_date = '20180412'
))M
on U.session_id = M.session_id
Once the data comes out, it looks like this:
session_id session_date participant_duration email
143 20180401 0.4 huy@gmail.com
143 20180401 1.5 t@gmail.com
143 20180401 1.6 att@gmail.com
143 20180401 2.3 m@gmail.com
124 20180401 5.6 p@gmail.com
124 20180401 3.2 alex@gmail.com
165 20180401 4.1 jeff@gmail.com
165 20180401 3.1 nader@gmail.com
I'd like to filter this with a where clause that returns only the groups containing at least 1 record with participant_duration >= 5.
Something like this:
group by session_id having participant_duration >= 5
Is this way off?
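Answer 0 (score: 0)
Yes, you have the right idea in using group by and having. The having condition just needs to be an aggregate, for example a count of the rows in the group that meet the threshold: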
group by session_id
having sum(cast(participant_duration >= 5 as int)) >= 1
Also, your query can be simplified to
select *
from (select U.session_id,U.session_date,U.participant_duration,U.email,
SUM(cast(U.participant_duration >= 5 as int)) OVER(PARTITION BY U.session_id) as dur_gt_5
from data.usage U
join data.users M on U.session_id = M.session_id and U.name=lower(M.name)
where M.role like 'Person%' and M.isactive = TRUE and M.data_date = '20180412'
and U.email like '%gmail.com%' and U.data_date >= '20180101'
) t
where dur_gt_5>=1
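To check the two forms above against the sample rows from the question, here is a minimal sketch using Python's built-in sqlite3 module. It is an assumption-laden stand-in, not the original environment: the table is a hypothetical in-memory `usage` table holding only the eight sample rows, the join to data.users is left out, and the window query needs SQLite 3.25 or newer.

```python
import sqlite3

# Sample rows copied from the question: (session_id, session_date, participant_duration, email)
rows = [
    (143, "20180401", 0.4, "huy@gmail.com"),
    (143, "20180401", 1.5, "t@gmail.com"),
    (143, "20180401", 1.6, "att@gmail.com"),
    (143, "20180401", 2.3, "m@gmail.com"),
    (124, "20180401", 5.6, "p@gmail.com"),
    (124, "20180401", 3.2, "alex@gmail.com"),
    (165, "20180401", 4.1, "jeff@gmail.com"),
    (165, "20180401", 3.1, "nader@gmail.com"),
]

# Hypothetical in-memory table standing in for data.usage
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table usage (session_id int, session_date text, "
    "participant_duration real, email text)"
)
conn.executemany("insert into usage values (?, ?, ?, ?)", rows)

# HAVING form: one row per session that has at least one duration >= 5
having_sql = """
select session_id
from usage
group by session_id
having sum(cast(participant_duration >= 5 as int)) >= 1
"""
qualifying_sessions = [r[0] for r in conn.execute(having_sql)]

# Window form: keeps every participant row of the qualifying sessions
window_sql = """
select session_id, participant_duration, email
from (select U.*,
             sum(cast(U.participant_duration >= 5 as int))
                 over (partition by U.session_id) as dur_gt_5
      from usage U) t
where dur_gt_5 >= 1
order by participant_duration desc
"""
kept_rows = conn.execute(window_sql).fetchall()

print(qualifying_sessions)
print(kept_rows)
```

On this data only session 124 qualifies. The HAVING form returns the session once, while the window form returns both of its participant rows, which is closer to what the question asks for.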
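Answer 1 (score: 0)
If you group by the session_id field, every other field in the select list needs an aggregate function (such as sum, min, max).
I think session_id and session_date will be the same across a session's records, so either use both fields in the group by, or, if you don't want session_date in the group by, apply an aggregate function to it, such as max(session_date).
Apply the sum aggregate function to participant_duration, then test participant_duration in the having clause so that only sessions with at least one value of 5 or more are kept.
The only field left in the select statement is email, which is not in the group by clause, so I used the max aggregate function to return a single value for the email field. Note that this gives one row per session, not one row per participant.
With session_date in the group by: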
select
U.session_id,
U.session_date,
sum(U.participant_duration) participant_duration,
max(U.email) email
from data.usage U
left outer join
(select
distinct M.session_id
from data.usage M
where email like '%gmail.com%'
and data_date >= '20180101'
and name in
(
select
lower(name)
from data.users
where role like 'Person%'
and isactive = TRUE
and data_date = '20180412'
))M
on U.session_id = M.session_id
group by U.session_id,U.session_date
having sum(cast(participant_duration >= 5 as int)) >= 1;
(or)
With session_date not in the group by clause:
select
U.session_id,
max(U.session_date) session_date,
sum(U.participant_duration) participant_duration,
max(U.email) email
from data.usage U
left outer join
(select
distinct M.session_id
from data.usage M
where email like '%gmail.com%'
and data_date >= '20180101'
and name in
(
select
lower(name)
from data.users
where role like 'Person%'
and isactive = TRUE
and data_date = '20180412'
))M
on U.session_id = M.session_id
group by U.session_id
having sum(cast(participant_duration >= 5 as int)) >= 1;
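As a rough illustration of what the grouped queries above return, the sketch below runs the second variant against the question's sample rows using Python's sqlite3 module. The in-memory `usage` table is hypothetical and the left outer join is omitted for brevity, so treat it as an approximation of the real query rather than a faithful reproduction.

```python
import sqlite3

# Sample rows copied from the question: (session_id, session_date, participant_duration, email)
rows = [
    (143, "20180401", 0.4, "huy@gmail.com"),
    (143, "20180401", 1.5, "t@gmail.com"),
    (143, "20180401", 1.6, "att@gmail.com"),
    (143, "20180401", 2.3, "m@gmail.com"),
    (124, "20180401", 5.6, "p@gmail.com"),
    (124, "20180401", 3.2, "alex@gmail.com"),
    (165, "20180401", 4.1, "jeff@gmail.com"),
    (165, "20180401", 3.1, "nader@gmail.com"),
]

# Hypothetical in-memory table standing in for data.usage
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table usage (session_id int, session_date text, "
    "participant_duration real, email text)"
)
conn.executemany("insert into usage values (?, ?, ?, ?)", rows)

# Grouped form: every non-grouped column is wrapped in an aggregate
grouped_sql = """
select U.session_id,
       max(U.session_date) session_date,
       sum(U.participant_duration) participant_duration,
       max(U.email) email
from usage U
group by U.session_id
having sum(cast(participant_duration >= 5 as int)) >= 1
"""
grouped = conn.execute(grouped_sql).fetchall()

for session_id, session_date, total_duration, email in grouped:
    # total_duration is the summed duration of the whole session
    print(session_id, session_date, round(total_duration, 1), email)
```

Only session 124 survives the having clause, and it comes back as a single row: the durations are summed and max(email) picks one address alphabetically. If every participant row is needed, the grouped result would have to be joined back to the detail rows.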