My requirement is to get the ID and name of students who have more than one email ID and whose type is 1.
I am using a query like this:
select distinct b.id, b.name, b.email, b.type,a.cnt
from (
select id, count(email) as cnt
from (
select distinct id, email
from table1
) c
group by id
) a
join table1 b on a.id = b.id
where b.type=1
order by b.id
Please let me know whether this version works, or whether there is a simpler one.
Sample data:
id name email type
123 AAA abc@xyz.com 1
123 AAA acd@xyz.com 1
123 AAA ayx@xyz.com 3
345 BBB nch@xyz.com 1
345 BBB nch@xyz.com 1
678 CCC iuy@xyz.com 1
Expected Output:
123 AAA abc@xyz.com 1 2
123 AAA acd@xyz.com 1 2
345 BBB nch@xyz.com 1 1
678 CCC iuy@xyz.com 1 1
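To check whether the query above produces the expected output, here is a minimal sketch that runs it against the sample data. It assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for Hive; the query uses only plain SQL, so it should behave the same way. The helper asker_query and its inner_filter parameter are hypothetical, added only to compare the query as written with a variant that filters on type before counting.

```python
import sqlite3

# Sample data from the question.
ROWS = [
    (123, "AAA", "abc@xyz.com", 1),
    (123, "AAA", "acd@xyz.com", 1),
    (123, "AAA", "ayx@xyz.com", 3),
    (345, "BBB", "nch@xyz.com", 1),
    (345, "BBB", "nch@xyz.com", 1),
    (678, "CCC", "iuy@xyz.com", 1),
]

conn = sqlite3.connect(":memory:")  # SQLite stands in for Hive (assumption)
conn.execute("create table table1 (id integer, name text, email text, type integer)")
conn.executemany("insert into table1 values (?, ?, ?, ?)", ROWS)


def asker_query(inner_filter: str) -> str:
    # Hypothetical helper: the asker's query, with an optional WHERE clause
    # injected into the innermost (distinct id, email) subquery.
    return f"""
    select distinct b.id, b.name, b.email, b.type, a.cnt
    from (select id, count(email) as cnt
          from (select distinct id, email from table1 {inner_filter}) c
          group by id) a
    join table1 b on a.id = b.id
    where b.type = 1
    order by b.id
    """


# As written: the type = 1 filter runs only after counting, so the
# type-3 address of student 123 is included in its count.
as_written = sorted(conn.execute(asker_query("")).fetchall())
# Variant: filter on type before counting distinct emails.
filtered_first = sorted(conn.execute(asker_query("where type = 1")).fetchall())

print(as_written)
print(filtered_first)
```

With the sample data, the query as written reports cnt = 3 for student 123 because its type-3 address is counted. Moving the type filter into the innermost subquery gives the expected cnt = 2.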
Answer 0 (score: 2)
You can use group by ... having count() to meet this requirement.
select distinct b.id
, b.name
, b.email
, b.type
from table1 b
where b.id in
(select id from table1 where type = 1 group by id having count(distinct email) > 1)
and b.type=1
order by b.id
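The choice of GROUP BY key decides which students the HAVING clause keeps. The following is a minimal sketch that uses SQLite (Python's sqlite3) as a stand-in for Hive and the question's sample data. It compares grouping by (email, id), which finds a repeated address, with grouping by id and counting distinct emails, which finds several different addresses.

```python
import sqlite3

# Sample data from the question.
ROWS = [
    (123, "AAA", "abc@xyz.com", 1),
    (123, "AAA", "acd@xyz.com", 1),
    (123, "AAA", "ayx@xyz.com", 3),
    (345, "BBB", "nch@xyz.com", 1),
    (345, "BBB", "nch@xyz.com", 1),
    (678, "CCC", "iuy@xyz.com", 1),
]

conn = sqlite3.connect(":memory:")  # SQLite stands in for Hive (assumption)
conn.execute("create table table1 (id integer, name text, email text, type integer)")
conn.executemany("insert into table1 values (?, ?, ?, ?)", ROWS)

# Grouping by (email, id): keeps ids that have the SAME address more than once.
per_pair = conn.execute(
    "select distinct id from table1 group by email, id having count(email) > 1"
).fetchall()

# Grouping by id: keeps ids that have more than one DIFFERENT type-1 address.
per_id = conn.execute(
    "select id from table1 where type = 1 group by id having count(distinct email) > 1"
).fetchall()

# Full query: students of type 1 with several distinct type-1 addresses.
students = conn.execute(
    """select distinct id, name from table1
       where type = 1
         and id in (select id from table1 where type = 1
                    group by id having count(distinct email) > 1)
       order by id"""
).fetchall()

print(per_pair, per_id, students)
```

On this data, the (email, id) grouping selects student 345, whose single address is duplicated. The per-id grouping selects 123, the only student with two different type-1 addresses.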
Answer 1 (score: 0)
You can try the analytic (window) form of the count() function:
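SELECT DISTINCT sub.ID, sub.NAME
FROM (SELECT ID, NAME,
             COUNT(*) OVER (PARTITION BY ID) cnt  -- distinct type-1 emails per ID
      FROM (SELECT DISTINCT ID, NAME, EMAIL      -- deduplicate before counting
            FROM raw.crddacia_raw
            WHERE TYPE = 1) d) sub
WHERE sub.cnt > 1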
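The partition key determines what a windowed COUNT(*) counts. The following is a minimal sketch, assuming SQLite 3.25 or later (for window functions, via Python's sqlite3) as a stand-in for Hive and the table name table1 from the question. It shows that PARTITION BY id, email counts repeated copies of one address. Deduplicating the rows first and then partitioning by id counts distinct addresses.

```python
import sqlite3

# Sample data from the question.
ROWS = [
    (123, "AAA", "abc@xyz.com", 1),
    (123, "AAA", "acd@xyz.com", 1),
    (123, "AAA", "ayx@xyz.com", 3),
    (345, "BBB", "nch@xyz.com", 1),
    (345, "BBB", "nch@xyz.com", 1),
    (678, "CCC", "iuy@xyz.com", 1),
]

conn = sqlite3.connect(":memory:")  # SQLite stands in for Hive (assumption)
conn.execute("create table table1 (id integer, name text, email text, type integer)")
conn.executemany("insert into table1 values (?, ?, ?, ?)", ROWS)

# PARTITION BY id, email: cnt > 1 only where one address repeats.
by_id_email = conn.execute(
    """select id, email, count(*) over (partition by id, email) as cnt
       from table1"""
).fetchall()
dup_ids = sorted({row[0] for row in by_id_email if row[2] > 1})

# Deduplicate (id, name, email) among type-1 rows, then PARTITION BY id.
dedup_counts = conn.execute(
    """select distinct id, name, cnt from (
           select id, name, count(*) over (partition by id) as cnt
           from (select distinct id, name, email from table1 where type = 1) d
       ) sub
       order by id"""
).fetchall()

print(dup_ids, dedup_counts)
```

Partitioning by (id, email) flags only student 345, whose one address appears twice. After deduplication, the per-id count gives 2 for student 123 and 1 for the others.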
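Answer 2 (score: 0)
I strongly recommend using window functions. However, Hive does not support count(distinct) as a window function. There are several workarounds. One is to add an ascending and a descending dense_rank() and subtract one:
select distinct id, name, email, type, cnt
from (select t1.*,
             (dense_rank() over (partition by id order by email) +
              dense_rank() over (partition by id order by email desc) - 1
             ) as cnt
      from table1 t1
      where type = 1
     ) t;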
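I expect this to perform better than your version. Still, it is worth testing the different versions to see which one performs better (and feel free to come back and tell others which one was faster).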
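The dense_rank() workaround rests on one identity. Within a partition, the ascending rank plus the descending rank of any value equals the number of distinct values plus one. The following is a minimal sketch, assuming SQLite 3.25 or later (via Python's sqlite3) as a stand-in for Hive. It checks the identity on the sample data by comparing the rank sum with a plain GROUP BY count(distinct email).

```python
import sqlite3

# Sample data from the question.
ROWS = [
    (123, "AAA", "abc@xyz.com", 1),
    (123, "AAA", "acd@xyz.com", 1),
    (123, "AAA", "ayx@xyz.com", 3),
    (345, "BBB", "nch@xyz.com", 1),
    (345, "BBB", "nch@xyz.com", 1),
    (678, "CCC", "iuy@xyz.com", 1),
]

conn = sqlite3.connect(":memory:")  # SQLite stands in for Hive (assumption)
conn.execute("create table table1 (id integer, name text, email text, type integer)")
conn.executemany("insert into table1 values (?, ?, ?, ?)", ROWS)

# Ascending rank + descending rank for each type-1 row.
rows = conn.execute(
    """select id, email,
              dense_rank() over (partition by id order by email)
            + dense_rank() over (partition by id order by email desc) as rank_sum
       from table1
       where type = 1"""
).fetchall()

# Reference answer from an ordinary aggregate.
distinct_counts = dict(conn.execute(
    "select id, count(distinct email) from table1 where type = 1 group by id"
).fetchall())

# (id, rank_sum - 1, reference count) for every row.
checks = [(id_, rank_sum - 1, distinct_counts[id_]) for id_, _email, rank_sum in rows]
print(checks)
```

Without the "- 1", every count comes out one too high. dense_rank() also gives NULL a rank of its own, while count(distinct email) ignores NULLs, so the two can differ when email is nullable.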
Answer 3 (score: 0)
Another approach uses collect_set and takes the size of the returned array to count distinct emails.
Demo:
--your data example
with table1 as ( --use your table instead of this
select stack(6,
123, 'AAA', 'abc@xyz.com', 1,
123, 'AAA', 'acd@xyz.com', 1,
123, 'AAA', 'ayx@xyz.com', 3,
345, 'BBB', 'nch@xyz.com', 1,
345, 'BBB', 'nch@xyz.com', 1,
678, 'CCC', 'iuy@xyz.com', 1
) as (id, name, email, type )
)
--query
select distinct id, name, email, type,
size(collect_set(email) over(partition by id)) cnt
from table1
where type=1
Result:
id name email type cnt
123 AAA abc@xyz.com 1 2
123 AAA acd@xyz.com 1 2
345 BBB nch@xyz.com 1 1
678 CCC iuy@xyz.com 1 1
We still need DISTINCT here because an analytic function does not remove duplicate rows, as in the case of 345 BBB nch@xyz.com.
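The effect of size(collect_set(email) over (partition by id)) can be imitated in plain Python. The following is a minimal sketch that emulates the Hive functions with a set per id and is not Hive itself. It shows why the DISTINCT is still needed. The window function returns one row per input row, so the duplicated 345 row survives until DISTINCT removes it.

```python
from collections import defaultdict

# Sample data from the question: (id, name, email, type).
ROWS = [
    (123, "AAA", "abc@xyz.com", 1),
    (123, "AAA", "acd@xyz.com", 1),
    (123, "AAA", "ayx@xyz.com", 3),
    (345, "BBB", "nch@xyz.com", 1),
    (345, "BBB", "nch@xyz.com", 1),
    (678, "CCC", "iuy@xyz.com", 1),
]

# WHERE type = 1 is applied before the window function is evaluated.
type1 = [row for row in ROWS if row[3] == 1]

# Emulates collect_set(email) OVER (PARTITION BY id): a set of emails per id.
email_sets = defaultdict(set)
for id_, _name, email, _type in type1:
    email_sets[id_].add(email)

# One output row per input row, as an analytic function produces;
# len() plays the role of size().
windowed = [
    (id_, name, email, type_, len(email_sets[id_]))
    for id_, name, email, type_ in type1
]

# SELECT DISTINCT removes the repeated 345 row.
deduped = sorted(set(windowed))

print(len(windowed), deduped)
```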
Answer 4 (score: 0)
This is very similar to your query, but here I filter the data in the initial step (the inner query) so that the join runs on less data:
select distinct orig_table.id, orig_table.name, orig_table.email, orig_table.type, intr_table.cnt from table1 orig_table join
(
select a.id, a.type, count(distinct a.email) as cnt from table1 as a where a.type=1 group by a.id, a.type
) intr_table on intr_table.id=orig_table.id and intr_table.type=orig_table.type
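The filter-then-join pattern can be checked against the expected output. The following is a minimal sketch, assuming SQLite (Python's sqlite3) as a stand-in for Hive. It runs the pre-aggregated join on the sample data with two aggregates, count(email) and count(distinct email). The helper filter_then_join and its agg parameter are hypothetical, added only for this comparison.

```python
import sqlite3

# Sample data from the question.
ROWS = [
    (123, "AAA", "abc@xyz.com", 1),
    (123, "AAA", "acd@xyz.com", 1),
    (123, "AAA", "ayx@xyz.com", 3),
    (345, "BBB", "nch@xyz.com", 1),
    (345, "BBB", "nch@xyz.com", 1),
    (678, "CCC", "iuy@xyz.com", 1),
]

conn = sqlite3.connect(":memory:")  # SQLite stands in for Hive (assumption)
conn.execute("create table table1 (id integer, name text, email text, type integer)")
conn.executemany("insert into table1 values (?, ?, ?, ?)", ROWS)


def filter_then_join(agg: str) -> list:
    # Hypothetical helper: filter and aggregate in the inner query,
    # then join the small result back to the original table.
    return conn.execute(f"""
        select distinct o.id, o.name, o.email, o.type, i.cnt
        from table1 o
        join (select id, type, {agg} as cnt
              from table1
              where type = 1
              group by id, type) i
          on i.id = o.id and i.type = o.type
        order by o.id, o.email
    """).fetchall()


plain_count = filter_then_join("count(email)")
distinct_count = filter_then_join("count(distinct email)")

print(plain_count)
print(distinct_count)
```

With count(email), student 345 gets cnt = 2 because its one address is stored twice. count(distinct email) reproduces the expected output.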