I have the following data:
name device operating browser
A mob l c
A mob l b
A mob l b
A web w b
B web w c
B web w c
B mob w c
B web l b
For each name, I want to find the most common value in each column, so the result would look like this:
name device operating browser
A mob l b
B web w c
How can I achieve this? Thanks!
Answer 0 (score: 0)
This may help. Note, however, that relying on subqueries like this is not ideal.
SELECT
    a.name,
    (SELECT b.device FROM YOUR_TABLE_NAME b WHERE b.name = a.name
     GROUP BY device ORDER BY COUNT(b.device) DESC LIMIT 1) AS device,
    (SELECT c.operating FROM YOUR_TABLE_NAME c WHERE c.name = a.name
     GROUP BY operating ORDER BY COUNT(c.operating) DESC LIMIT 1) AS operating,
    (SELECT d.browser FROM YOUR_TABLE_NAME d WHERE d.name = a.name
     GROUP BY browser ORDER BY COUNT(d.browser) DESC LIMIT 1) AS browser
FROM YOUR_TABLE_NAME AS a
GROUP BY a.name
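To check the subquery approach against the sample data from the question, here is a minimal sketch that runs it in an in-memory SQLite database through Python's standard `sqlite3` module. The table name `your_table` and the helper `most_common_per_column` are placeholders made up for this example, and the outer query reads distinct names from a derived table instead of grouping, which is an assumption made here for portability. SQLite is not Hive or MySQL, so treat this as an illustration of the logic rather than a statement about how any particular engine plans the query.

```python
import sqlite3

# Sample rows copied from the question: (name, device, operating, browser).
ROWS = [
    ("A", "mob", "l", "c"),
    ("A", "mob", "l", "b"),
    ("A", "mob", "l", "b"),
    ("A", "web", "w", "b"),
    ("B", "web", "w", "c"),
    ("B", "web", "w", "c"),
    ("B", "mob", "w", "c"),
    ("B", "web", "l", "b"),
]

# One correlated subquery per column: count the values for the current
# name and keep the one with the highest count.
QUERY = """
SELECT
    a.name,
    (SELECT b.device FROM your_table b WHERE b.name = a.name
     GROUP BY b.device ORDER BY COUNT(*) DESC LIMIT 1) AS device,
    (SELECT c.operating FROM your_table c WHERE c.name = a.name
     GROUP BY c.operating ORDER BY COUNT(*) DESC LIMIT 1) AS operating,
    (SELECT d.browser FROM your_table d WHERE d.name = a.name
     GROUP BY d.browser ORDER BY COUNT(*) DESC LIMIT 1) AS browser
FROM (SELECT DISTINCT name FROM your_table) AS a
ORDER BY a.name
"""


def most_common_per_column(rows):
    """Load rows into a throwaway SQLite table and run QUERY on it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE your_table "
            "(name TEXT, device TEXT, operating TEXT, browser TEXT)"
        )
        conn.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in most_common_per_column(ROWS):
        print(row)
```

On the sample data no column has a tie within a name, so the result is the two rows the question asks for. When two values tie, `LIMIT 1` returns whichever the engine happens to sort first, so adding a secondary sort key is worth considering if the result has to be deterministic.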
Answer 1 (score: 0)
For Hive 0.11+, you can use a window function such as rank:
select name, device, operating, browser
from (
    select *, rank() over (partition by name order by cnt desc) as rnk
    from (
        select name, device, operating, browser, count(*) as cnt
        from yourtable
        group by name, device, operating, browser
    ) counted
) ranked
where rnk = 1
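Step by step: the innermost query counts how often each (device, operating, browser) combination occurs for each name, the middle query ranks those combinations within each name by that count, and the outer query keeps only the top-ranked rows.
Note: if several combinations tie for the highest count for a given name, all of the tied rows are returned.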
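The tie behaviour can be seen in a small sketch, again using an in-memory SQLite database through Python's `sqlite3` module (this assumes the bundled SQLite is 3.25 or newer, which is when window functions were added). The helper `top_rows`, its parameters, and the sample name `C` are hypothetical and exist only for this illustration. It compares `rank()` ordered by count alone with `row_number()` plus extra sort keys, which is one possible way to get exactly one row per name.

```python
import sqlite3

# Same shape as the answer's query; the window function and its ORDER BY
# are filled in from a fixed whitelist below, never from user input.
TEMPLATE = """
SELECT name, device, operating, browser
FROM (
    SELECT *, {fn}() OVER (PARTITION BY name ORDER BY {order_by}) AS rnk
    FROM (
        SELECT name, device, operating, browser, COUNT(*) AS cnt
        FROM yourtable
        GROUP BY name, device, operating, browser
    ) AS counted
) AS ranked
WHERE rnk = 1
ORDER BY name, device, operating, browser
"""

ALLOWED_FUNCTIONS = {"rank", "row_number"}
ALLOWED_ORDERINGS = {"cnt DESC", "cnt DESC, device, operating, browser"}


def top_rows(rows, window_fn, order_by):
    """Return the rows with rnk = 1 for the given window function and ordering."""
    if window_fn not in ALLOWED_FUNCTIONS or order_by not in ALLOWED_ORDERINGS:
        raise ValueError("unsupported window function or ordering")
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE yourtable "
            "(name TEXT, device TEXT, operating TEXT, browser TEXT)"
        )
        conn.executemany("INSERT INTO yourtable VALUES (?, ?, ?, ?)", rows)
        sql = TEMPLATE.format(fn=window_fn, order_by=order_by)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


# Hypothetical name "C" with two combinations that each occur once (a tie).
TIED = [("C", "mob", "l", "b"), ("C", "web", "w", "c")]

if __name__ == "__main__":
    # rank() on the count alone keeps both tied rows.
    print(top_rows(TIED, "rank", "cnt DESC"))
    # row_number() with tie-breaking sort keys keeps a single row.
    print(top_rows(TIED, "row_number", "cnt DESC, device, operating, browser"))
```

Which tied row survives depends entirely on the extra sort keys, so the tie-breaker should be chosen to match whatever "most common" is supposed to mean for the data at hand.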