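I want to find the most frequently occurring value in each group.
I tried using top(k)(column), but I get the following error: Column value is not under aggregate function and not in GROUP BY.
For example, if my table test_date has the columns (pid, value):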
pid, value
----------
1,a
1,b
1,a
1,c
I want the result:
pid, value
----------
1,a
I tried SELECT pid,top(1)(value) top_value FROM test_data group by pid
I get the error:
Column value is not under aggregate function and not in GROUP BY
I also tried anyHeavy(), but it only works for values that occur in more than half of the cases.
Answer 0: (score: 2)
This query should help you:
SELECT
    pid,
    /*
    Decompose the query in parts:
    1. groupArray((value, count)): convert the group of rows with the same 'pid' to the array of tuples (value, count)
    2. arrayReverseSort: make reverse sorting by 'count' ('x.2' is 'count')
    3. [1].1: take the 'value' from the first item of the sorted array
    */
    arrayReverseSort(x -> x.2, groupArray((value, count)))[1].1 AS value
FROM
(
    SELECT
        pid,
        value,
        count() AS count
    FROM test_date
    GROUP BY
        pid,
        value
)
GROUP BY pid
ORDER BY pid ASC
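The same two-step idea (count each (pid, value) pair, then keep the value with the highest count per pid) can probably be written more compactly with argMax. This is only a sketch: the aliases cnt and top_value are names I picked for illustration, and if two values tie for the highest count, which one is returned is not defined.

```sql
SELECT
    pid,
    -- argMax(arg, val) returns the 'arg' from the row where 'val' is largest,
    -- i.e. the value with the highest per-group count
    argMax(value, cnt) AS top_value
FROM
(
    -- inner query: one row per (pid, value) with its number of occurrences
    SELECT
        pid,
        value,
        count() AS cnt
    FROM test_date
    GROUP BY
        pid,
        value
)
GROUP BY pid
ORDER BY pid ASC
```

Unlike the array-based version, this does not build an intermediate array per group, which may matter when a pid has many distinct values.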
Answer 1: (score: 0)
SELECT pid,topK(1)(value) top_value FROM test_data group by pid
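Keep in mind that topK returns an array, so the query above yields ['a'] rather than a. If a plain scalar is wanted, something like the following sketch should work (ClickHouse arrays are 1-indexed). Also note that topK is an approximate algorithm, so on large, high-cardinality groups the result is not guaranteed to be the exact most frequent value.

```sql
SELECT
    pid,
    -- topK(1)(value) is an array with a single element; [1] extracts it
    topK(1)(value)[1] AS top_value
FROM test_data
GROUP BY pid
```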