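Suppose I have the table below. How can I group by ID and get the most common value in each column? P.S. The table is large, and I need to do this for many columns.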
ID Col1 Col2 Col3....
1 A null
1 A X
1 B null
1 A Y
2 C X
2 C Y
2 A Y
3 B Z
3 A Z
3 A Z
3 B X
3 B Y
Expected result:
ID Col1 Col2 Col3....
1 A null
2 C Y
3 B Z
Answer 0 (score: 4)
Here is one approach, using analytic functions and keep:
select id,
       min(col1) keep (dense_rank first order by cnt_col1 desc) as col1_mode,
       min(col2) keep (dense_rank first order by cnt_col2 desc) as col2_mode,
       min(col3) keep (dense_rank first order by cnt_col3 desc) as col3_mode
from (select t.*,  -- the outer query needs col1..col3 as well as the counts
             count(*) over (partition by id, col1) as cnt_col1,
             count(*) over (partition by id, col2) as cnt_col2,
             count(*) over (partition by id, col3) as cnt_col3
      from t
     ) t
group by id;
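To try this approach against the sample data from the question, a minimal sketch follows. The column types are assumptions, and only col1 and col2 are included because the question elides the remaining columns. The inner query selects t.* so that col1 and col2 are visible to the outer min().

```sql
-- Assumed setup: the table name t matches the query above; column types are guesses.
create table t (
    id   number,
    col1 varchar2(10),
    col2 varchar2(10)
);

-- Sample rows copied from the question.
insert into t (id, col1, col2) values (1, 'A', null);
insert into t (id, col1, col2) values (1, 'A', 'X');
insert into t (id, col1, col2) values (1, 'B', null);
insert into t (id, col1, col2) values (1, 'A', 'Y');
insert into t (id, col1, col2) values (2, 'C', 'X');
insert into t (id, col1, col2) values (2, 'C', 'Y');
insert into t (id, col1, col2) values (2, 'A', 'Y');
insert into t (id, col1, col2) values (3, 'B', 'Z');
insert into t (id, col1, col2) values (3, 'A', 'Z');
insert into t (id, col1, col2) values (3, 'A', 'Z');
insert into t (id, col1, col2) values (3, 'B', 'X');
insert into t (id, col1, col2) values (3, 'B', 'Y');
commit;

-- Count each (id, value) pair with an analytic count, then keep the value
-- from the rows with the highest count. NULLs form their own partition,
-- so a NULL that is most frequent comes back as NULL.
select id,
       min(col1) keep (dense_rank first order by cnt_col1 desc) as col1_mode,
       min(col2) keep (dense_rank first order by cnt_col2 desc) as col2_mode
from (select t.*,
             count(*) over (partition by id, col1) as cnt_col1,
             count(*) over (partition by id, col2) as cnt_col2
      from t
     )
group by id
order by id;
```

On this data the query should reproduce the expected result from the question. When two values tie for the highest count, min() picks the smallest non-NULL one, so a NULL that ties with a real value loses the tie.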
In statistics, the most common value is called the "mode", and Oracle provides a function that computes it. So a simpler approach is to use stats_mode():
select id,
stats_mode(col1) as mode_col1,
stats_mode(col2) as mode_col2,
stats_mode(col3) as mode_col3
from t
group by id;
Edit:
As noted in the comments, stats_mode() does not count NULL values. The easiest way to fix this is to find some value that does not occur in the data and do:
select id,
stats_mode(coalesce(col1, '<null>')) as mode_col1,
stats_mode(coalesce(col2, '<null>')) as mode_col2,
stats_mode(coalesce(col3, '<null>')) as mode_col3
from t
group by id;
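Note that the query above returns the placeholder string itself when NULL is the most common value. If a real NULL is wanted in the output, one possible sketch is to wrap each expression in nullif(). This assumes the columns are character types, that '<null>' never occurs in the data, and that the table is named t as in the first query.

```sql
-- Sketch: substitute a sentinel so stats_mode() counts NULLs,
-- then map the sentinel back to NULL in the result.
-- Assumption: '<null>' is not a legitimate value in col1..col3.
select id,
       nullif(stats_mode(coalesce(col1, '<null>')), '<null>') as mode_col1,
       nullif(stats_mode(coalesce(col2, '<null>')), '<null>') as mode_col2,
       nullif(stats_mode(coalesce(col3, '<null>')), '<null>') as mode_col3
from t
group by id;
```

When several values tie for the highest count, stats_mode() returns only one of them, so results for tied groups may differ from the keep-based query.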
Another option is to go back to the first approach, or to something similar:
select id,
       -- mode_colN is constant within an id; max() only makes it legal under group by
       (case when sum(case when col1 = mode_col1 then 1 else 0 end) >= sum(case when col1 is null then 1 else 0 end)
             then max(mode_col1)
             else NULL
        end) as mode_col1,
       (case when sum(case when col2 = mode_col2 then 1 else 0 end) >= sum(case when col2 is null then 1 else 0 end)
             then max(mode_col2)
             else NULL
        end) as mode_col2,
       (case when sum(case when col3 = mode_col3 then 1 else 0 end) >= sum(case when col3 is null then 1 else 0 end)
             then max(mode_col3)
             else NULL
        end) as mode_col3
from (select t.*,
             stats_mode(col1) over (partition by id) as mode_col1,
             stats_mode(col2) over (partition by id) as mode_col2,
             stats_mode(col3) over (partition by id) as mode_col3
      from t
     ) t
group by id;