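I need to analyze the length of the values in a DB column and get the percentage of values that share the same length.
Expected result: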
Same length values in COL1 = 70% with LENGTH = 10 chars
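This is not the same as "find the most frequent value and take its length": in a high-cardinality KEY or ID column every value is different, yet most values may still have the same length.
I need fast-running SQL (DB2 dialect preferred) that does not overload the database engine (billions of rows).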
Example 1
COL1 (VARCHAR 10)
------------------
X01
X02
X03
X04
X05
Result:
100%, 3
Example 2
COL1 (VARCHAR 20)
-------------------------
New York
London
Los Angeles
Paris
San Francisco
Result:
20%, 5
(or 20%, 13: it does not matter which, because all the lengths are different)
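As a rough cross-check of the two examples, the metric can be written in a few lines of Python. The steps are to count how many non-NULL values share each length, take the most frequent length, and divide its count by the total. This is only a sketch of the definition, not something to run against billions of rows. The helper name `dominant_length` is hypothetical, and skipping NULLs (as `COUNT(col1)` does) is an assumption.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def dominant_length(values: Iterable[Optional[str]]) -> Tuple[float, int]:
    """Return (percentage, length) for the most common value length.

    Hypothetical helper: None (NULL) values are skipped, like COUNT(col1).
    On ties, Counter.most_common returns the length encountered first.
    """
    lengths = Counter(len(v) for v in values if v is not None)
    total = sum(lengths.values())
    if total == 0:
        raise ValueError("no non-NULL values")
    length, count = lengths.most_common(1)[0]
    return 100.0 * count / total, length


example1 = ["X01", "X02", "X03", "X04", "X05"]
example2 = ["New York", "London", "Los Angeles", "Paris", "San Francisco"]

print(dominant_length(example1))  # (100.0, 3)
print(dominant_length(example2))  # (20.0, 8): every length occurs once
```

In Example 2 every length occurs exactly once, so any of them is a valid answer. This sketch happens to report 8 ("New York") because that length is seen first.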
Answer 0 (score: 1)
Try this:
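select concat(cast(rnk1 as float) / cast(totalcol1 as float) * 100, '%'), col1length
from (
    select *
         -- numbers the rows inside each length group; the highest number
         -- in a group equals the count of values with that length
         , row_number() over (partition by col1length order by col1length) rnk1
    from (
        select length(col1) as col1length
             , (select count(col1) from test) as totalcol1  -- total non-NULL values
        from test
    ) t1
    order by rnk1 desc  -- the largest row number belongs to the most frequent length
    FETCH FIRST 1 ROWS ONLY
) t2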
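The sketch below shows a possibly lighter variant of the same idea, using Python's built-in `sqlite3` module as a stand-in for DB2. It groups by the length first, so the ranking step only sees one row per distinct length instead of numbering every row. The total is then taken from the grouped counts. The table `test` and column `col1` come from the question. The helper names and the tie-break on the shorter length are assumptions. In DB2, `LIMIT 1` would be written `FETCH FIRST 1 ROWS ONLY`. Whether this is actually cheaper on billions of rows depends on the optimizer, so compare the access plans rather than taking it on faith.

```python
import sqlite3

# Hypothetical query; SQLite stands in for DB2. Only LENGTH, COUNT, SUM and
# GROUP BY are used. In DB2, replace LIMIT 1 with FETCH FIRST 1 ROWS ONLY.
DOMINANT_LENGTH_SQL = """
WITH by_len AS (
    -- one row per distinct length instead of one row number per table row
    SELECT LENGTH(col1) AS len, COUNT(*) AS cnt
    FROM test
    WHERE col1 IS NOT NULL
    GROUP BY LENGTH(col1)
)
SELECT 100.0 * cnt / (SELECT SUM(cnt) FROM by_len) AS pct, len
FROM by_len
ORDER BY cnt DESC, len  -- ties: the shortest length wins
LIMIT 1
"""


def dominant_length_sql(conn: sqlite3.Connection):
    """Return (pct, len) for table `test`, or None if col1 has no non-NULL values."""
    return conn.execute(DOMINANT_LENGTH_SQL).fetchone()


def load(values):
    """Build an in-memory `test` table holding the given col1 values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (col1 VARCHAR(20))")
    conn.executemany("INSERT INTO test (col1) VALUES (?)", [(v,) for v in values])
    return conn


print(dominant_length_sql(load(["X01", "X02", "X03", "X04", "X05"])))
# (100.0, 3)
print(dominant_length_sql(load(["New York", "London", "Los Angeles", "Paris", "San Francisco"])))
# (20.0, 5)
```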
Answer 1 (score: 1)
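Use the GROUP BY GROUPING SETS operator to handle any number of columns in a single SELECT statement. The example below uses inline constants in place of the real varchar columns and applies length(varchar_col) to each of them.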
with tab as (
  select
    length(a) a
  , length(b) b
  , count(1) cnt
  , grouping(length(a)) a_grp
  , grouping(length(b)) b_grp
  from table(values
    ('X01', 'New York')
  , ('X02', 'London')
  , ('X03', 'Los Angeles')
  , ('X04', 'Paris')
  , ('X05', 'San Francisco')
  ) t (a, b)
  group by grouping sets ((length(a)), (length(b)), ())
)
, row_count as (select cnt from tab where a_grp + b_grp = 2)
, top as (
  select a, b, cnt, rownumber() over(partition by a_grp, b_grp order by cnt desc) rn_
  from tab
  where a_grp + b_grp = 1 -- number of columns - 1
)
select a, b, cnt, 100*cnt/nullif((select cnt from row_count), 0) pst
from top
where rn_=1;
A B CNT PST
-- -- --- ---
3 - 5 100
- 5 1 20
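`GROUP BY GROUPING SETS ((length(a)), (length(b)), ())` returns the same rows as a UNION ALL of three separate GROUP BY queries, with NULL in the columns that are not grouped. The a_grp/b_grp flags above tell those three kinds of row apart. SQLite has no GROUPING SETS, so the sketch below spells out the UNION ALL form to reproduce the A/B/CNT/PST result. The `grp` label column, the NOT EXISTS ranking (used instead of `rownumber()`), and the tie-break on the shorter length are assumptions. The UNION ALL form may read the table once per branch, while the single GROUPING SETS statement at least gives the optimizer a chance to avoid that.

```python
import sqlite3

ROWS = [
    ("X01", "New York"),
    ("X02", "London"),
    ("X03", "Los Angeles"),
    ("X04", "Paris"),
    ("X05", "San Francisco"),
]

# UNION ALL emulation of GROUP BY GROUPING SETS ((length(a)), (length(b)), ()).
# The grp column plays the role of the a_grp/b_grp flags: 'a', 'b' or 'total'.
SQL = """
WITH tab AS (
    SELECT 'a' AS grp, LENGTH(a) AS len, COUNT(*) AS cnt FROM t GROUP BY LENGTH(a)
    UNION ALL
    SELECT 'b', LENGTH(b), COUNT(*) FROM t GROUP BY LENGTH(b)
    UNION ALL
    SELECT 'total', NULL, COUNT(*) FROM t
),
top AS (
    -- most frequent length per column; ties broken by the shorter length
    SELECT grp, len, cnt FROM tab AS x
    WHERE grp <> 'total'
      AND NOT EXISTS (
          SELECT 1 FROM tab AS y
          WHERE y.grp = x.grp
            AND (y.cnt > x.cnt OR (y.cnt = x.cnt AND y.len < x.len))
      )
)
SELECT grp, len, cnt,
       100 * cnt / (SELECT cnt FROM tab WHERE grp = 'total') AS pst  -- integer percent
FROM top
ORDER BY grp
"""


def per_column_dominant_length(rows):
    """Load rows into an in-memory table t(a, b) and run the emulated query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a VARCHAR(10), b VARCHAR(20))")
    conn.executemany("INSERT INTO t (a, b) VALUES (?, ?)", rows)
    return conn.execute(SQL).fetchall()


print(per_column_dominant_length(ROWS))
# [('a', 3, 5, 100), ('b', 5, 1, 20)]
```

For column b every length occurs once. The original `rownumber()` query may return any of them, while this sketch always picks the shortest (5).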