Question

I need to analyze the lengths of the values in a DB column and get the percentage of values that share the same length.

Expected result:

            Same length values in COL1 = 70%  with LENGTH = 10 chars

This is not "find the most frequent value and compute its length", because in a KEY or ID column with high cardinality all values are different.

I need some fast-running SQL (DB2 dialect preferred) that does not overload the database engine (billions of rows).

Example 1

         COL1 (VARCHAR 10) 
         ------------------
                     X01   
                     X02   
                     X03   
                     X04   
                     X05

Result:

            100%, 3

Example 2

           COL1(VARCHAR 20)
         -------------------------
                    New York
                    London
                    Los Angeles
                    Paris
                    San Francisco

Result:

            20%, 5 
           (or 20%, 13 - does not matter because values are different)

Answer 1

Try this:

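select concat(cast(rnk1 as float) / cast(totalcol1 as float) * 100, '%'), col1length
from (
  select *
       -- number the rows within each length; the highest number equals that length's row count
       , row_number() over (partition by col1length order by col1length) rnk1
  from (
    select length(col1) as col1length
         , (select count(col1) from test) as totalcol1
    from test) t1
  -- keep only the row with the highest number, i.e. the most frequent length
  order by rnk1 desc
  FETCH FIRST 1 ROWS ONLY) t2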
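
For comparison, the same figure can be obtained by aggregating per length instead of numbering every row. The following is only a minimal sketch, assuming an in-memory SQLite database driven from Python rather than DB2; the helper name `most_frequent_length` is hypothetical, and the table `test` with column `col1` mirrors the query above. Ties are broken by the shortest length, which is an arbitrary choice made here.

```python
import sqlite3

def most_frequent_length(values):
    """Return (percentage, length) for the most common value length, or None if empty."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (col1 TEXT)")
    conn.executemany("INSERT INTO test (col1) VALUES (?)", [(v,) for v in values])
    row = conn.execute(
        """
        SELECT 100.0 * COUNT(*) / (SELECT COUNT(*) FROM test) AS pct,
               LENGTH(col1) AS col1length
        FROM test
        GROUP BY LENGTH(col1)
        -- most frequent length first; ties resolved by the shortest length (assumption)
        ORDER BY COUNT(*) DESC, col1length
        LIMIT 1
        """
    ).fetchone()
    conn.close()
    return row

# Sample data from Example 1 and Example 2 of the question
print(most_frequent_length(["X01", "X02", "X03", "X04", "X05"]))
print(most_frequent_length(["New York", "London", "Los Angeles", "Paris", "San Francisco"]))
```

On DB2 the `LIMIT 1` clause would presumably be written as `FETCH FIRST 1 ROWS ONLY`, as in the query above.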


Answer 2

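A single SELECT statement that handles any number of columns, using the GROUP BY GROUPING SETS operator. The example below uses inline constants in place of a real table and applies length() to each varchar column.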

with tab as (
select
  length(a) a
, length(b) b
, count(1) cnt
, grouping(length(a)) a_grp
, grouping(length(b)) b_grp
from table(values
  ('X01', 'New York')     
, ('X02', 'London')       
, ('X03', 'Los Angeles')  
, ('X04', 'Paris')        
, ('X05', 'San Francisco')
) t (a, b)
group by grouping sets ((length(a)), (length(b)), ())
)
, row_count as (select cnt from tab where a_grp + b_grp = 2)
, top as (
select a, b, cnt, rownumber() over(partition by a_grp, b_grp order by cnt desc) rn_
from tab
where a_grp + b_grp  = 1 -- number of columns - 1
)
select a, b, cnt, 100*cnt/nullif((select cnt from row_count), 0) pst
from top
where rn_=1;

 A  B CNT PST
-- -- --- ---
 3  -   5 100
 -  5   1  20
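
To make the role of the grouping sets more concrete, here is a rough sketch that reproduces the result above on SQLite from Python. SQLite has no GROUPING SETS, so each grouping set is written as its own UNION ALL branch; this is a swapped-in technique, not the DB2 statement itself. The table name `t`, the helper `top_lengths`, and the tie-break on the shortest length are assumptions, and the window function requires SQLite 3.25 or later.

```python
import sqlite3

ROWS = [
    ("X01", "New York"),
    ("X02", "London"),
    ("X03", "Los Angeles"),
    ("X04", "Paris"),
    ("X05", "San Francisco"),
]

def top_lengths(rows):
    """Return {column: (length, cnt, pst)} for the most frequent length per column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a TEXT, b TEXT)")
    conn.executemany("INSERT INTO t (a, b) VALUES (?, ?)", rows)
    result = conn.execute(
        """
        WITH tab AS (
            -- one branch per grouping set: (length(a)) and (length(b))
            SELECT 'a' AS col, LENGTH(a) AS val_len, COUNT(*) AS cnt
            FROM t GROUP BY LENGTH(a)
            UNION ALL
            SELECT 'b' AS col, LENGTH(b) AS val_len, COUNT(*) AS cnt
            FROM t GROUP BY LENGTH(b)
        ),
        ranked AS (
            -- rank the lengths of each column by how often they occur
            SELECT col, val_len, cnt,
                   ROW_NUMBER() OVER (PARTITION BY col ORDER BY cnt DESC, val_len) AS rn
            FROM tab
        )
        SELECT col, val_len, cnt, 100 * cnt / (SELECT COUNT(*) FROM t) AS pst
        FROM ranked
        WHERE rn = 1
        ORDER BY col
        """
    ).fetchall()
    conn.close()
    return {col: (val_len, cnt, pst) for col, val_len, cnt, pst in result}

print(top_lengths(ROWS))
```

Each UNION ALL branch scans the table once, whereas GROUPING SETS lets the engine plan all groupings in one statement, which is likely the reason to prefer it on DB2 for very large tables.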

SQL: How to determine the most frequent data length in a DB column?
