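I want to group my ranks into chunks. I thought of using a CASE statement, but that not only looks silly, it is also slow.
Any tips on how to improve this?
Note that the chunks vary in size (first the top 100 listed individually, then chunks of 100, then chunks of 500, then one chunk of 5000 and 3 more chunks of 15K).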
select
transaction_code
,row_number() over (order by SALES_AMOUNT desc) as rank
,SALES_AMOUNT
,CASE
WHEN rank <=100 THEN to_varchar(rank)
WHEN rank <=200 then '101-200'
WHEN rank <=300 then '201-300'
WHEN rank <=400 then '301-400'
WHEN rank <=500 then '401-500'
WHEN rank <=1000 then '501-1000'
WHEN rank <=1500 then '1001-1500'
WHEN rank <=2000 then '1501-2000'
WHEN rank <=2500 then '2001-2500'
WHEN rank <=3000 then '2501-3000'
WHEN rank <=3500 then '3001-3500'
WHEN rank <=4000 then '3501-4000'
WHEN rank <=4500 then '4001-4500'
WHEN rank <=5000 then '4501-5000'
WHEN rank <=5500 then '5001-5500'
WHEN rank <=6000 then '5501-6000'
WHEN rank <=6500 then '6001-6500'
WHEN rank <=7000 then '6501-7000'
WHEN rank <=7500 then '7001-7500'
WHEN rank <=8000 then '7501-8000'
WHEN rank <=8500 then '8001-8500'
WHEN rank <=9000 then '8501-9000'
WHEN rank <=9500 then '9001-9500'
WHEN rank <=10000 then '9501-10000'
WHEN rank <=15000 then '10001-15000'
WHEN rank <=30000 then '15001-30000'
WHEN rank <=45000 then '30001-45000'
WHEN rank <=60000 then '45001-60000'
ELSE 'Bottom'
END AS "TRANSACTION GROUPS"
Answer 0 (score: 0)
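The fastest way is to create a lookup table that maps ranks to group names. You could do this with a stateful JavaScript UDF (one that initializes the map only once).
But you can also do it in SQL.
A simple mapping from numbers to strings: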
create or replace table rank2group(rank integer, grp string);
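Your code is indeed long.
Instead, we can create a function that, for a given rank, group_size and group_base (the number from which the groups of size group_size start), generates a string.
Note that this function will be slower than your code, because it builds a string from its input, but we will only use it to populate the lookup table, so that is fine.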
create or replace function group_name(rank integer, group_base integer, group_size integer)
returns varchar
as $$
(group_base + 1 + group_size * floor((rank - 1 - group_base) / group_size))
|| '-' ||
(group_base + group_size + group_size * floor((rank - 1 - group_base) / group_size))
$$;
Example output:
select group_name(101, 100, 100), group_name(1678, 500, 500), group_name(15000, 10000, 5000);
---------------------------+----------------------------+--------------------------------+
GROUP_NAME(101, 100, 100) | GROUP_NAME(1678, 500, 500) | GROUP_NAME(15000, 10000, 5000) |
---------------------------+----------------------------+--------------------------------+
101-200 | 1501-2000 | 10001-15000 |
---------------------------+----------------------------+--------------------------------+
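The edges of each range are where an off-by-one error would show up, so it may be worth checking group_name there as well. The query below is a small sketch that reuses the function defined above. The column aliases are made up for illustration, and the expected labels in the comments are derived from the formula by hand, not taken from an actual run.

```sql
-- Boundary checks for group_name(rank, group_base, group_size).
select group_name(500, 100, 100)       as last_of_100s,   -- expected: 401-500
       group_name(501, 500, 500)       as first_of_500s,  -- expected: 501-1000
       group_name(10000, 500, 500)     as last_of_500s,   -- expected: 9501-10000
       group_name(60000, 15000, 15000) as last_of_15ks;   -- expected: 45001-60000
```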
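We will populate the table using a Snowflake generator that produces the values 1 .. 60000, and a simplified CASE statement that calls group_name for each range:
create or replace table rank2group(rank integer, grp string);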
insert into rank2group
select rank,
CASE
WHEN rank <=100 THEN to_varchar(rank)
-- groups of size 100, starting at 100
WHEN rank <=500 then group_name(rank, 100, 100)
-- groups of size 500, starting at 500
WHEN rank <=10000 then group_name(rank, 500, 500)
-- groups of size 5000, starting at 10000
WHEN rank <=15000 then group_name(rank, 10000, 5000)
-- groups of size 15000, starting at 15000
WHEN rank <=60000 then group_name(rank, 15000, 15000)
ELSE 'Bottom'
END AS "TRANSACTION GROUPS"
from (
select row_number() over (order by 1) as rank
from table(generator(rowCount=>60000))
);
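Before relying on the lookup table, it can help to confirm that each group holds the number of ranks you expect. The following is a minimal sketch of such a check, assuming rank2group was populated as above; the output column names are only illustrative.

```sql
-- One row per group, with its size and the rank range it covers.
select grp,
       count(*)  as ranks_in_group,
       min(rank) as first_rank,
       max(rank) as last_rank
from rank2group
group by grp
order by first_rank;
```

If the table was filled as described, the first 100 rows should each contain a single rank, followed by groups of 100, 500, 5000 and 15000 ranks.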
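To use it, we simply join on rank.
Note that you need an outer join followed by ifnull to get the Bottom value for ranks that are not in the table.
For example, using a generated input that produces cubically growing numbers: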
with input as (
select 1 + (seq8() * seq8() * seq8()) AS rank
from table(generator(rowCount=>50))
)
select input.rank, ifnull(grp, 'Bottom') grp
from input left outer join rank2group on input.rank = rank2group.rank
order by input.rank;
--------+-------------+
RANK | GRP |
--------+-------------+
1 | 1 |
2 | 2 |
9 | 9 |
28 | 28 |
65 | 65 |
126 | 101-200 |
217 | 201-300 |
344 | 301-400 |
513 | 501-1000 |
730 | 501-1000 |
1001 | 1001-1500 |
1332 | 1001-1500 |
1729 | 1501-2000 |
2198 | 2001-2500 |
2745 | 2501-3000 |
3376 | 3001-3500 |
4097 | 4001-4500 |
4914 | 4501-5000 |
5833 | 5501-6000 |
6860 | 6501-7000 |
8001 | 8001-8500 |
9262 | 9001-9500 |
10649 | 10001-15000 |
12168 | 10001-15000 |
13825 | 10001-15000 |
15626 | 15001-30000 |
17577 | 15001-30000 |
19684 | 15001-30000 |
21953 | 15001-30000 |
24390 | 15001-30000 |
27001 | 15001-30000 |
29792 | 15001-30000 |
32769 | 30001-45000 |
35938 | 30001-45000 |
39305 | 30001-45000 |
42876 | 30001-45000 |
46657 | 45001-60000 |
50654 | 45001-60000 |
54873 | 45001-60000 |
59320 | 45001-60000 |
64001 | Bottom |
68922 | Bottom |
74089 | Bottom |
79508 | Bottom |
85185 | Bottom |
91126 | Bottom |
97337 | Bottom |
103824 | Bottom |
110593 | Bottom |
117650 | Bottom |
--------+-------------+
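Applied to the query from the question, the same join could look roughly like the sketch below. The question does not show its FROM clause, so the table name transactions is a hypothetical placeholder; the column names are taken from the question.

```sql
-- `transactions` is an assumed table name, not given in the question.
with ranked as (
    select transaction_code,
           SALES_AMOUNT,
           row_number() over (order by SALES_AMOUNT desc) as rank
    from transactions
)
select ranked.transaction_code,
       ranked.rank,
       ranked.SALES_AMOUNT,
       -- ranks beyond 60000 have no row in rank2group, so they fall back to 'Bottom'
       ifnull(rank2group.grp, 'Bottom') as "TRANSACTION GROUPS"
from ranked
left outer join rank2group on ranked.rank = rank2group.rank
order by ranked.rank;
```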
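If your range boundaries are always multiples of 100, you can make the table 100 times smaller by storing only the values that end in 00 and then joining on, for example, CEIL(rank / 100) * 100.
However, you would then also need to handle the values 1..100 after the join, for example with IFNULL(grp, IFF(rank <= 100, rank::varchar, 'Bottom')).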
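One possible reading of this smaller-table idea is sketched below. It assumes rank2group already exists as built above, keys each block of 100 ranks by its upper bound, and rounds each rank up to the next multiple of 100 in the join. The table name rank2group_small, the column rank_upper and the sample input values are all made up for illustration. The row for 100 is left out on purpose, so that ranks 1..100 produce no match and reach the IFF fallback.

```sql
-- One row per block of 100 ranks: 200, 300, ..., 60000 (hypothetical table).
create or replace table rank2group_small as
select rank as rank_upper, grp
from rank2group
where rank > 100 and rank % 100 = 0;

-- Sample ranks chosen only to exercise each case.
with input as (
    select column1 as rank from values (7), (126), (9262), (64001)
)
select input.rank,
       ifnull(s.grp, iff(input.rank <= 100, input.rank::varchar, 'Bottom')) as grp
from input
left outer join rank2group_small s
       on ceil(input.rank / 100) * 100 = s.rank_upper
order by input.rank;
```

This trades a slightly more expensive join key for a table with 599 rows instead of 60000, which only pays off if every range boundary really is a multiple of 100.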