Is there any way to make a CASE statement on rank smarter?

Date: 2018-06-05 22:39:44

Tags: sql snowflake

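I want to group my ranked rows into chunks. I thought of using a CASE statement, but not only does it look silly, it is also slow.

Any tips on how to improve it?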

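Please note the varying chunk sizes: the top 100 are listed individually, then come chunks of 100, then chunks of 500, then one chunk of 5000 and 3 more chunks of 15K.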

select   
  transaction_code
  ,row_number() over (order by SALES_AMOUNT desc) as rank
  ,SALES_AMOUNT
  ,CASE 
    WHEN rank <=100 THEN to_varchar(rank)
    WHEN rank <=200 then '101-200'
    WHEN rank <=300 then '201-300'
    WHEN rank <=400 then '301-400'
    WHEN rank <=500 then '401-500'
    WHEN rank <=1000 then '501-1000'
    WHEN rank <=1500 then '1001-1500'
    WHEN rank <=2000 then '1501-2000'
    WHEN rank <=2500 then '2001-2500'
    WHEN rank <=3000 then '2501-3000'
    WHEN rank <=3500 then '3001-3500'
    WHEN rank <=4000 then '3501-4000'
    WHEN rank <=4500 then '4001-4500'
    WHEN rank <=5000 then '4501-5000'
    WHEN rank <=5500 then '5001-5500'
    WHEN rank <=6000 then '5501-6000'
    WHEN rank <=6500 then '6001-6500'
    WHEN rank <=7000 then '6501-7000'
    WHEN rank <=7500 then '7001-7500'
    WHEN rank <=8000 then '7501-8000'
    WHEN rank <=8500 then '8001-8500'
    WHEN rank <=9000 then '8501-9000'
    WHEN rank <=9500 then '9001-9500'
    WHEN rank <=10000 then '9501-10000'
    WHEN rank <=15000 then '10001-15000'
    WHEN rank <=30000 then '15001-30000'
    WHEN rank <=45000 then '30001-45000'
    WHEN rank <=60000 then '45001-60000'
    ELSE 'Bottom'
   END AS "TRANSACTION GROUPS"

1 answer:

Answer 0 (score: 0)

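The fastest way would be to create a lookup table that maps ranks to group names. You could do it with a stateful JavaScript UDF (one that initializes the map only once).

But you can also do it in SQL.

Table definition

A simple mapping from a number to a string: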

create or replace table rank2group(rank integer, grp string);

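UDF to generate group names

Your code is indeed very long.

Instead, we can create a function that, for a given rank, group_base (the rank after which groups of size group_size begin) and group_size, generates a string.

Note that this function will be slower than your code, because it builds a string from its input, but we will only use it to populate the lookup table, so that is fine.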

create or replace function group_name(rank integer, group_base integer, group_size integer)
returns varchar
as $$
  (group_base + 1 + group_size * floor((rank - 1 - group_base) / group_size))
  || '-' || 
  (group_base + group_size + group_size * floor((rank - 1 - group_base) / group_size))
$$;

Sample output:

select group_name(101, 100, 100), group_name(1678, 500, 500), group_name(15000, 10000, 5000);
---------------------------+----------------------------+--------------------------------+
 GROUP_NAME(101, 100, 100) | GROUP_NAME(1678, 500, 500) | GROUP_NAME(15000, 10000, 5000) |
---------------------------+----------------------------+--------------------------------+
 101-200                   | 1501-2000                  | 10001-15000                    |
---------------------------+----------------------------+--------------------------------+
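A few boundary ranks are worth checking too, since off-by-one errors tend to show up at group edges. The query below is only a sketch that reuses the group_name function defined above; the column aliases are made up for illustration, and the expected values in the comments are worked out by hand from the formula.

```sql
-- Boundary checks for group_name(rank, group_base, group_size).
select
  group_name(200, 100, 100)       as last_of_first_100_group,   -- 101-200
  group_name(201, 100, 100)       as first_of_second_100_group, -- 201-300
  group_name(501, 500, 500)       as first_of_500_groups,       -- 501-1000
  group_name(60000, 15000, 15000) as last_mapped_rank;          -- 45001-60000
```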

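Table data generation

We will generate values only for the mapped range 1..60000, using Snowflake generators and a simplified CASE statement built on group_name:

create or replace table rank2group(rank integer, grp string);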

insert into rank2group
select rank,
CASE 
    WHEN rank <=100 THEN to_varchar(rank)
    -- groups of size 100, starting at 100
    WHEN rank <=500 then group_name(rank, 100, 100)                                                    
    WHEN rank <=10000 then group_name(rank, 500, 500)
    -- groups of size 5000, starting at 10000
    WHEN rank <=15000 then group_name(rank, 10000, 5000) 
    WHEN rank <=60000 then group_name(rank, 15000, 15000)
    ELSE 'Bottom'
END AS "TRANSACTION GROUPS"
from (
    select row_number() over (order by 1) as rank
    from table(generator(rowCount=>60000))
);
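Before relying on the lookup table, it may help to confirm that every group covers the ranks its label claims. The following is a minimal sketch of such a check, assuming rank2group was populated by the insert above; the column aliases are illustrative.

```sql
-- One row per group above the top 100: its size and the ranks it spans.
select grp,
       count(*)  as ranks_in_group,
       min(rank) as first_rank,
       max(rank) as last_rank
from rank2group
where rank > 100
group by grp
order by first_rank;
```

Each label should read as first_rank-last_rank, and with the CASE above there should be 27 such groups (4 of size 100, 19 of size 500, 1 of size 5000 and 3 of size 15000).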

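Usage

To use it, we simply join on rank. Note that you need an outer join followed by ifnull to get the Bottom values. For example, using an input that generates rapidly growing numbers (1 + n^3):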

with input as (
  select 1 + (seq8() * seq8() * seq8()) AS rank
  from table(generator(rowCount=>50))
)
select input.rank, ifnull(grp, 'Bottom') grp
from input left outer join rank2group on input.rank = rank2group.rank
order by input.rank;
--------+-------------+
  RANK  |     GRP     |
--------+-------------+
 1      | 1           |
 2      | 2           |
 9      | 9           |
 28     | 28          |
 65     | 65          |
 126    | 101-200     |
 217    | 201-300     |
 344    | 301-400     |
 513    | 501-1000    |
 730    | 501-1000    |
 1001   | 1001-1500   |
 1332   | 1001-1500   |
 1729   | 1501-2000   |
 2198   | 2001-2500   |
 2745   | 2501-3000   |
 3376   | 3001-3500   |
 4097   | 4001-4500   |
 4914   | 4501-5000   |
 5833   | 5501-6000   |
 6860   | 6501-7000   |
 8001   | 8001-8500   |
 9262   | 9001-9500   |
 10649  | 10001-15000 |
 12168  | 10001-15000 |
 13825  | 10001-15000 |
 15626  | 15001-30000 |
 17577  | 15001-30000 |
 19684  | 15001-30000 |
 21953  | 15001-30000 |
 24390  | 15001-30000 |
 27001  | 15001-30000 |
 29792  | 15001-30000 |
 32769  | 30001-45000 |
 35938  | 30001-45000 |
 39305  | 30001-45000 |
 42876  | 30001-45000 |
 46657  | 45001-60000 |
 50654  | 45001-60000 |
 54873  | 45001-60000 |
 59320  | 45001-60000 |
 64001  | Bottom      |
 68922  | Bottom      |
 74089  | Bottom      |
 79508  | Bottom      |
 85185  | Bottom      |
 91126  | Bottom      |
 97337  | Bottom      |
 103824 | Bottom      |
 110593 | Bottom      |
 117650 | Bottom      |
--------+-------------+
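Applied to the query from the question, the same join might look roughly like this. The question does not show its FROM clause, so the table name transactions is purely a placeholder assumption; the column names are taken from the question.

```sql
with ranked as (
  select transaction_code,
         sales_amount,
         row_number() over (order by sales_amount desc) as rank
  from transactions  -- hypothetical table name, not given in the question
)
select r.transaction_code,
       r.rank,
       r.sales_amount,
       -- ranks beyond 60000 have no row in rank2group, so they become 'Bottom'
       ifnull(g.grp, 'Bottom') as "TRANSACTION GROUPS"
from ranked r
left outer join rank2group g on r.rank = g.rank
order by r.rank;
```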

Possible optimization

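If your ranges are always multiples of 100, you can make the table 100 times smaller by storing only the values ending in 00, and then joining on the rank rounded up to the next multiple of 100, e.g. CEIL(rank / 100) * 100.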

However, you would also need to handle the values 1..100 after the join, e.g. IFNULL(grp, IFF(rank <= 100, rank::varchar, 'Bottom'))
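A rough sketch of this optimization follows. It assumes the full rank2group table already exists, and the name rank2group_100 is invented for illustration. Each rank is rounded up to the next multiple of 100 for the join. Ranks 1..100 are handled before the lookup here, which is a slight variation on the expression above that does not depend on whether a row for rank 100 is stored.

```sql
-- Keep only the ranks ending in 00 above the individually listed top 100.
create or replace table rank2group_100 as
select rank, grp
from rank2group
where rank > 100 and rank % 100 = 0;

with input as (
  select 1 + (seq8() * seq8() * seq8()) as rank
  from table(generator(rowCount=>50))
)
select input.rank,
       iff(input.rank <= 100,
           input.rank::varchar,         -- top 100 are labelled individually
           ifnull(g.grp, 'Bottom')) as grp
from input
left outer join rank2group_100 g
  on g.rank = ceil(input.rank / 100) * 100  -- e.g. 126 -> 200, 200 -> 200
order by input.rank;
```

The result should match the output of the full-table join shown earlier, with a lookup table of about 600 rows instead of 60000.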