Hive: ranking rows using lag with a variable

Time: 2016-05-19 08:37:40

Tags: sql hive window analytics lag

I have the following table:

product |  price
   A    |   100
   B    |   102
   C    |   220
   D    |   240
   E    |   242
   F    |   410

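For each row, taken in ascending price order, I want to divide the lower price by the current price. If the result is 0.9 or more, I want to increment the row number. If the result is below 0.9, this row gets row number 1, the current price becomes the new lower price, and the process repeats.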

The result should look like this:

product |  price | row_number
   A    |   100  |     1
   B    |   102  |     2
   C    |   220  |     1
   D    |   240  |     2
   E    |   242  |     3
   F    |   410  |     1

Because:

lower price = 100: product A gets 1 as row_number
100/102 >= 0.9: product B gets 2 as row_number
100/220 < 0.9: product C gets 1 as row_number, lower price = 220
220/240 >= 0.9: product D gets 2 as row_number
220/242 >= 0.9: product E gets 3 as row_number
220/410 < 0.9: product F gets 1 as row_number, lower price = 410
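
To pin down the rule walked through above, here is a minimal reference sketch in plain Python (not Hive). The function name `number_rows` and the `threshold` parameter are made up for illustration; the only assumption is that rows are processed in ascending price order, as in the example.

```python
def number_rows(prices, threshold=0.9):
    """Return one row number per price, processing prices in ascending order."""
    row_numbers = []
    lower_price = None  # reference ("lower") price of the current group
    current = 0
    for price in sorted(prices):
        if lower_price is not None and lower_price / price >= threshold:
            # Still close enough to the reference price: continue the count.
            current += 1
        else:
            # Ratio fell below the threshold: this price starts a new group.
            lower_price = price
            current = 1
        row_numbers.append(current)
    return row_numbers


print(number_rows([100, 102, 220, 240, 242, 410]))  # [1, 2, 1, 2, 3, 1]
```

The reference price depends on every earlier decision, which is why a fixed-offset lag over the previous row alone is not enough to reproduce it.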

I was thinking of creating a temp_row_number ordered by price:

product |  price | temp_row_number
   A    |   100  |     1
   B    |   102  |     2
   C    |   220  |     3
   D    |   240  |     4
   E    |   242  |     5
   F    |   410  |     6

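Then:

Select
   product,
   price,
   case
     -- the over clause has to follow each lag() call directly
     when lag(price, temp_row_number-1, 0) over (order by price) / price >= 0.9
       then lag(price, temp_row_number-1, 0) over (order by price)
     else price
   end as test
from my_table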

This returns:

product |  price | test
   A    |   100  | 100
   B    |   102  | 100
   C    |   220  | 220
   D    |   240  | 240
   E    |   242  | 242
   F    |   410  | 410

But ideally I would like to get:

product |  price | test
   A    |   100  | 100
   B    |   102  | 100
   C    |   220  | 220
   D    |   240  | 220
   E    |   242  | 220
   F    |   410  | 410

I could then use the row_number() function, ordered by product and price within each test value, to compute row_number and get the expected result.
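
As a sanity check on that last step, the sketch below (plain Python, not Hive) takes the desired `test` column as given and numbers the rows within each `test` value in price order. The helper name `row_number_by_group` is hypothetical, and the input rows are copied from the ideal table above.

```python
from collections import defaultdict


def row_number_by_group(rows):
    """rows: (product, price, test) tuples. Number rows per test value, by price."""
    counters = defaultdict(int)  # running count for each test value
    result = []
    for product, price, test in sorted(rows, key=lambda r: r[1]):
        counters[test] += 1
        result.append((product, price, counters[test]))
    return result


rows = [
    ("A", 100, 100), ("B", 102, 100),
    ("C", 220, 220), ("D", 240, 220), ("E", 242, 220),
    ("F", 410, 410),
]
for product, price, rn in row_number_by_group(rows):
    print(product, price, rn)
```

This is the same grouping that a row_number() window partitioned by test and ordered by price would express, so the hard part of the problem is producing the `test` column itself.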

1 answer:

Answer 0 (score: 0)

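WITH CTE AS
(SELECT product, price,
        -- hard-coded price buckets; a price outside every range (e.g. 410) gets RN = NULL
        (CASE WHEN price BETWEEN 100 AND 200 THEN 1
              WHEN price BETWEEN 200 AND 300 THEN 2
              WHEN price BETWEEN 300 AND 400 THEN 3 END) AS RN
         FROM #test)

-- ordered by price (not RN) so the numbering inside a bucket is deterministic
SELECT product, price, ROW_NUMBER() OVER (PARTITION BY RN ORDER BY price)
FROM CTE
ORDER BY product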