Question

Given this very simple dataset:

+--------+-----+
| Bucket | Foo |
+--------+-----+
|      1 | A   |
|      1 | B   |
|      1 | C   |
|      1 | D   |
+--------+-----+

I want to see the value of Foo from the previous row:

select
foo,
max(foo) over (partition by bucket order by foo rows between 1 preceding and 1 preceding) as prev_foo
from
...

This gives me:

+--------+-----+----------+
| Bucket | Foo | Prev_Foo |
+--------+-----+----------+
|      1 | A   | A        |
|      1 | B   | A        |
|      1 | C   | B        |
|      1 | D   | C        |
+--------+-----+----------+

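Why do I get 'A' back on the first row? I expected it to be null. This throws off calculations where I check for that null. I can work around it by adding a row_number(), but I would prefer a solution that needs less computation.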

Answer 1

Use the LAG function to get the value from the previous row:

LAG(foo) OVER(partition by bucket order by foo) as Prev_Foo
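
For reference, here is a fuller sketch of how that expression might sit in a complete query. The table in the question is elided, so the sample rows are built inline in a subquery with a hypothetical alias `t`; this assumes a Hive version that accepts SELECT without FROM inside UNION ALL.

```sql
-- Sketch only: `t` is a hypothetical stand-in for the asker's real table.
SELECT
  bucket,
  foo,
  -- LAG(foo) reads foo from the previous row of the same partition.
  -- No default is given, so the first row of each partition gets NULL.
  LAG(foo) OVER (PARTITION BY bucket ORDER BY foo) AS prev_foo
FROM (
  SELECT 1 AS bucket, 'A' AS foo
  UNION ALL SELECT 1, 'B'
  UNION ALL SELECT 1, 'C'
  UNION ALL SELECT 1, 'D'
) t;
```

With these four rows, prev_foo should come back as NULL for A, then A, B and C for the remaining rows, which is the result the question was after.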

Hive rows preceding unexpected behavior
