Question

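I am seeing unexpected behavior from Vertica's FIRST_VALUE() analytic function when using the IGNORE NULLS parameter. It returns NULL in a case where it seems it should not.

The problem occurs with this very small table: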

drop table if exists temp;
create table temp (time_ timestamp(6), name varchar(10));
insert into temp (time_) values ('2016-03-18 20:32:16.144');
insert into temp (time_, name) values ('2016-03-18 20:52:09.062', 'abc');

Here are the contents of the table (select * from temp):

time_                   | name
------------------------+--------
2016-03-18 20:32:16.144 | <null>
2016-03-18 20:52:09.062 | abc

Here is the query I am running:

select time_,
  first_value(name ignore nulls) over (order by time_) first_name
from temp;

Here is the result this query returns:

time_                   | first_name
------------------------+------------
2016-03-18 20:32:16.144 | <null>
2016-03-18 20:52:09.062 | abc

Here is the result I expected (and want) from this query:

time_                   | first_name
------------------------+------------
2016-03-18 20:32:16.144 | abc
2016-03-18 20:52:09.062 | abc

Is there a very basic syntax error in the query above? The problem occurs on Vertica Community Edition 7.1.1.

Answer 1

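This is where my thinking took me: sort the non-NULL names first, so that the first value in the window is never NULL.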

select time_
      ,first_value(name) over (order by case when name is null then 1 else 0 end,time_) FirstName
from temp A
order by time_

Returns

time_               FirstName
20:32:16.1440000    abc
20:52:09.0620000    abc

Answer 2

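The function works as expected. over (order by time_) is shorthand for over (order by time_ range unbounded preceding), which in turn is shorthand for over (order by time_ range between unbounded preceding and current row). This means each row can only see the rows before it, plus itself.
The first row can see only itself, so there is no non-NULL value within its frame.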
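
To make the equivalence concrete, here is a minimal sketch against the temp table from the question. The column aliases (implicit_frame, short_frame, full_frame) are names I made up for illustration. All three expressions use the same frame, so they should agree on every row, including the NULL on the first row.

```sql
-- Three spellings of the same default frame; aliases are hypothetical.
select time_,
  first_value(name ignore nulls)
    over (order by time_) as implicit_frame,
  first_value(name ignore nulls)
    over (order by time_ range unbounded preceding) as short_frame,
  first_value(name ignore nulls)
    over (order by time_ range between unbounded preceding and current row) as full_frame
from temp
order by time_;
```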

If you want the first non-NULL value over the entire range, you have to specify the entire range:

first_value(name ignore nulls) over 
    (order by time_ range between unbounded preceding and unbounded following) first_name
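
Dropped into the original query, the fragment above would look roughly like this. It is a sketch that assumes the temp table exactly as created in the question; with the frame extended to unbounded following, every row can see the 'abc' row.

```sql
-- Full-range frame: each row sees the whole ordered partition.
select time_,
  first_value(name ignore nulls) over
    (order by time_ range between unbounded preceding and unbounded following) first_name
from temp
order by time_;

-- Expected result (the one the question asks for):
-- time_                   | first_name
-- ------------------------+------------
-- 2016-03-18 20:32:16.144 | abc
-- 2016-03-18 20:52:09.062 | abc
```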

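No, this is definitely not a bug.

You have probably been using syntax like sum(x) over (order by y) for running totals, where the default window of RANGE UNBOUNDED PRECEDING felt natural to you.
Since you did not define an explicit window for the FIRST_VALUE function, you have been using that same default window.

Here is another test case: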

ts val
-- ----
1  NULL
2  X
3  NULL
4  Y
5  NULL

What would you expect to get from the following function?

last_value (val) over (order by ts)

And what would you expect to get from this one?

last_value (val ignore nulls) over (order by ts)
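
For anyone who wants to try it, here is a small sketch that loads the test case into a table and runs both expressions side by side. The table name last_value_demo and the column aliases are hypothetical. The expected values in the comments follow from the default frame (range between unbounded preceding and current row): each row sees only itself and the rows before it, which is what you intuitively want from LAST_VALUE.

```sql
-- Hypothetical table holding the five-row test case above.
drop table if exists last_value_demo;
create table last_value_demo (ts int, val varchar(1));
insert into last_value_demo (ts, val) values (1, null);
insert into last_value_demo (ts, val) values (2, 'X');
insert into last_value_demo (ts, val) values (3, null);
insert into last_value_demo (ts, val) values (4, 'Y');
insert into last_value_demo (ts, val) values (5, null);

select ts,
  last_value(val) over (order by ts) as last_val,
  last_value(val ignore nulls) over (order by ts) as last_non_null
from last_value_demo
order by ts;

-- Expected with the default frame:
-- ts | last_val | last_non_null
-- ---+----------+--------------
-- 1  | <null>   | <null>
-- 2  | X        | X
-- 3  | <null>   | X
-- 4  | Y        | Y
-- 5  | <null>   | Y
```

The last_non_null column carries the most recent non-NULL value forward. That works only because the frame ends at the current row, the same default that surprised you with FIRST_VALUE.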
