Question

I'm trying to calculate the time difference between 2 rows and applied the solution from this SO question. However, I get an exception:

> org.apache.hive.service.cli.HiveSQLException: Error while compiling
> statement: FAILED: SemanticException Failed to breakup Windowing
> invocations into Groups. At least 1 group must only depend on input
> columns. Also check for circular dependencies. Underlying error:
> Expecting left window frame boundary for function
> LAG((tok_table_or_col time), 1, 0) Window
> Spec=[PartitioningSpec=[partitionColumns=[(tok_table_or_col
> client_id)]orderColumns=[(tok_table_or_col time) ASC
> NULLS_FIRST]]window(type=ROWS, start=1 PRECEDING, end=currentRow)] as
> LAG_window_0 to be unbounded. Found : 1

HiveQL:

SELECT id, loc, LAG(time, 1, 0) OVER (PARTITION BY id, loc ORDER BY time ROWS 1 PRECEDING) - time AS response_time FROM mytable

How do I fix this? What is wrong?

Edit:

Sample data:

id  loc time
0   1   1414250523591
0   1   1414250523655
1   2   1414250523655
1   2   1414250523661
1   3   1414250523661
1   3   1414250523662

What I want is the time difference between the rows that share the same id and loc (there are always exactly 2 such rows).
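To pin down what the correct result should look like, here is a minimal Python sketch (not Hive) that emulates `LAG(time, 1, default) OVER (PARTITION BY id, loc ORDER BY time)` on the sample data above. It assumes the usual LAG semantics: the first row of each partition gets the default value, or NULL (`None` here) when no default is given. The names `ROWS` and `lag_diffs` are hypothetical and exist only for this illustration. Note that the original query computes `LAG(...) - time`, which is the negative of `time - lag` below, and that with a default of 0 the first row of each pair yields the whole timestamp rather than a difference.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the question: (id, loc, time)
ROWS = [
    (0, 1, 1414250523591),
    (0, 1, 1414250523655),
    (1, 2, 1414250523655),
    (1, 2, 1414250523661),
    (1, 3, 1414250523661),
    (1, 3, 1414250523662),
]


def lag_diffs(rows, default=None):
    """Emulate LAG(time, 1, default) OVER (PARTITION BY id, loc ORDER BY time).

    Returns one (id, loc, time, lag, time - lag) tuple per input row.
    The first row of each partition gets `default` as its lag; the
    difference is None whenever the lag is None (i.e. SQL NULL).
    """
    partition_key = itemgetter(0, 1)
    out = []
    # Partition by (id, loc), then order each partition by time.
    for (id_, loc), group in groupby(sorted(rows, key=partition_key), key=partition_key):
        prev = default
        for _, _, t in sorted(group, key=itemgetter(2)):
            diff = None if prev is None else t - prev
            out.append((id_, loc, t, prev, diff))
            prev = t
    return out


if __name__ == "__main__":
    for row in lag_diffs(ROWS):
        print(row)
```

With no default, the second row of each pair carries the wanted difference (64, 6 and 1 ms for the three pairs) and the first row has NULL, so filtering on a non-NULL lag would leave one row per pair.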

Edit 2: I should also mention that I'm new to the Hadoop/Hive ecosystem.

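Since the error says the window should be unbounded, I simply removed the ROWS clause. Now it at least does something, but the result is still wrong. So I wanted to check what the LAG value actually is: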

SELECT id, loc, LAG(time, 1) OVER (PARTITION BY id, loc ORDER BY time) AS lag_col FROM mytable

I get this as output:

id  loc lag_col
1   2   null
1   2   -1
1   3   null
1   3   -1

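The null is expected, because I removed the default value, but why -1? Could the large values in the time column cause some kind of overflow? The column is defined as bigint, so there shouldn't actually be a problem, but could it be converted to int during the query?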
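The overflow hypothesis can be checked arithmetically. The sketch below confirms that the timestamps exceed the signed 32-bit range but fit easily in a 64-bit bigint. It also shows what the values would become if they were silently narrowed to int, *assuming* a two's-complement wrap like Java's `(int)` cast (an assumption; this is not a statement about what Hive actually does internally). `to_int32` is a hypothetical helper for illustration only. Under that assumption the wrapped values are large positive numbers, not -1, so a bigint-to-int narrowing alone does not appear to explain the output.

```python
INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1


def to_int32(value):
    """Wrap an integer to signed 32 bits (assumed two's-complement
    narrowing, as a Java-style (int) cast would do)."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value


if __name__ == "__main__":
    # Timestamps taken from the sample data in the question.
    for t in (1414250523655, 1414250523661, 1414250523662):
        # columns: value, fits in int?, fits in bigint?, value after 32-bit wrap
        print(t, t <= INT32_MAX, t <= INT64_MAX, to_int32(t))
```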

Hive: Exception when using LAG with a window function

0 answers