Hive Windowing: different results within the same partition

Time: 2018-10-19 15:19:16

Tags: hive windowing

Hi, I'm learning Hive's WINDOWING functions and I've run into a problem.

I'm trying to find the number of customers per month:

my_table:

  • date_in_out: date
  • rate_plan_name: string
  • stock: int
  • incomers: int

I partition over 3 variables: the year and month of date_in_out, and rate_plan_name:


SELECT (first_value(stock) OVER w + sum(incomers) OVER w) AS stock_monthly,
year(date_in_out) AS year_in,
month(date_in_out) AS month_in,
rate_plan_name
FROM my_table
WINDOW w AS (PARTITION BY rate_plan_name, year(date_in_out), month(date_in_out) ORDER BY date_in_out ASC);
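To make the per-row differences visible, here is a small Python model of what the query above computes. It is a sketch, not Hive code: the sample rows and the names ROWS, partition_key and ordered_window are hypothetical, and it assumes Hive's documented default frame for a window that has ORDER BY but no frame clause, RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

```python
from datetime import date
from itertools import groupby

# Hypothetical sample rows (date_in_out, rate_plan_name, stock, incomers);
# all three fall into the same plan/year/month partition.
ROWS = [
    (date(2018, 9, 1), "basic", 100, 5),
    (date(2018, 9, 2), "basic", 105, 3),
    (date(2018, 9, 3), "basic", 108, 2),
]


def partition_key(row):
    """PARTITION BY rate_plan_name, year(date_in_out), month(date_in_out)."""
    d, plan, _, _ = row
    return (plan, d.year, d.month)


def ordered_window(rows):
    """Model of first_value(stock) OVER w + sum(incomers) OVER w, where
    w = (PARTITION BY ... ORDER BY date_in_out) with no frame clause."""
    out = []
    ordered = sorted(rows, key=lambda r: (partition_key(r), r[0]))
    for _, group in groupby(ordered, key=partition_key):
        group = list(group)
        first_stock = group[0][2]  # earliest row of the partition by date
        for d, plan, _, _ in group:
            # Assumed default frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW:
            # rows dated up to and including the current date (ties included).
            running = sum(inc for dd, _, _, inc in group if dd <= d)
            out.append((plan, d.year, d.month, first_stock + running))
    return out


for row in ordered_window(ROWS):
    print(row)
# ('basic', 2018, 9, 105)
# ('basic', 2018, 9, 108)
# ('basic', 2018, 9, 110)
```

All three rows share the same partition key, yet stock_monthly grows row by row because the sum only covers the rows seen so far.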

In the result I get different stock_monthly values for rows that have the same year_in / month_in and rate_plan_name.

My question is: why do these values differ? I expected them to be the same.

1 answer:

Answer 0 (score: 1)

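With order by date_in_out in the window specification, sum is computed as a running total: for each row it adds up incomers from the start of the partition up to that row. (When a window has ORDER BY but no frame clause, Hive uses RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.) If you need the sum at the year-month level, use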

WINDOW w AS (PARTITION BY rate_plan_name, year(date_in_out), month(date_in_out))

Note, however, that first_value still needs an order by; without one, which row counts as "first" is not defined.
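A minimal Python sketch of why the ORDER BY matters for first_value. It assumes the frame spans the whole partition and models "no ORDER BY" as "whatever order the rows arrive in"; the rows and the helper name first_value_stock are hypothetical, and Hive itself makes no promise about that arrival order.

```python
from datetime import date

# Hypothetical rows (date_in_out, rate_plan_name, stock, incomers) for one
# partition, listed out of date order, as an engine might happen to read them.
PARTITION = [
    (date(2018, 9, 3), "basic", 108, 2),
    (date(2018, 9, 1), "basic", 100, 5),
    (date(2018, 9, 2), "basic", 105, 3),
]


def first_value_stock(rows, order_key=None):
    """first_value(stock) over a frame covering the whole partition.

    Without order_key, "first" is simply the first row of the input,
    standing in for the engine's unspecified read order.
    """
    if order_key is not None:
        rows = sorted(rows, key=order_key)
    return rows[0][2]


print(first_value_stock(PARTITION))                            # 108: depends on input order
print(first_value_stock(PARTITION, order_key=lambda r: r[0]))  # 100: earliest date_in_out
```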

I think you are looking for:

SELECT first_value(stock) OVER(w ORDER BY date_in_out) + sum(incomers) OVER w AS stock_monthly,
year(date_in_out) AS year_in,
month(date_in_out) AS month_in,
rate_plan_name
FROM my_table
WINDOW w AS (PARTITION BY rate_plan_name, year(date_in_out), month(date_in_out))
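The same hypothetical rows can be run through a Python model of the suggested query: first_value is taken over the date-ordered window, while sum sees the whole partition because w has no ORDER BY. As before, the data and the names ROWS, partition_key and month_level are illustrative assumptions, not Hive APIs.

```python
from datetime import date
from itertools import groupby

# Hypothetical sample rows (date_in_out, rate_plan_name, stock, incomers).
ROWS = [
    (date(2018, 9, 1), "basic", 100, 5),
    (date(2018, 9, 2), "basic", 105, 3),
    (date(2018, 9, 3), "basic", 108, 2),
]


def partition_key(row):
    """PARTITION BY rate_plan_name, year(date_in_out), month(date_in_out)."""
    d, plan, _, _ = row
    return (plan, d.year, d.month)


def month_level(rows):
    """Model of first_value(stock) OVER (w ORDER BY date_in_out) + sum(incomers) OVER w,
    where w has no ORDER BY, so sum() covers the whole partition."""
    out = []
    ordered = sorted(rows, key=lambda r: (partition_key(r), r[0]))
    for key, group in groupby(ordered, key=partition_key):
        group = list(group)
        first_stock = group[0][2]          # earliest row by date_in_out
        total = sum(r[3] for r in group)   # whole-partition sum of incomers
        # Window functions keep one output row per input row, all with the same value.
        out.extend(key + (first_stock + total,) for _ in group)
    return out


for row in month_level(ROWS):
    print(row)
# ('basic', 2018, 9, 110)  printed three times
```

Every row of the partition now gets the same stock_monthly. The query still returns one row per input row, so getting a single row per rate_plan_name and month would still require collapsing the duplicates, for example in an outer query.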