Window function error: At least 1 group must only depend on input columns

Time: 2019-03-12 06:14:22

Tags: sql hiveql

I have a common table expression with a window function and keep getting this error message:


Error while compiling statement: FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies. Underlying error: org.apache.hadoop.hive.ql.parse.SemanticException: Line 82:6 Invalid column reference 'gcr_amt' in definition of CTE pro_orders [ select o.shopper_id as pro_shopper_id, date_format(o.order_date, 'YYYYMM') as ym_order, sum(o.gcr_amt) as total_gcr, sum(case when o.product_pnl_new_renewal_name = 'New Purchase' then o.gcr_amt end) as new_gcr, sum(o.gcr_amt) over (partition by o.shopper_id rows between 12 preceding and 1 following) as 12months_direct_gcr from dp_enterprise.uds_order o inner join combined_shopper_level_data cs on cs.pro_shopper_id = o.shopper_id and cs.year_month = date_format(o.order_date, 'YYYYMM') where o.exclude_reason_desc is Null group by o.shopper_id, o.order_date ] used as po at Line 83:5

My CTE looks like this:

pro_orders as (
  select  o.shopper_id as pro_shopper_id,
          date_format(o.order_date, 'YYYYMM') as ym_order,
          sum(o.gcr_amt) as total_gcr,
          sum(case when o.product_pnl_new_renewal_name = 'New Purchase' then o.gcr_amt end) as new_gcr,
          sum(o.gcr_amt) over (partition by o.shopper_id, cs.year_month order by cs.year_month desc rows between 12 preceding and 0 following) as 12months_direct_gcr
  from dp_enterprise.uds_order o
  right join combined_shopper_level_data cs on cs.pro_shopper_id = o.shopper_id and cs.year_month = date_format(o.order_date, 'YYYYMM')
  group by o.shopper_id, o.order_date
),

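I don't use window functions often, so my syntax may be off. In plain English, what I'm trying to do is get a 12-month total of the metric "gcr".

So for the row with shopper_id 123abc in year_month 201901, I want to add up the previous 11 months plus the current row's month of gcr, for a 12-month total. I'm not sure whether my window function is set up correctly.

The year_month being referenced has the format YYYYMM, e.g. 201901.

Is my window function set up correctly for this goal?

How can I get past this error message?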

Edit: I still get an error message with the following CTE:

pro_orders as (
  select  o.shopper_id as pro_shopper_id,
          cs.year_month,
          sum(case when date_format(o.order_date, 'YYYYMM') = cs.year_month then o.gcr_amt else 0 end) as total_gcr,
          sum(case when date_format(o.order_date, 'YYYYMM') = cs.year_month and o.product_pnl_new_renewal_name = 'New Purchase' then o.gcr_amt else 0 end) as new_gcr,
          sum(sum(o.gcr_amt)) over  (partition by o.shopper_id 
                                order by cs.year_month desc 
                                rows between 12 preceding and 0 following) 
                                as 12months_direct_gcr
  from combined_shopper_level_data cs
  left join dp_enterprise.uds_order o on o.shopper_id = cs.pro_shopper_id
  where o.exclude_reason_desc is Null
  group by o.shopper_id, cs.year_month
),

This results in a similar error message:

Error while compiling statement: FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies. Underlying error: org.apache.hadoop.hive.ql.parse.SemanticException: Line 83:10 Invalid column reference 'gcr_amt' in definition of CTE pro_orders [ select o.shopper_id as pro_shopper_id, cs.year_month, sum(case when date_format(o.order_date, 'YYYYMM') = cs.year_month then o.gcr_amt else 0 end) as total_gcr, sum(case when date_format(o.order_date, 'YYYYMM') = cs.year_month and o.product_pnl_new_renewal_name = 'New Purchase' then o.gcr_amt else 0 end) as new_gcr, sum(sum(o.gcr_amt)) over (partition by o.shopper_id order by cs.year_month desc rows between 12 preceding and 0 following) as 12months_direct_gcr from combined_shopper_level_data cs left join dp_enterprise.uds_order o on o.shopper_id = cs.pro_shopper_id where o.exclude_reason_desc is Null group by o.shopper_id, cs.year_month ] used as po at Line 87:5

1 Answer:

Answer 0: (score: 1)

You have an aggregation query, so the window function looks a bit funny. The basic idea is this:

sum(sum(o.gcr_amt)) over (partition by o.shopper_id, cs.year_month
                          order by cs.year_month desc
                          rows between 12 preceding and 0 following
                         ) as 12months_direct_gcr

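That still won't work. First, you have the same value (cs.year_month) in both the order by and the partition by. Second, it is not in the group by.

Assuming you have a value for every month, you can use: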

sum(sum(o.gcr_amt)) over (partition by o.shopper_id
                          order by cs.year_month desc
                          rows between 12 preceding and 0 following
                         ) as 12months_direct_gcr

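And use cs.year_month in the group by (this might require adjusting other parts of the query).

For readability, I would also recommend using left join rather than right join. For me (and most people), "keep all rows in the first table I just read" is cognitively much simpler than "keep all rows in some table that I'm going to read at the end of the from clause".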
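To illustrate the point about join direction, the two statements below should return the same rows: every row of combined_shopper_level_data, whether or not it has a matching order. This is only a sketch using the tables from the question, with the select list cut down to two columns for brevity.

```sql
-- right join: the table whose rows are all kept is named last
select cs.pro_shopper_id, o.gcr_amt
from dp_enterprise.uds_order o
right join combined_shopper_level_data cs
  on cs.pro_shopper_id = o.shopper_id;

-- left join: the table whose rows are all kept is named first
select cs.pro_shopper_id, o.gcr_amt
from combined_shopper_level_data cs
left join dp_enterprise.uds_order o
  on cs.pro_shopper_id = o.shopper_id;
```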

Edit:

I would think the full query would be:

with pro_orders as (
      select o.shopper_id as pro_shopper_id,
             cs.year_month,
             sum(coalesce(o.gcr_amt, 0)) as total_gcr,
             sum(case when o.product_pnl_new_renewal_name = 'New Purchase' then o.gcr_amt else 0 end) as new_gcr,
             sum(sum(o.gcr_amt)) over (partition by o.shopper_id
                                       order by cs.year_month desc
                                       rows between 12 preceding and 0 following
                                      ) as 12months_direct_gcr
      from combined_shopper_level_data cs left join
           dp_enterprise.uds_order o
           on o.shopper_id = cs.pro_shopper_id and
              date_format(o.order_date, 'YYYYMM') = cs.year_month and
              o.exclude_reason_desc is Null
      group by o.shopper_id, cs.year_month
     ),

Hive may have a restriction on using window functions in aggregation queries (which would surprise me, because the two are processed separately). I can't find a specific reference for this. If so, just use a subquery.
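The subquery workaround mentioned above could be sketched roughly as follows: aggregate to one row per shopper and month first, then apply the window function to those rows in a second step. This is untested against Hive and rests on several assumptions. monthly_orders is a hypothetical name introduced here. The grouping uses cs.pro_shopper_id rather than o.shopper_id so that months without orders keep their shopper. The frame (ascending order, 11 preceding through the current row) is my reading of the stated goal of "previous 11 months plus the current month", not the frame used in the original query. The alias is backtick-quoted because it starts with a digit.

```sql
with monthly_orders as (
      -- step 1: plain aggregation, one row per shopper and month
      select cs.pro_shopper_id,
             cs.year_month,
             sum(coalesce(o.gcr_amt, 0)) as total_gcr,
             sum(case when o.product_pnl_new_renewal_name = 'New Purchase' then o.gcr_amt else 0 end) as new_gcr
      from combined_shopper_level_data cs left join
           dp_enterprise.uds_order o
           on o.shopper_id = cs.pro_shopper_id and
              date_format(o.order_date, 'YYYYMM') = cs.year_month and
              o.exclude_reason_desc is null
      group by cs.pro_shopper_id, cs.year_month
     ),
     pro_orders as (
      -- step 2: window function over the already-aggregated rows,
      -- so no aggregate is nested inside the window call
      select m.pro_shopper_id,
             m.year_month,
             m.total_gcr,
             m.new_gcr,
             sum(m.total_gcr) over (partition by m.pro_shopper_id
                                    order by m.year_month
                                    rows between 11 preceding and current row
                                   ) as `12months_direct_gcr`
      from monthly_orders m
     )
select * from pro_orders;
```

Because the frame counts rows rather than calendar months, this only yields a true 12-month total if combined_shopper_level_data has exactly one row per shopper for every month, with no gaps. It may also be worth checking the format string: Hive's date_format follows Java date patterns, where 'YYYY' is the week-based year, so 'yyyyMM' may be what is actually intended.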