GROUPING() in Hive?

Time: 2015-04-11 12:29:42

Tags: sql hadoop hive aggregate-functions teradata

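I have the query below, which works on Teradata. The table was imported into dimension_tab using Sqoop. I tried to run the query on Hive, but unfortunately it is not compatible with Hive.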

SELECT fact_1_id,
       fact_2_id,
       SUM(sales_value) AS sales_value,
       GROUPING(fact_1_id) AS f1g, 
       GROUPING(fact_2_id) AS f2g
FROM   dimension_tab
GROUP BY CUBE (fact_1_id, fact_2_id)
ORDER BY fact_1_id, fact_2_id;

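Then I tried to make it compatible with Hive. Reference: WIKI

Fortunately, CUBE is available in Hive, but with a different syntax,

i.e. fact_1_id, fact_2_id WITH CUBE. According to the documentation, however, GROUPING() is not available in Hive.

Please help me if Hive has GROUPING() functionality. Otherwise, how can I run this query on Hive?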

1 answer:

Answer 0: (score: 0)

If you don't have NULL values in the id columns, just use simple logic:

SELECT fact_1_id,
       fact_2_id,
       SUM(sales_value) AS sales_value,
       (case when fact_1_id is null then 1 else 0 end) AS f1g,
       (case when fact_2_id is null then 1 else 0 end) AS f2g
FROM   dimension_tab
GROUP BY fact_1_id, fact_2_id WITH CUBE
ORDER BY fact_1_id, fact_2_id;

This logic (though not the WITH CUBE syntax, of course) works in both Teradata and Hive.

Otherwise, if you do have NULL values, you can use GROUPING__ID:

SELECT fact_1_id,
       fact_2_id,
       SUM(sales_value) AS sales_value,
       (case when (CAST (GROUPING__ID AS INT) & 1) = 0 then 1 else 0 end) AS f1g,
       (case when (CAST (GROUPING__ID AS INT) & 2) = 0 then 1 else 0 end) AS f2g
FROM   dimension_tab
GROUP BY fact_1_id, fact_2_id WITH CUBE
ORDER BY fact_1_id, fact_2_id;

Note: GROUPING__ID depends on the order of the expressions in the GROUP BY, so rearranging the GROUP BY may change the meaning of each flag.
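To make the bit tests in the GROUPING__ID query easier to follow, here is a minimal Python sketch of the same decoding. It assumes the convention that query relies on: bit i of GROUPING__ID is set when the i-th GROUP BY expression is present in the row, and clear when that column was rolled up. The function name grouping_flags is made up for illustration. As far as I know, Hive 2.3.0 and later changed the GROUPING__ID bit layout and added a grouping() function, so check the documentation for your version before relying on this layout.

```python
def grouping_flags(grouping_id, num_cols):
    """Return one GROUPING()-style flag per GROUP BY column.

    Assumption (taken from the query above, not from any Hive API):
    bit i set  -> i-th GROUP BY column is present in the row -> flag 0
    bit i clear -> i-th GROUP BY column was rolled up         -> flag 1
    """
    return [1 if (grouping_id >> i) & 1 == 0 else 0 for i in range(num_cols)]


# GROUP BY fact_1_id, fact_2_id WITH CUBE:
# bit 0 corresponds to fact_1_id, bit 1 to fact_2_id.
for gid in range(4):
    f1g, f2g = grouping_flags(gid, 2)
    print(gid, f1g, f2g)
```

If the GROUP BY were written as fact_2_id, fact_1_id instead, bit 0 would correspond to fact_2_id, which is why reordering the GROUP BY changes what each flag means.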