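I have a query that works on Teradata. I imported the table into dimension_tab with Sqoop and tried to run the query on Hive, but unfortunately it is not compatible with Hive.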
SELECT fact_1_id,
fact_2_id,
SUM(sales_value) AS sales_value,
GROUPING(fact_1_id) AS f1g,
GROUPING(fact_2_id) AS f2g
FROM dimension_tab
GROUP BY CUBE (fact_1_id, fact_2_id)
ORDER BY fact_1_id, fact_2_id;
Then I tried to make it Hive-compatible. Reference: WIKI
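Fortunately, CUBE is available in Hive, but the syntax is different, i.e. GROUP BY fact_1_id, fact_2_id WITH CUBE. However, according to the documentation, GROUPING() is not available in Hive.
Is there a GROUPING() equivalent in Hive? Or how else can I run this query on Hive?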
Answer (score: 0)
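If the id columns contain no NULL values, simple logic is enough:
SELECT fact_1_id,
fact_2_id,
SUM(sales_value) AS sales_value,
(case when fact_1_id is null then 1 else 0 end) as f1g,
(case when fact_2_id is null then 1 else 0 end) as f2g
FROM dimension_tab
GROUP BY fact_1_id, fact_2_id WITH CUBE
ORDER BY fact_1_id, fact_2_id;
This logic (not the WITH CUBE syntax, of course) works in both Teradata and Hive: in a CUBE result, a NULL in a grouping column marks a row where that column has been rolled up, as long as the column never contains NULL in the source data.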
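To see why the NULL check needs NULL-free id columns, here is a small runnable sketch. It uses SQLite through Python's standard sqlite3 module purely so the example runs anywhere; SQLite has neither CUBE nor GROUPING(), so the sketch spells CUBE(fact_1_id, fact_2_id) out as a UNION ALL of its four grouping sets and emits the true grouping flags as constants in each branch. The sample rows and the helper names cube_rows and mismatches are hypothetical; one row has a genuine NULL in fact_2_id.

```python
import sqlite3

# Hypothetical sample data (not from the question): the second row has a
# genuine NULL in fact_2_id, the case where the simple NULL check breaks.
ROWS = [(1, 10, 5.0), (1, None, 3.0), (2, 10, 7.0)]

# CUBE(fact_1_id, fact_2_id) written out as its four grouping sets.
# Each branch emits the true GROUPING() flags as constants (1 = rolled up).
CUBE_AS_UNION = """
SELECT fact_1_id, fact_2_id, SUM(sales_value) AS sales_value,
       0 AS f1g, 0 AS f2g
FROM dimension_tab GROUP BY fact_1_id, fact_2_id
UNION ALL
SELECT fact_1_id, NULL, SUM(sales_value), 0, 1
FROM dimension_tab GROUP BY fact_1_id
UNION ALL
SELECT NULL, fact_2_id, SUM(sales_value), 1, 0
FROM dimension_tab GROUP BY fact_2_id
UNION ALL
SELECT NULL, NULL, SUM(sales_value), 1, 1
FROM dimension_tab
"""

# The answer's NULL-check flags, evaluated over the same CUBE rows.
QUERY = f"""
SELECT fact_1_id, fact_2_id, sales_value, f1g, f2g,
       CASE WHEN fact_1_id IS NULL THEN 1 ELSE 0 END AS naive_f1g,
       CASE WHEN fact_2_id IS NULL THEN 1 ELSE 0 END AS naive_f2g
FROM ({CUBE_AS_UNION}) AS c
"""


def cube_rows():
    """Return CUBE rows as (f1, f2, sales, f1g, f2g, naive_f1g, naive_f2g)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dimension_tab (fact_1_id INT, fact_2_id INT, sales_value REAL)"
    )
    conn.executemany("INSERT INTO dimension_tab VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


def mismatches(rows):
    """Rows where the NULL check disagrees with the true grouping flags."""
    return [r for r in rows if (r[3], r[4]) != (r[5], r[6])]


if __name__ == "__main__":
    for row in mismatches(cube_rows()):
        print(row)
```

With this data, the NULL check flags both the detail row (1, NULL) and the fact_2_id = NULL subtotal as rolled up on fact_2_id, even though fact_2_id was actually grouped in both. The same UNION ALL expansion with constant flags could also serve as a fallback where neither GROUPING() nor GROUPING__ID is usable, typically at the cost of one pass over the table per grouping set.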
Otherwise, if you do have NULL values, you can use GROUPING__ID:
SELECT fact_1_id,
fact_2_id,
SUM(sales_value) AS sales_value,
(case when (CAST (GROUPING__ID AS INT) & 1) = 0 then 1 else 0 end) as f1g,
(case when (CAST (GROUPING__ID AS INT) & 2) = 0 then 1 else 0 end) as f2g
FROM dimension_tab
GROUP BY fact_1_id, fact_2_id WITH CUBE
ORDER BY fact_1_id, fact_2_id;
Note: GROUPING__ID depends on the order of the expressions in the GROUP BY, so rearranging the GROUP BY columns changes which bit corresponds to which column, and therefore may change the meaning of the flags.
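The ordering caveat can be illustrated with a small Python sketch of the bit arithmetic. It assumes the layout the query above relies on: bit i of GROUPING__ID (bit 0 tested with & 1, bit 1 with & 2) is set when the i-th GROUP BY column is present in the grouping set, and clear when that column is rolled up. The bit layout of GROUPING__ID has changed between Hive releases, so check the documentation for your version. The helper names grouping_id, grouping_flag and cube_sets are hypothetical.

```python
from itertools import combinations


def grouping_id(group_by, present):
    """GROUPING__ID under the assumed layout: bit i is set when the
    i-th GROUP BY column is present (not rolled up) in the grouping set."""
    return sum(1 << i for i, col in enumerate(group_by) if col in present)


def grouping_flag(gid, position):
    """Emulate GROUPING(col) as the query does: 1 when the bit is clear."""
    return 1 if (gid & (1 << position)) == 0 else 0


def cube_sets(group_by):
    """All grouping sets produced by WITH CUBE over the given columns."""
    return [set(c) for n in range(len(group_by) + 1)
            for c in combinations(group_by, n)]


if __name__ == "__main__":
    original = ["fact_1_id", "fact_2_id"]
    reordered = ["fact_2_id", "fact_1_id"]
    for present in cube_sets(original):
        gid_a = grouping_id(original, present)
        gid_b = grouping_id(reordered, present)
        # The query hard-codes "& 1" for f1g; after reordering, that bit
        # tracks fact_2_id instead of fact_1_id.
        print(sorted(present), gid_a, gid_b,
              grouping_flag(gid_a, 0), grouping_flag(gid_b, 0))
```

For the grouping set that keeps only fact_1_id, the assumed GROUPING__ID is 1 with the original order but 2 after swapping the GROUP BY columns, so the & 1 test would then report fact_1_id as rolled up when it is not.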