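I'm using Hive with (more or less) two tables:
- TABLE1 is defined as [(Variables:string), (Value1:int), (Value2:int)]
The field "Variables" looks like "x0,x1,x2,x3,...,xn"
- TABLE2 is defined as [(Value1Sum:int), (Value2Sum:int), (X1:string), (X4:string), (X17:string)]
I "convert" table1 into table2 with this query: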
INSERT OVERWRITE TABLE table2
SELECT sum(v1), sum(v2), x1, x4, x17
FROM (SELECT
        Value1 AS v1,
        Value2 AS v2,
        split(Variables, ",")[1] AS x1,
        split(Variables, ",")[4] AS x4,
        split(Variables, ",")[17] AS x17
      FROM Table1) tmp
GROUP BY tmp.x1, tmp.x4, tmp.x17
Will Hive call the split function 3 times?
Is there a way to make this more elegant?
Is there a way to make this more generic?
Best regards, CC
Answer 0 (score: 3)
Yes, split will be called each time. You can make this more elegant, though.
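If the column has to stay a string, one way to at least write the split only once is to produce the array in the subquery and index it in the outer query. This is a minimal sketch using the table and column names from the question; the alias `parts` is made up for illustration. Whether Hive then actually evaluates split only once per row depends on the Hive version and its optimizer, so it is worth checking the plan with EXPLAIN.

```sql
-- Sketch only: `parts` is a hypothetical alias for the split result.
INSERT OVERWRITE TABLE table2
SELECT sum(v1), sum(v2), parts[1], parts[4], parts[17]
FROM (SELECT
        Value1 AS v1,
        Value2 AS v2,
        split(Variables, ',') AS parts  -- split appears once in the query text
      FROM Table1) tmp
GROUP BY parts[1], parts[4], parts[17];
```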
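Why not define Variables as an array column? Then you can access the elements directly:
select Variables[1] from table1
I assume you are using an external table, so you can do something like this: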
create external table table1(variables array<string>, a int, b int)
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY ','
LOCATION 'hdfs://somewhere'
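With a table defined like that, the query from the question could be sketched without any split call. This assumes the external table above, with columns variables, a and b standing in for Variables, Value1 and Value2. It also assumes that the fields in the data files are separated by something other than a comma (Hive's default field delimiter is Ctrl-A), so that the comma only separates the array items.

```sql
-- Sketch only: assumes the external table1(variables array<string>, a int, b int).
INSERT OVERWRITE TABLE table2
SELECT sum(a), sum(b), variables[1], variables[4], variables[17]
FROM table1
GROUP BY variables[1], variables[4], variables[17];
```

Choosing other elements then only means changing the array indexes in the SELECT and GROUP BY lists.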