Creating an ARRAY of STRUCTs with a Hive SELECT statement

Posted: 2017-11-13 18:12:58

Tags: arrays struct hive hiveql

I'm having trouble selecting an ARRAY of STRUCTs in Hive.

My source table looks like this:

+-------------+--+
|    field    |
+-------------+--+
| id          |
| fieldid     |
| fieldlabel  |
| fieldtype   |
| answer_id   |
| unitname    |
+-------------+--+

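This is survey data. id is the survey ID, the four fields in the middle hold the response data, and unitname is the business unit the survey relates to.

I need to build an array of structs that contains all the answers for each survey ID. I thought this would work, but it doesn't: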

select id, 
array( 
    named_struct(
        "field_id",
        fieldid,
        "field_label",
        fieldlabel,
        "field_type",
        fieldtype,
        "answer_id",
        answer_id)) as answers,
unitname
from new_answers;

Instead, it returns one row per survey answer (field_id), and each row holds an array with a single struct for that answer:

id | answers | unitname
1 | [{"field_id":175877,"field_label":"Comment","field_type":"COMMENT","answer_id":8990947803}] | Location1
2 | [{"field_id":47824,"field_label":"Language","field_type":"MULTIPLE_CHOICE","answer_id":8990950069}] | Location2
2 | [{"field_id":48187,"field_label":"Language Type","field_type":"MULTIPLE_CHOICE","answer_id":8990950070}] | Location2
2 | [{"field_id":47829,"field_label":"Trans #","field_type":"TEXT","answer_id":8990950071}] | Location2
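The query above returns one row per answer because it has no GROUP BY. Without one, array(...) is an ordinary row-level function: Hive evaluates it once per input row, so each row becomes its own one-element array and no rows are merged. The sketch below models this in plain Python. It is an illustration only, not Hive code. ROWS copies the question's sample data, and the names to_struct and per_row_array are hypothetical.

```python
# Illustration only: a plain-Python model of the original query, not Hive code.
# ROWS copies the sample data from the question:
# (id, fieldid, fieldlabel, fieldtype, answer_id, unitname)
ROWS = [
    (1, 175877, "Comment", "COMMENT", 8990947803, "Location1"),
    (2, 47824, "Language", "MULTIPLE_CHOICE", 8990950069, "Location2"),
    (2, 48187, "Language Type", "MULTIPLE_CHOICE", 8990950070, "Location2"),
    (2, 47829, "Trans #", "TEXT", 8990950071, "Location2"),
]


def to_struct(row):
    """Hypothetical stand-in for named_struct(...): one dict per input row."""
    _, fieldid, fieldlabel, fieldtype, answer_id, _ = row
    return {
        "field_id": fieldid,
        "field_label": fieldlabel,
        "field_type": fieldtype,
        "answer_id": answer_id,
    }


def per_row_array(rows):
    """Models array(named_struct(...)) without GROUP BY: every input row
    produces exactly one output row holding a one-element list."""
    return [(row[0], [to_struct(row)], row[5]) for row in rows]


result = per_row_array(ROWS)
for id_, answers, unitname in result:
    print(id_, answers, unitname)
```

The model prints four rows, one per answer, which matches the shape of the output above. Merging the rows that share an id requires an aggregate function, not array().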

But what I need is this:

id | answers | unitname    
1 | [{"field_id":175877,"field_label":"Comment","field_type":"COMMENT","answer_id":8990947803}] | Location1
2 | [{"field_id":47824,"field_label":"Language","field_type":"MULTIPLE_CHOICE","answer_id":8990950069},
   {"field_id":48187,"field_label":"Language Type","field_type":"MULTIPLE_CHOICE","answer_id":8990950070},
   {"field_id":47829,"field_label":"Trans #","field_type":"TEXT","answer_id":8990950071}] | Location2

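I have searched extensively, but every answer I found seems to be about INSERT INTO ... VALUES() queries. I already have the table structure. I just can't get the answers to combine into a single ARRAY the way I expect.

Any help would be greatly appreciated.

For reproduction purposes, if needed: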

CREATE TABLE `new_answers`( 
`id` bigint,
`fieldid` bigint,
`fieldlabel` string,
`fieldtype` string,
`answer_id` bigint,
`unitname` string);

INSERT INTO new_answers VALUES
(1,175877,"Comment","COMMENT",8990947803,"Location1"),
(2,47824,"Language","MULTIPLE_CHOICE",8990950069,"Location2"),
(2,48187,"Language Type","MULTIPLE_CHOICE",8990950070,"Location2"),
(2,47829,"Trans #","TEXT",8990950071,"Location2");

1 answer:

Answer 0 (score: 0)

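What you are looking for is a way to collect structs into an array. Hive ships with two functions that collect values into arrays: collect_set and collect_list. However, in the Hive versions this answer was written against, both functions only build arrays of primitive types, so they cannot collect structs.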
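"Collecting structs into an array" means two things happen. First, GROUP BY id, unitname sorts the rows into groups. Then the collect aggregate appends one struct per row to that group's array, so each group yields one output row. The sketch below models these semantics in plain Python. It is an illustration only, not Hive code. ROWS copies the question's sample data, and to_struct and group_and_collect are hypothetical names. Python dicts keep insertion order, so the model preserves input order. Hive, by contrast, does not in general guarantee the order of elements inside a collected array.

```python
# Illustration only: a plain-Python model of GROUP BY id, unitname plus a
# collect aggregate over named_struct(...). This is not Hive code.
# ROWS copies the sample data from the question:
# (id, fieldid, fieldlabel, fieldtype, answer_id, unitname)
ROWS = [
    (1, 175877, "Comment", "COMMENT", 8990947803, "Location1"),
    (2, 47824, "Language", "MULTIPLE_CHOICE", 8990950069, "Location2"),
    (2, 48187, "Language Type", "MULTIPLE_CHOICE", 8990950070, "Location2"),
    (2, 47829, "Trans #", "TEXT", 8990950071, "Location2"),
]


def to_struct(row):
    """Hypothetical stand-in for named_struct(...): one dict per input row."""
    _, fieldid, fieldlabel, fieldtype, answer_id, _ = row
    return {
        "field_id": fieldid,
        "field_label": fieldlabel,
        "field_type": fieldtype,
        "answer_id": answer_id,
    }


def group_and_collect(rows):
    """Models GROUP BY id, unitname with a collect aggregate: rows sharing a
    key are merged, and each contributes one struct to that key's list."""
    groups = {}  # (id, unitname) -> list of structs, in insertion order
    for row in rows:
        key = (row[0], row[5])
        groups.setdefault(key, []).append(to_struct(row))
    # One output row per group: (id, answers, unitname)
    return [(id_, answers, unitname) for (id_, unitname), answers in groups.items()]


result = group_and_collect(ROWS)
for id_, answers, unitname in result:
    print(id_, answers, unitname)
```

The model produces two rows. Survey 1 has one struct, and survey 2 has three, which is the shape the question asks for.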

The jar from the brickhouse project (https://github.com/klout/brickhouse/wiki/Downloads) provides many functions, including ones that collect complex types. First add the jar:

add jar hdfs://path/to/your/jars/brickhouse-0.6.0.jar;

Then register the collect function under any name you like:

create temporary function collect_struct as 'brickhouse.udf.collect.CollectUDAF';

The following query:

select id
     , collect_struct( 
         named_struct(
           "field_id", fieldid,
           "field_label", fieldlabel,
           "field_type", fieldtype,
           "answer_id", answer_id)) as answers
     , unitname
  from new_answers
 group by id, unitname
;

produces this result:

id  answers unitname
1   [{"field_id":175877,"field_label":"Comment","field_type":"COMMENT","answer_id":8990947803}] Location1
2   [{"field_id":47824,"field_label":"Language","field_type":"MULTIPLE_CHOICE","answer_id":8990950069},{"field_id":48187,"field_label":"Language Type","field_type":"MULTIPLE_CHOICE","answer_id":8990950070},{"field_id":47829,"field_label":"Trans #","field_type":"TEXT","answer_id":8990950071}]    Location2