Hive: Combine string array columns

Posted: 2015-03-25 12:09:42

Tags: arrays hive

I have a Hive table defined as follows:

hive>desc books;
gen_id                  int                                         
author                  array<string>                               
rating                  double                               
genres                  array<string>  

hive>select * from books;

| gen_id | rating | author    | genres
+--------+--------+-----------+-----------
| 1      | 10     | ["A","B"] | ["X","Y"]
| 2      | 20     | ["C","A"] | ["Z","X"]
| 3      | 30     | ["D"]     | ["X"]

Is there a SELECT query that returns one row per record, with the author and genres arrays merged into a single column, like this:

| gen_id | rating | JoinData
+--------+--------+-------------------
| 1      | 10     | ["A","B","X","Y"]
| 2      | 20     | ["C","A","Z","X"]
| 3      | 30     | ["D","X"]
| 1      | 10     | "Y"

Can someone show me how to get this result? Thanks in advance for any help.

1 answer:

Answer 0 (score: 2)

The answer is in this post:
http://stackoverflow.com/questions/21578477/array-intersect-hive

For those who don't want to follow the link:

1) Create a temporary function from the Brickhouse UDF: CREATE TEMPORARY FUNCTION combine AS 'brickhouse.udf.collect.CombineUDF';
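The UDF class has to be on Hive's classpath before it can be registered. A minimal session setup sketch, assuming you have built or downloaded the Brickhouse jar; the jar path below is hypothetical and must be replaced with its real location:

```sql
-- Hypothetical path: point this at the Brickhouse jar on your machine or cluster.
ADD JAR /path/to/brickhouse.jar;

-- Register Brickhouse's CombineUDF under the name used in the SELECT below.
CREATE TEMPORARY FUNCTION combine AS 'brickhouse.udf.collect.CombineUDF';
```

A temporary function only lives for the current session, so both statements need to be rerun in each new Hive session (or placed in an init script).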

2) Write a SELECT statement:

select gen_id
    , rating
    , combine(author, genres) as JoinData
from books;
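If adding a third-party jar is not an option, a rough equivalent can be assembled from Hive's built-in string functions by flattening both arrays to one delimited string and splitting it back into an array. This is a sketch, not the linked answer's method, and it assumes that no element ever contains a comma and that neither array is empty:

```sql
-- Built-in-only sketch: array -> delimited string -> array.
-- Assumption: no element contains ','. An empty array would leave an
-- extra empty-string element in JoinData.
SELECT gen_id
     , rating
     , split(concat_ws(',', concat_ws(',', author), concat_ws(',', genres)), ',') AS JoinData
FROM books;
```

The string round trip is the weak point of this approach, which is why a real array-concatenating UDF such as Brickhouse's combine is the cleaner choice when it is available.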