I have a Hive table defined as follows:
hive>desc books;
gen_id int
author array<string>
rating double
genres array<string>
hive>select * from books;
| gen_id | rating | author    | genres
+--------+--------+-----------+----------
| 1      | 10     | ["A","B"] | ["X","Y"]
| 2      | 20     | ["C","A"] | ["Z","X"]
| 3      | 30     | ["D"]     | ["X"]
Is there a query that can run a SELECT and return one row per record with the two arrays merged, like this:
| gen_id | rating | JoinData
+--------+--------+-------------------
| 1      | 10     | ["A","B","X","Y"]
| 2      | 20     | ["C","A","Z","X"]
| 3      | 30     | ["D","X"]
| 1      | 10     | "Y"
Can someone guide me on how to get this result? Thanks in advance for any help.
Answer 0 (score: 2)
The answer is in this post:
[1]:http://stackoverflow.com/questions/21578477/array-intersect-hive
For those who don't want to read through that thread:
1) Create a temporary function from the UDF: CREATE TEMPORARY FUNCTION combine AS 'brickhouse.udf.collect.CombineUDF';
2) Write a SELECT statement:
select gen_id
, rating
, combine(author, genres) as JoinData
from books
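
Put together, the two steps might look like the following in a Hive session. This is only a minimal sketch: the jar path is a placeholder (the answer does not say where the Brickhouse jar lives), and it assumes the Brickhouse library is available in your environment.

```sql
-- Assumption: this jar path is hypothetical; adjust it to your environment.
ADD JAR /path/to/brickhouse.jar;

-- Register the UDF under the name used in the query below.
CREATE TEMPORARY FUNCTION combine AS 'brickhouse.udf.collect.CombineUDF';

-- Merge the two array columns into a single array per row.
SELECT gen_id,
       rating,
       combine(author, genres) AS JoinData
FROM books;
```

A temporary function only lasts for the current session, so the ADD JAR and CREATE TEMPORARY FUNCTION statements have to be run again in each new session.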