From a Hive array

Time: 2016-10-08 11:19:40

Tags: sql hadoop hive user-defined-functions

I have a table in Hive with 3 columns, as shown below:

timestamp   UserID  OtherId    
2016-09-01  123     "101","222","321","987","393.1","090","467","863"
2016-09-01  124     "188","389","673","972","193","100","143","210"
2016-09-01  125     "888","120","482","594","393.2"
2016-09-01  126     "441","501","322","671","008","899"
2016-09-01  127     "004","700","393.4","761","467","356","643","578"
2016-09-01  128     "322","582","348"
2016-09-01  129     "029","393.8","126","187"

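Here OtherID is an array.

I need to parse OtherID so that the resulting dataset looks like the following, because I am only interested in values matching '393%':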

timestamp   UserID  OtherId    
2016-09-01  123     393.1
2016-09-01  125     393.2
2016-09-01  127     393.4
2016-09-01  129     393.8

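I have looked at a number of parsing functions, but they all seem either to return the position of a value or to require you to specify the position of the value you want returned. Neither option works here, because '393%' can appear at any position in the array for any given row. I also need a wildcard to allow for variation in the values I want.

Another option is explode, but my table is too large for that.

I think a UDF may be the only viable approach, but I would welcome some guidance.

Any help is appreciated.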

2 Answers:

Answer 0: (score: 0)

Using the LATERAL VIEW option available in Hive, you can easily do what you need.

0: jdbc:hive2://quickstart:10000/default> select * from test_5; 
+-----------+------------+----------------------------------------------+
| test_5.t  | test_5.id  |                  test_5.oid                  |
+-----------+------------+----------------------------------------------+
| 123       | 123        | "222","321","987","393.1","090","467","863"  |
+-----------+------------+----------------------------------------------+

Here is the trick:

SELECT id, ooid
FROM test_5 
LATERAL VIEW EXPLODE(SPLIT(oid,",")) temp AS ooid;

+------+----------+
|  id  |   ooid   |
+------+----------+
| 123  | "222"    |
| 123  | "321"    |
| 123  | "987"    |
| 123  | "393.1"  |
| 123  | "090"    |
| 123  | "467"    |
| 123  | "863"    |
+------+----------+

Ergo:

SELECT id, regexp_replace(ooid,'"','') AS ooid
FROM test_5 
LATERAL VIEW EXPLODE(SPLIT(oid,",")) temp AS ooid
WHERE ooid LIKE '"393%';

+------+----------+
|  id  |   ooid   |
+------+----------+
| 123  |  393.1   |
+------+----------+
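
Since the question says the table is too large to explode, a variant without EXPLODE may be worth trying when oid is a plain string, as in test_5 above. This is only a minimal sketch, assuming each row holds at most one 393 value (regexp_extract returns only the first match) and that every value is wrapped in double quotes as shown:

```sql
-- Sketch: extract the first quoted value starting with 393, with no EXPLODE.
-- Assumes oid is a STRING such as "222","321","393.1",...
SELECT id,
       regexp_extract(oid, '"(393[^"]*)"', 1) AS ooid  -- capture group 1 = value without quotes
FROM test_5
WHERE oid LIKE '%"393%';                               -- keep only rows that contain a match
```

If a row could contain several 393 values, this would return only the first one, and the LATERAL VIEW query would still be needed for those rows.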

Answer 1: (score: 0)

You can try the following:

hive> select timestamp1, userid, otherids from userdet1 LATERAL VIEW explode(otherid) testTable as otherids where otherids LIKE concat('393','%');

OK
2016-09-01  123 393.1
2016-09-01  125 393.2
2016-09-01  127 393.4
2016-09-01  129 393.8
Time taken:  0.297  seconds,  Fetched: 4 row(s)
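
If exploding the array is too expensive, the same lookup might be done per row by joining the array back into a string. The sketch below assumes otherid is an array<string> in userdet1 (as in the query above), that no element contains a comma, and that each row has at most one matching element:

```sql
-- Sketch: find the first array element starting with 393, with no EXPLODE.
-- concat_ws(',', otherid) joins the array into a single comma-separated string.
SELECT timestamp1,
       userid,
       regexp_extract(concat_ws(',', otherid), '(^|,)(393[^,]*)', 2) AS otherids
FROM userdet1
WHERE concat_ws(',', otherid) RLIKE '(^|,)393';  -- element must start with 393
```

The `(^|,)` prefix anchors the match to the start of an element, so a value such as 1393 would not be picked up.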