Function to perform complex transformations in Hive

Time: 2017-05-29 04:27:43

Tags: hadoop hive hiveql

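I am trying to apply some transformations to a flat input file. The real problem is that each record in the file consists of 111 fields. How can I transform that many fields?

One option is a UDF, but how would I pass 111 fields to it? Is that even possible, and is there a way to pass all the fields of a table row to a UDF at once?

Here is my input file: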

A|Adding||Testing|DV005|         |7425478987|10              |     |Jayendran       |                                                  |Arumugam                                          |V|        |MALE|19711028|101         |N|01|               |Candy|               |1312 WEST 10TH STREET                             |                                                  |AUSTIN                                            |TX|                                                  |78703    |840                                               |               |5127768623|               |8009238-12345678912|A|B|H|01500|03000|Chocalates                                            |8009238||RAPID 7 LLC                   |20130501|00000000|               |000|              |               |   |        |        |   |        |        |               |                               |                                |N  |BUS|20150901|20160831|0000000000|0000000001|               |8009238-999940185-002348025-CAR|960230702-CAR-002348025-20150901|Y  |CAR|20160531|20160730|0000000011|0000001321|8009238-999940185-002348025-TRAIN|960230702-TRAIN-002348025-20150901|N  |TRAIN|20150901|20160831|0000000000|0000000000|                                 |                                |N  |VAN|20150901|20160831|               |0000000000|0000000000|                            |                        |               |N  |TRUCK|20150101|20991231|                                 |                                |N  |JEEP|        |        |0000000000|0000000000|                                 |                                |Y  |PLANE|20150901|20160831|               |20160319002530000001      

Here is my sample output:

Testing DV005 JayendranArumugam MALE
CAR2016053120160730
TRAIN0000000000000000
VAN0000000000000000
TRUCK0000000000000000
JEEP0000000000000000
PLANE2015090120160831

Please help me find a solution.

Thanks in advance.

1 answer:

Answer 0 (score: 0):

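-- Load each line of the flat file into a single string column.
create external table mytable (rec string)
location '/... put the location here ...'
tblproperties ('serialization.last.column.takes.rest'='true')
;

-- Split each record on the pipe delimiter (the regex also strips the padding
-- around each field), then address the fields by zero-based index in f.
-- The first array element builds the header row. Each of the other six emits
-- a block name followed by its two dates when the preceding flag is Y, or
-- sixteen zeros otherwise. explode() returns one output row per element.
select  explode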
        (
            array
            (
                concat_ws(' ',f[3],f[4],concat(f[9],f[11]),f[14])
               ,concat(f[ 67] ,case when f[ 66] = 'Y' then concat(f[ 68] ,f[ 69]) else '0000000000000000' end)
               ,concat(f[ 75] ,case when f[ 74] = 'Y' then concat(f[ 76] ,f[ 77]) else '0000000000000000' end)
               ,concat(f[ 83] ,case when f[ 82] = 'Y' then concat(f[ 84] ,f[ 85]) else '0000000000000000' end)
               ,concat(f[ 93] ,case when f[ 92] = 'Y' then concat(f[ 94] ,f[ 95]) else '0000000000000000' end)
               ,concat(f[ 99] ,case when f[ 98] = 'Y' then concat(f[100] ,f[101]) else '0000000000000000' end)
               ,concat(f[107] ,case when f[106] = 'Y' then concat(f[108] ,f[109]) else '0000000000000000' end)
            )
        )

from   (select  split(rec,'\\s*\\|\\s*') as f
        from    mytable
        ) t
;
+--------------------------------------+
|                 col                  |
+--------------------------------------+
| Testing DV005 JayendranArumugam MALE |
| CAR2016053120160730                  |
| TRAIN0000000000000000                |
| VAN0000000000000000                  |
| TRUCK0000000000000000                |
| JEEP0000000000000000                 |
| PLANE2015090120160831                |
+--------------------------------------+
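
On the question of passing every field to a UDF: once `split` has run, the whole record is a single `array<string>` value, so it can be carried around as one column (or handed to a function as one argument) instead of 111 separate ones. Here is a minimal sketch of that idea. It inlines only the first seven fields of the sample record so that no table is needed; the aliases `field_count`, `f3` and `f5_is_empty` are made up for illustration, and it assumes a Hive version that accepts a subquery without a FROM clause.

```sql
-- First seven fields of the sample record, inlined as a string literal.
-- Assumption: the Hive version in use allows SELECT without a FROM clause.
-- field_count : number of elements produced by the split
-- f3          : indexes are zero-based, so this is the fourth field
-- f5_is_empty : a padding-only field collapses to an empty string
select  size(f)     as field_count
       ,f[3]        as f3
       ,f[5] = ''   as f5_is_empty
from   (select split('A|Adding||Testing|DV005|         |7425478987','\\s*\\|\\s*') as f
       ) t
;
```

With this literal the query should return 7, Testing and true, which shows why the indexes in the query above line up with the padded input: the regex consumes the whitespace on both sides of each delimiter, and empty fields still occupy a position in the array.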