Error while loading XML data into a Hive table

Date: 2017-03-01 14:28:39

Tags: hadoop hive

I am trying to load an XML file into my Hive table.

My input file is as follows:

<TAG>
 <NAME>ABCD</NAME><AGE>25</AGE><SEX>male</SEX>
 <NAME>EFGH</NAME><AGE>23</AGE><SEX>female</SEX>
</TAG>

I would like to see output like this:

ABCD,25,male
EFGH,23,female

But the output I am getting is:

<string>ABCDEFGH</string>   NULL    <string>malefemale</string>

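I am using the JAR file hivexmlserde-1.0.5.3.jar for the XML SerDe.

Can anyone tell me what mistake I am making here? Any help is appreciated.

2 answers:

Answer 0: (score: 1)

Use text() in every XPath, and change the AGE entry to: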

   "column.xpath.AGE"="/TAG/AGE/text()"

You can change the column's data type in the Hive table later.

Remove the LOCATION clause from the CREATE TABLE statement:

LOCATION '/home/sid/hivexmltab'

Instead, use the LOAD command to load the data after the table has been created:

load data local inpath '/home/sid/hivexmltab/XMLfile.xml' overwrite into table MYDATA;
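
Put together, the suggestions in this answer might look like the sketch below. The column list, the array types, and the SerDe and input-format class names are assumptions borrowed from the other answer on this page, since the question's own CREATE TABLE statement is not shown. The JAR path is a placeholder.

```sql
-- Register the XML SerDe JAR for the session (placeholder path, adjust as needed).
ADD JAR /path/to/hivexmlserde-1.0.5.3.jar;

-- Managed table with no LOCATION clause; every XPath ends in text().
-- Assumption: array columns, because one <TAG> element holds several
-- NAME/AGE/SEX nodes. AGE is kept as text here and can be converted later.
CREATE TABLE MYDATA
(
    NAME    array<string>
   ,AGE     array<string>
   ,SEX     array<string>
)
    ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
    WITH SERDEPROPERTIES
    (
        "column.xpath.NAME" = "/TAG/NAME/text()"
       ,"column.xpath.AGE"  = "/TAG/AGE/text()"
       ,"column.xpath.SEX"  = "/TAG/SEX/text()"
    )
    STORED AS
    INPUTFORMAT     'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
    OUTPUTFORMAT    'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
    TBLPROPERTIES
    (
        "xmlinput.start" = "<TAG"
       ,"xmlinput.end"   = "</TAG>"
    )
;

-- Load the file once the table exists.
LOAD DATA LOCAL INPATH '/home/sid/hivexmltab/XMLfile.xml' OVERWRITE INTO TABLE MYDATA;
```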

Answer 1: (score: 1)

This is a bad XML structure.
Each <NAME>...</NAME><AGE>...</AGE><SEX>...</SEX> group should be wrapped in another tag of its own.

CREATE EXTERNAL TABLE MYDATA
(
    NAME    array<string>
   ,AGE     array<int>
   ,SEX     array<string>    
)
    ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
    WITH SERDEPROPERTIES
    (
        "column.xpath.NAME" = "TAG/NAME/text()"
       ,"column.xpath.AGE"  = "TAG/AGE/text()"
       ,"column.xpath.SEX"  = "TAG/SEX/text()"
    )
    STORED AS 
    INPUTFORMAT     'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
    OUTPUTFORMAT    'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
    LOCATION        '/home/sid/hivexmltab'
    TBLPROPERTIES
    (
        "xmlinput.start" = "<TAG"
       ,"xmlinput.end"   = "</TAG>"
    )
;

select * from MYDATA
;

+-----------------+------------+-------------------+
|   mydata.name   | mydata.age |    mydata.sex     |
+-----------------+------------+-------------------+
| ["ABCD","EFGH"] | [25,23]    | ["male","female"] |
+-----------------+------------+-------------------+

select  NAME[pe.n]  as name
       ,AGE [pe.n]  as age
       ,SEX [pe.n]  as sex
from    MYDATA m
        lateral view posexplode (m.NAME) pe as n,x
;

+------+-----+--------+
| name | age |  sex   |
+------+-----+--------+
| ABCD |  25 | male   |
| EFGH |  23 | female |
+------+-----+--------+
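
If the file can be regenerated so that each NAME/AGE/SEX group sits inside its own element, as this answer recommends, each group could map to one row directly, without arrays or posexplode. A minimal sketch, assuming a hypothetical wrapper element PERSON, a hypothetical table name MYDATA_WRAPPED, and a hypothetical directory; none of these appear in the original question:

```sql
-- Assumed input layout (PERSON is a hypothetical wrapper tag):
--   <TAG>
--    <PERSON><NAME>ABCD</NAME><AGE>25</AGE><SEX>male</SEX></PERSON>
--    <PERSON><NAME>EFGH</NAME><AGE>23</AGE><SEX>female</SEX></PERSON>
--   </TAG>

-- One row per <PERSON> element, so the columns can be plain scalars.
CREATE EXTERNAL TABLE MYDATA_WRAPPED   -- hypothetical table name
(
    NAME    string
   ,AGE     int
   ,SEX     string
)
    ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
    WITH SERDEPROPERTIES
    (
        "column.xpath.NAME" = "/PERSON/NAME/text()"
       ,"column.xpath.AGE"  = "/PERSON/AGE/text()"
       ,"column.xpath.SEX"  = "/PERSON/SEX/text()"
    )
    STORED AS
    INPUTFORMAT     'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
    OUTPUTFORMAT    'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
    LOCATION        '/home/sid/hivexmltab_wrapped'   -- hypothetical directory
    TBLPROPERTIES
    (
        -- The input format cuts the file into records between these two markers.
        "xmlinput.start" = "<PERSON>"
       ,"xmlinput.end"   = "</PERSON>"
    )
;

select NAME, AGE, SEX from MYDATA_WRAPPED
;
```

The design choice here is to move the record boundary from the outer TAG element to the per-person element, so the record splitting is done by the input format rather than by a lateral view in every query.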