I am trying to load an XML file into my Hive table. My input file format is as follows:
<TAG>
<NAME>ABCD</NAME><AGE>25</AGE><SEX>male</SEX>
<NAME>EFGH</NAME><AGE>23</AGE><SEX>female</SEX>
</TAG>
I expect to see the following output:
ABCD,25,male
EFGH,23,female
But the output I actually get is:
<string>ABCDEFGH</string> NULL <string>malefemale</string>
I am using the jar file hivexmlserde-1.0.5.3.jar for the XML SerDe.
Can anyone tell me what mistake I am making here? Any help is appreciated.
Answer 0 (score: 1)
Use text() everywhere, and change the AGE part to:
"column.xpath.AGE"="/TAG/AGE/text()"
You can change the data type in the Hive table later.
Remove the LOCATION part from the CREATE TABLE statement:
LOCATION '/home/sid/hivexmltab'
Instead, use the LOAD command to load all the data after creating the table:
load data local inpath '/home/sid/hivexmltab/XMLfile.xml' overwrite into table MYDATA;
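One way to handle the data type afterwards, as a minimal sketch: this assumes AGE was declared as STRING in the CREATE TABLE statement (an assumption, since the original statement is not shown) and converts it at query time instead of altering the table.

```sql
-- Sketch only: assumes MYDATA was created with AGE declared as STRING,
-- so the value returned by text() is stored unchanged.
-- CAST returns NULL for any value that is not a valid integer.
SELECT NAME,
       CAST(AGE AS INT) AS AGE,
       SEX
FROM MYDATA;
```

Converting in the query keeps the raw text available in the table, which may help when checking why a value comes back as NULL.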
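Answer 1 (score: 1)
This is a bad XML structure...
Each combination of
<NAME>...</NAME><AGE>...</AGE><SEX>...</SEX>
should have been wrapped in an additional tag.
Given the file as it is, you can read each column as an array and then explode it by position: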
CREATE EXTERNAL TABLE MYDATA
(
NAME array<string>
,AGE array<int>
,SEX array<string>
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES
(
"column.xpath.NAME" = "TAG/NAME/text()"
,"column.xpath.AGE" = "TAG/AGE/text()"
,"column.xpath.SEX" = "TAG/SEX/text()"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION '/home/sid/hivexmltab'
TBLPROPERTIES
(
"xmlinput.start" = "<TAG"
,"xmlinput.end" = "</TAG>"
)
;
select * from MYDATA
;
+-----------------+------------+-------------------+
| mydata.name     | mydata.age | mydata.sex        |
+-----------------+------------+-------------------+
| ["ABCD","EFGH"] | [25,23] | ["male","female"] |
+-----------------+------------+-------------------+
select NAME[pe.n] as name
,AGE [pe.n] as age
,SEX [pe.n] as sex
from MYDATA m
lateral view posexplode (m.NAME) pe as n,x
;
+------+-----+--------+
| name | age | sex |
+------+-----+--------+
| ABCD | 25 | male |
| EFGH | 23 | female |
+------+-----+--------+
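If the file can be regenerated with a wrapper around each record, the table could probably be defined with scalar columns instead of arrays. The following is only a sketch under that assumption: the <PERSON> wrapper tag, the table name MYDATA_WRAPPED, and the LOCATION path are hypothetical and do not come from the question.

```sql
-- Hypothetical input, one wrapper element per record:
--   <PERSON><NAME>ABCD</NAME><AGE>25</AGE><SEX>male</SEX></PERSON>
--   <PERSON><NAME>EFGH</NAME><AGE>23</AGE><SEX>female</SEX></PERSON>
-- The table name, wrapper tag, and LOCATION path below are illustrative assumptions.
CREATE EXTERNAL TABLE MYDATA_WRAPPED
(
    NAME string
   ,AGE  int
   ,SEX  string
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES
(
    "column.xpath.NAME" = "/PERSON/NAME/text()"
   ,"column.xpath.AGE"  = "/PERSON/AGE/text()"
   ,"column.xpath.SEX"  = "/PERSON/SEX/text()"
)
STORED AS
INPUTFORMAT  'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION '/home/sid/hivexmltab_wrapped'
TBLPROPERTIES
(
    "xmlinput.start" = "<PERSON>"
   ,"xmlinput.end"   = "</PERSON>"
)
;
```

With this layout each <PERSON> element should be treated as one record, so a plain select on the table would be expected to return one row per person without the posexplode step.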