Loading XML data in Hive

Time: 2016-12-29 20:19:12

Tags: hive

I have an XML file like this:

<document>
<site>  
<url>htp://www.abc.com/</url>
<category>Sports</category>
<usercount>120</usercount>  
<review>good site</review>
</site> 
<site>
<url>http://www.fb.com/</url>
<category>Social</category>
<usercount>100</usercount>  
<review>Addictive</review>
</site> 
<site>  
<url>http://www.google.com/</url>
<category>Web Search</category>
<usercount>1000</usercount>  
<review>helpful</review>
</site> 
</document>

I am creating the table with the following script.

create table IF NOT EXISTS xmltest(url STRING,category STRING,usercount STRING,review STRING)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.url"="/document/site/url/text()",
"column.xpath.category"="/document/site/category/text()",
"column.xpath.usercount"="/document/site/usercount/text()",
"column.xpath.review"="/document/site/review/text()")
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
TBLPROPERTIES (
"xmlinput.start"="<document",
"xmlinput.end"="</document>");

When I select all the data, the output looks like this:

'<string>htp://www.abc.com/http://www.fb.com/http://www.google.com/</string>    <string>SportsSocialWeb Search</string> <string>1201BillionSeveral Billions</string>    <string>good siteAddictiveVery helpful</string>'

I want the output to be one row per site, for example 'htp://www.abc.com/ Sports 120 good site' and 'http://www.fb.com/ Social 100 Addictive'.

Please help.
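
A possible direction, offered as an untested sketch rather than a confirmed fix: the table definition above sets `xmlinput.start` and `xmlinput.end` to the `<document>` element, so the whole file is likely read as a single record and each XPath matches all three `<site>` children at once, which would explain the concatenated values. Assuming the SerDe splits records on those two properties and evaluates every `column.xpath.*` expression once per record, making `<site>` the record element and rooting the XPaths at `/site` should give one row per site. The table name `xmltest_per_site` is hypothetical; the class names are the ones already used above.

```sql
-- Sketch only: treat each <site>...</site> block as one record
-- instead of the whole <document>. Not verified against a cluster.
-- xmltest_per_site is a hypothetical table name.
CREATE TABLE IF NOT EXISTS xmltest_per_site (
  url STRING,
  category STRING,
  usercount STRING,
  review STRING
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
  "column.xpath.url"="/site/url/text()",
  "column.xpath.category"="/site/category/text()",
  "column.xpath.usercount"="/site/usercount/text()",
  "column.xpath.review"="/site/review/text()"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
TBLPROPERTIES (
  "xmlinput.start"="<site",
  "xmlinput.end"="</site>"
);
```

If this assumption holds, `SELECT * FROM xmltest_per_site` would return three rows, one for each `<site>` element in the file.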

0 answers:

No answers yet.