I have an XML file like this:
<document>
<site>
<url>htp://www.abc.com/</url>
<category>Sports</category>
<usercount>120</usercount>
<review>good site</review>
</site>
<site>
<url>http://www.fb.com/</url>
<category>Social</category>
<usercount>100</usercount>
<review>Addictive</review>
</site>
<site>
<url>http://www.google.com/</url>
<category>Web Search</category>
<usercount>1000</usercount>
<review>helpful</review>
</site>
</document>
I am creating the table with the following script:
create table IF NOT EXISTS xmltest(url STRING,category STRING,usercount STRING,review STRING)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.url"="/document/site/url/text()",
"column.xpath.category"="/document/site/category/text()",
"column.xpath.usercount"="/document/site/usercount/text()",
"column.xpath.review"="/document/site/review/text()")
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
TBLPROPERTIES (
"xmlinput.start"="<document",
"xmlinput.end"="</document>");
When I select all the data, the output looks like this:
'<string>htp://www.abc.com/http://www.fb.com/http://www.google.com/</string> <string>SportsSocialWeb Search</string> <string>1201BillionSeveral Billions</string> <string>good siteAddictiveVery helpful</string>'
I want the output to look like this instead, with one row per site:
'htp://www.abc.com/ Sports 120 good site'
'http://www.fb.com/ Social 100 Addictive'
Please help.