I am trying to import data from an XML file into Hive.
The data in my file has the following format:
<review>
<unique_id>0206cs23</unique_id>
<product_name>Abcd</product_name>
<product_type>abcd122</product_type>
<rating>1</rating>
<title>ertn</title>
<date>23/03/2012</date>
<reviewer>mr. Abcd</reviewer>
<reviewer_location>North Carolina, USA</reviewer_location>
<review_text>I've always held the</review_text>
</review>
This is what I am doing:
CREATE EXTERNAL TABLE RREVIEW (unique_id BIGINT, product_name string, product_type string, rating float, title string, review_date string, reviewer string, reviewer_location string, review_text string)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.unique_id"="/review/unique_id/text()",
"column.xpath.product_name"="/review/product_name/text()",
"column.xpath.product_type"="/review/product_type/text()",
"column.xpath.rating"="/review/rating/text()",
"column.xpath.title"="/review/title/text()",
"column.xpath.review_date"="/review/date/text()",
"column.xpath.reviewer"="/review/reviewer/text()",
"column.xpath.reviewer_location"="/review/reviewer_location/text()",
"column.xpath.review_text"="/review/review_text/text()"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
TBLPROPERTIES (
"xmlinput.start"="<unique_id> ",
"xmlinput.end"="</review>"
);
After loading the data with
LOAD DATA INPATH 'hdfs_file_or_directory_path' [OVERWRITE] INTO TABLE tablename
Hive prints this message:
Loading data to table default.rreview
Table default.rreview stats: [numFiles=1, numRows=0, totalSize=226671853, rawDataSize=0]
OK
Time taken: 0.373 seconds
But when I query the table, it returns nothing. Any help would be appreciated.
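
One thing worth checking outside Hive is whether the configured record markers actually occur in the file. The Python sketch below is only an illustration. It assumes, without confirmation from the SerDe source, that the input format locates records by literal matching of the "xmlinput.start" and "xmlinput.end" strings. The helper split_records and the shortened sample are hypothetical, and "<review>" is just one alternative start marker to compare against.

```python
def split_records(text, start, end):
    """Return every substring of text that begins with `start` and ends with `end`.

    Hypothetical helper: it mimics literal marker matching, which is an
    assumption about how the XML input format finds records.
    """
    records = []
    pos = 0
    while True:
        begin = text.find(start, pos)
        if begin == -1:
            break
        finish = text.find(end, begin + len(start))
        if finish == -1:
            break
        finish += len(end)
        records.append(text[begin:finish])
        pos = finish
    return records


# Shortened version of the record from the question.
sample = (
    "<review>\n"
    "<unique_id>0206cs23</unique_id>\n"
    "<product_name>Abcd</product_name>\n"
    "</review>\n"
)

# Start marker exactly as written in TBLPROPERTIES, including the trailing space.
as_configured = split_records(sample, "<unique_id> ", "</review>")

# Assumed alternative: start at the root element of each record.
from_root = split_records(sample, "<review>", "</review>")

print(len(as_configured), len(from_root))
```

If the first count is zero on real data as well, the start marker never matches, which would be consistent with numRows=0. This does not prove that is the cause.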