While loading an XML data file into a Hive table, I get the following error message:
FAILED: SemanticException 7:9 Input format must implement InputFormat. Error encountered near token 'StoresXml'.
This is how I load the XML file:
Create a table StoresXml:
CREATE EXTERNAL TABLE StoresXml (storexml string)
STORED AS INPUTFORMAT 'org.apache.mahout.classifier.bayes.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/user/hive/warehouse/stores';
The location /user/hive/warehouse/stores is in HDFS.
load data inpath <local path where the xml file is stored> into table StoresXml;
Now the problem is that when I select any column from the table StoresXml, I get the error above.
Please help. Where am I going wrong?
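Answer 0 (score: 0)
1) First, create a single-column table, for example:
CREATE TABLE xmlsample(xml string);
2) Then load the data into the Hive table, for example (INPATH reads from HDFS; use LOCAL INPATH for a file on the local file system):
LOAD DATA INPATH '---------' INTO TABLE xmlsample;
3) Next, query the XML with Hive's XPath UDFs, such as xpath (which returns an array of strings), xpath_string, and xpath_int.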
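As a minimal sketch of step 3, assuming each row of xmlsample holds one complete XML record: the element names record, id, and name are hypothetical and only illustrate the shape of the query. The script writes the query to a temporary file and runs it only if a hive client is on the PATH.

```shell
#!/bin/bash
# Write a HiveQL query that pulls fields out of the single XML column.
# Assumption: every row looks like <record><id>1</id><name>a</name></record>;
# these element names are hypothetical.
query_file=$(mktemp)
cat > "$query_file" <<'EOF'
SELECT xpath_int(xml, 'record/id')      AS id,
       xpath_string(xml, 'record/name') AS name,
       xpath(xml, 'record/name/text()') AS all_names
FROM xmlsample;
EOF

# Run the query only when a Hive client is installed.
if command -v hive >/dev/null 2>&1; then
  hive -f "$query_file"
else
  echo "hive not found; query written to $query_file"
fi
```

Here xpath_int and xpath_string return a single value per row, while xpath returns every match as an array of strings.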
Answer 1 (score: 0)
I just loaded a transactions.xml file into a Hive table using xpath. For an XML file, first put each record of the file on a single line:
terminal> cat /home/cloudera/Desktop/Test/Transactions_xml.xml | tr -d '&' | tr '\n' ' ' | tr '\r' ' ' | sed 's|</record>|</record>\n|g' | grep -v '^\s*$' > /home/cloudera/Desktop/trx_xml
terminal> hadoop fs -put /home/cloudera/Desktop/trx_xml /user/cloudera/DataTest/Transactions_xml
hive> create table Transactions_xml1(xmldata string);
hive> load data inpath '/user/cloudera/DataTest/Transactions_xml' overwrite into table Transactions_xml1;
hive> create table Transactions_xml(trx_id int, account int, amount int);
hive> insert overwrite table Transactions_xml
      select xpath_int(xmldata, 'record/Tid'),
             xpath_int(xmldata, 'record/AccounID'),
             xpath_int(xmldata, 'record/Amount')
      from Transactions_xml1;
I hope this helps. Let me know the result.
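The record-flattening step in this answer can be tried on a small sample before touching real data. The sketch below is only an illustration: the sample file is made up (its element names follow the query in the answer), the ampersand-stripping step is left out because the sample contains none, and the newline in the sed replacement is written as a backslash followed by a line break so that it also works with non-GNU sed.

```shell
#!/bin/bash
# Hypothetical sample: two <record> elements spread over several lines.
workdir=$(mktemp -d)
cat > "$workdir/sample.xml" <<'EOF'
<record>
  <Tid>1</Tid>
  <AccounID>100</AccounID>
  <Amount>250</Amount>
</record>
<record>
  <Tid>2</Tid>
  <AccounID>200</AccounID>
  <Amount>75</Amount>
</record>
EOF

# Turn every line break into a space, start a new line after each closing
# </record> tag, and drop lines that contain only whitespace.
tr '\n\r' '  ' < "$workdir/sample.xml" \
  | sed 's|</record>|</record>\
|g' \
  | grep -v '^[[:space:]]*$' > "$workdir/flat.txt"

# Each line of flat.txt now holds one complete record.
cat "$workdir/flat.txt"
```

With one record per line, each row of the single-column staging table is a well-formed XML fragment that the xpath_int calls can parse.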
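Answer 2 (score: -1)
I developed a tool that generates Hive scripts from CSV files. Below are a few examples of the files it generates. Tool: https://sourceforge.net/projects/csvtohive/?source=directory
Use Browse to select a CSV file and set the Hadoop root directory, e.g. /user/bigdataproject/
The tool generates a Hadoop script covering all the CSV files. Below is a sample of the generated Hadoop script that loads the CSV files into Hadoop: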
#!/bin/bash -v
hadoop fs -put ./AllstarFull.csv /user/bigdataproject/AllstarFull.csv
hive -f ./AllstarFull.hive
hadoop fs -put ./Appearances.csv /user/bigdataproject/Appearances.csv
hive -f ./Appearances.hive
hadoop fs -put ./AwardsManagers.csv /user/bigdataproject/AwardsManagers.csv
hive -f ./AwardsManagers.hive
Sample of a generated Hive script:
CREATE DATABASE IF NOT EXISTS lahman;
USE lahman;
CREATE TABLE AllstarFull (playerID string,yearID string,gameNum string,gameID string,teamID string,lgID string,GP string,startingPos string) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA INPATH '/user/bigdataproject/AllstarFull.csv' OVERWRITE INTO TABLE AllstarFull;
SELECT * FROM AllstarFull;
Thanks,
Vijay
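The idea of deriving a Hive table definition from a CSV header can be sketched in a few lines of shell. This is not how the tool above is implemented (its source is not shown here); the sample file, its columns, and the variable names are hypothetical, and every column is typed as string to match the generated sample.

```shell
#!/bin/bash
# Build a CREATE TABLE statement from the header row of a CSV file,
# typing every column as string. The sample file is hypothetical.
workdir=$(mktemp -d)
printf 'playerID,yearID,teamID\nfoo01,1999,NYA\n' > "$workdir/Sample.csv"

csv="$workdir/Sample.csv"
table=$(basename "$csv" .csv)   # table name taken from the file name

# Take the header line, strip any carriage return, and append " string"
# to every column name.
columns=$(head -n 1 "$csv" | tr -d '\r' | sed 's/,/ string,/g; s/$/ string/')

ddl="CREATE TABLE $table ($columns) row format delimited fields terminated by ',' stored as textfile;"
echo "$ddl"
```

Note that loading the same file unchanged would also load the header row as data, so a real script would need to remove or skip it.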