I am trying to load XML data into a Hive table.
My XML file looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
<book>
<id>11</id>
<genre>Computer</genre>
<price>44</price>
</book>
<book>
<id>44</id>
<genre>Fantasy</genre>
<price>5</price>
</book>
</catalog>
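First I loaded the XML data into a managed table, and then I used the xpath UDFs to parse the XML and load the actual values into my main table. These are the Hive queries I am running: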
create table XmlSample(xmlData string);
load data inpath 'EmployeeDetails.xml' into table XmlSample;
create table xpath_table(id int,genre string,price string);
Insert overwrite table xpath_table select xpath_int(xmlData, '/catalog/book/id/text()'), xpath_string(xmlData, '/catalog/book/genre/text()'), xpath_string(xmlData, '/catalog/book/price/text()') from XmlSample;
But I get this exception:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"xmldata":"<?xml version=\"1.0\" encoding=\"UTF-8\"?>"}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:175)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"xmldata":"<?xml version=\"1.0\" encoding=\"UTF-8\"?>"}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:544)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157)
... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public int org.apache.hadoop.hive.ql.udf.xml.UDFXPathInteger.evaluate(java.lang.String,java.lang.String) on object org.apache.hadoop.hive.ql.udf.xml.UDFXPathInteger@37fd3f of class org.apache.hadoop.hive.ql.udf.xml.UDFXPathInteger with arguments {<?xml version="1.0" encoding="UTF-8"?>:java.lang.String, /catalog/book/id/text():java.lang.String} of size 2
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:1030)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.evaluate(GenericUDFBridge.java:181)
at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:166)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:80)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534)
... 9 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:1006)
... 20 more
Caused by: java.lang.RuntimeException: Invalid expression '/catalog/book/id/text()'
at org.apache.hadoop.hive.ql.udf.xml.UDFXPathUtil.eval(UDFXPathUtil.java:74)
at org.apache.hadoop.hive.ql.udf.xml.UDFXPathUtil.evalNumber(UDFXPathUtil.java:87)
at org.apache.hadoop.hive.ql.udf.xml.UDFXPathInteger.evaluate(UDFXPathInteger.java:35)
Can anyone suggest how to avoid these exceptions?
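A likely cause, sketched under one assumption: if XmlSample uses Hive's default plain-text storage, every newline-terminated line of EmployeeDetails.xml becomes its own row. The first row then contains only the XML declaration, which is exactly the row shown in the stack trace. That fragment is not a well-formed XML document, so the "Invalid expression" message appears to come from the input failing to parse rather than from the XPath itself. The sketch below recreates the sample file and prints the rows the way Hive would split them.

```shell
#!/bin/sh
# Recreate the sample file from the question.
cat > EmployeeDetails.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
<book>
<id>11</id>
<genre>Computer</genre>
<price>44</price>
</book>
<book>
<id>44</id>
<genre>Fantasy</genre>
<price>5</price>
</book>
</catalog>
EOF

# Assumption: default text storage, where each line of the file becomes one row.
# Print every line as it would land in XmlSample.xmlData, numbered by row.
awk '{ printf "row %d: %s\n", NR, $0 }' EmployeeDetails.xml
```

Row 1 is just the declaration, and each `<book>` is spread over several rows, so no single row holds a complete record.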
Answer (score: 0)
Try this:
1. Put each record on a single line and remove the catalog tags:
cat EmployeeDetails.xml | tr -d '&' | tr '\n' ' ' | tr '\r' ' ' | sed 's|</book>|</book>\n|g' | sed 's|</*catalog>||g' | grep -v '^\s*$' > EmployeeDetails1.xml
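Below is a commented, multi-line form of the step 1 pipeline, run on the sample file. It removes both `<catalog>` and `</catalog>` with a single sed expression and lets the blank-line filter discard what is left, so it does not depend on `</catalog>` landing on a particular line number. It assumes GNU sed and GNU grep: the `\n` in the sed replacement and `\s` in the grep pattern are GNU extensions. With one `<book>` per row, each xpath_* call sees a single record; against the whole catalog in one row, xpath_int would return only the first match.

```shell
#!/bin/sh
# Recreate the sample file from the question.
cat > EmployeeDetails.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
<book>
<id>11</id>
<genre>Computer</genre>
<price>44</price>
</book>
<book>
<id>44</id>
<genre>Fantasy</genre>
<price>5</price>
</book>
</catalog>
EOF

tr -d '&' < EmployeeDetails.xml |   # delete every '&' (this also breaks entities such as &amp;)
  tr '\n' ' ' |                     # join all lines into one long line
  tr '\r' ' ' |                     # turn Windows carriage returns into spaces
  sed 's|</book>|</book>\n|g' |     # start a new line after each </book> (GNU sed)
  sed 's|</*catalog>||g' |          # strip both <catalog> and </catalog>
  grep -v '^\s*$' > EmployeeDetails1.xml   # drop whitespace-only lines (GNU grep)

cat EmployeeDetails1.xml
```

It prints two lines, one per `<book>`; the first still starts with the XML declaration, which is allowed before the root element.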
2. Create a directory and copy the converted XML file to HDFS:
hadoop fs -mkdir /usr/xml/
hadoop fs -put EmployeeDetails1.xml /usr/xml/EmployeeDetails.xml
3. Create a table in Hive to load the XML into:
create table XmlSample(xmldata string);
4. Load the XML file from HDFS into the Hive XML table:
load data inpath '/usr/xml/EmployeeDetails.xml' into table XmlSample;
5. Create a table in Hive for the data extracted from the XML table:
create table xpath_table(id int,genre string,price string);
6. Insert the data extracted from the XML table into the new Hive table:
insert overwrite table xpath_table select xpath_int(xmldata,'book/id'), xpath_string(xmldata,'book/genre'), xpath_string(xmldata,'book/price') from XmlSample;
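For the sample file, step 6 should leave two rows in xpath_table: (11, Computer, 44) and (44, Fantasy, 5), because each row of XmlSample now holds one `<book>` document and the relative paths such as 'book/id' resolve against that single record. As a rough local preview before running the Hive job, the hypothetical check below pulls the same three fields out of each one-line record with sed. This is a plain regular-expression approximation, not XPath, and only suits the flat layout of this sample.

```shell
#!/bin/sh
# Hypothetical local preview, not part of the answer: the two one-line
# records that step 1 produces for the sample file.
cat > EmployeeDetails1.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>  <book> <id>11</id> <genre>Computer</genre> <price>44</price> </book>
 <book> <id>44</id> <genre>Fantasy</genre> <price>5</price> </book>
EOF

# Regex stand-in for book/id, book/genre and book/price on each line;
# prints "id genre price". Assumes exactly one flat <book> per line.
sed -n 's|.*<id>\([^<]*\)</id>.*<genre>\([^<]*\)</genre>.*<price>\([^<]*\)</price>.*|\1 \2 \3|p' EmployeeDetails1.xml
```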
Note: I only added step 1 and changed the XPath expressions in step 6. These steps worked for me. Good luck :)