org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"xmldata":"<?xml version=\"1.0\" encoding=\"UTF-8\"?>"}

Time: 2015-04-10 06:22:19

Tags: xml hadoop xpath hive

I am trying to load XML data into a Hive table.

My XML file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<catalog>
<book>
  <id>11</id>
  <genre>Computer</genre>
  <price>44</price>
</book>
<book>
  <id>44</id>
  <genre>Fantasy</genre>
  <price>5</price>
</book>
</catalog>

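First, I loaded the XML data into a managed table. Then I used the xpath UDFs to parse the XML and load the actual values into my main table. These are the Hive queries I am running: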

create table XmlSample(xmlData string);


load data inpath 'EmployeeDetails.xml' into table XmlSample;

create table xpath_table(id int,genre string,price string);

Insert overwrite table xpath_table select xpath_int(xmlData, '/catalog/book/id/text()'), xpath_string(xmlData, '/catalog/book/genre/text()'), xpath_string(xmlData, '/catalog/book/price/text()') from XmlSample;

But I get this exception:

    java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"xmldata":"<?xml version=\"1.0\" encoding=\"UTF-8\"?>"}
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:175)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"xmldata":"<?xml version=\"1.0\" encoding=\"UTF-8\"?>"}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:544)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157)
    ... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public int org.apache.hadoop.hive.ql.udf.xml.UDFXPathInteger.evaluate(java.lang.String,java.lang.String)  on object org.apache.hadoop.hive.ql.udf.xml.UDFXPathInteger@37fd3f of class org.apache.hadoop.hive.ql.udf.xml.UDFXPathInteger with arguments {<?xml version="1.0" encoding="UTF-8"?>:java.lang.String, /catalog/book/id/text():java.lang.String} of size 2
    at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:1030)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.evaluate(GenericUDFBridge.java:181)
    at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:166)
    at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
    at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:80)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534)
    ... 9 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:1006)
    ... 20 more
Caused by: java.lang.RuntimeException: Invalid expression '/catalog/book/id/text()'
    at org.apache.hadoop.hive.ql.udf.xml.UDFXPathUtil.eval(UDFXPathUtil.java:74)
    at org.apache.hadoop.hive.ql.udf.xml.UDFXPathUtil.evalNumber(UDFXPathUtil.java:87)
    at org.apache.hadoop.hive.ql.udf.xml.UDFXPathInteger.evaluate(UDFXPathInteger.java:35)

Can anyone suggest how to avoid this exception?

1 answer:

Answer 0 (score: 0)

Try this:

1. Put each record on a single line and remove the catalog tags:

cat EmployeeDetails.xml | tr -d '&' | tr '\n' ' ' | tr '\r' ' ' | sed 's|</book>|</book>\n|g' | sed 's/<catalog>//g' | grep -v '^\s*$' | sed '3d' > EmployeeDetails1.xml

2. Create a directory in HDFS and copy the transformed XML file into it:

hadoop fs -mkdir /usr/xml/

hadoop fs -put EmployeeDetails1.xml /usr/xml/EmployeeDetails.xml

3. Create a table in Hive to hold the XML:

create table XmlSample(xmldata string);

4. Load the XML file from HDFS into the Hive XML table:

load data inpath '/usr/xml/EmployeeDetails.xml' into table XmlSample;

5. Create a table in Hive to receive the data extracted from the XML table:

create table xpath_table(id int,genre string,price string);

6. Insert the data extracted from the XML table into the new Hive table:

insert overwrite table xpath_table select xpath_int(xmldata,'book/id'), xpath_string(xmldata,'book/genre'), xpath_string(xmldata,'book/price') from XmlSample;
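
Why step 1 matters: a Hive table in the default text format treats every line of the file as one row, which is why the exception in the question reports a row that contains only the XML declaration. The step 1 pipeline drops the trailing `</catalog>` line with `sed '3d'`, which fits a file with exactly two records. Below is a sketch of a variant that does not depend on the record count. `flatten_books` is a hypothetical helper name, not part of Hive or Hadoop. It assumes `<book>` elements are not nested and that the file is small enough for awk to hold as a single record. Unlike step 1, it also drops the XML declaration and does not strip `&` characters.

```shell
#!/bin/sh
# flatten_books: hypothetical helper (not a Hive or Hadoop tool).
# Reads a <catalog> document on stdin and writes one <book>...</book>
# record per line on stdout, so each Hive row is a complete XML fragment.
flatten_books() {
  tr -d '\r' |      # drop carriage returns from Windows line endings
  tr '\n' ' ' |     # join the whole document into a single line
  awk '{
    # remove the XML declaration and the <catalog> wrapper tags
    gsub(/<\?xml[^>]*\?>/, "")
    gsub(/<\/?catalog>/, "")
    # split on the closing tag; the last piece is trailing whitespace
    n = split($0, parts, "</book>")
    for (i = 1; i < n; i++) {
      rec = parts[i]
      sub(/^[ \t]+/, "", rec)   # trim leading blanks before <book>
      print rec "</book>"
    }
  }'
}

# Usage (file names taken from the steps above):
#   flatten_books < EmployeeDetails.xml > EmployeeDetails1.xml
```

The output file can then be used in steps 2 to 6 unchanged, since each line still has `book` as its root element.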

  

Note: I only added step 1 and changed the format of step 6. These steps worked for me. Good luck :)
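
As a sanity check before `load data`, the sketch below counts the lines that do not hold a complete `<book>...</book>` record. Because Hive's default text format turns every line into one row, a non-zero count points at rows that the xpath UDFs would receive as incomplete XML, as happened with the declaration-only row in the question. `count_incomplete_rows` is a hypothetical helper name, and the check is a plain pattern match, not XML validation.

```shell
#!/bin/sh
# count_incomplete_rows: hypothetical helper (not a Hive or Hadoop tool).
# Reads a file on stdin and prints how many lines lack a complete
# <book>...</book> record on that single line.
count_incomplete_rows() {
  # -v selects lines that do NOT match, -c prints how many were selected.
  # grep exits non-zero when it selects nothing, so "|| true" keeps the
  # function usable under "set -e" while the printed count stays 0.
  grep -c -v '<book>.*</book>' || true
}

# Usage (file name taken from step 1 above):
#   count_incomplete_rows < EmployeeDetails1.xml
```

The function prints 0 when every line holds a complete record.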