"Hive Runtime Error while processing row" for an XML file

Date: 2018-03-06 17:16:55

Tags: xml hadoop hive

I am trying to read a simple XML file and extract data from it. Below is the file src:

<a>
        <b id="foo">b1</b>
        <b id="bar">b2</b>
</a>

I created the table src in Hive as follows:

Create table src(line string);

Then I loaded the table as follows:

load data local inpath '/home/hduser/Desktop/batch/hiveip/src' into table src;

I am trying to extract the data with the following query, which fails with the stack trace shown after it:

select xpath(line,'//@id') from src;

    Diagnostic Messages for this Task:
    Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"line":"<a>"}
            at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:195)
            at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
            at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
            at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
            at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.security.auth.Subject.doAs(Subject.java:415)
            at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
            at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"line":"<a>"}
            at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
            at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
            ... 8 more
    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating array ('line',''//@id'')
            at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
            at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
            at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
            at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
            at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
            ... 9 more
    Caused by: java.lang.RuntimeException: Invalid expression '//@id'
            at org.apache.hadoop.hive.ql.udf.xml.UDFXPathUtil.eval(UDFXPathUtil.java:74)
            at org.apache.hadoop.hive.ql.udf.xml.UDFXPathUtil.evalNodeList(UDFXPathUtil.java:95)
            at org.apache.hadoop.hive.ql.udf.xml.GenericUDFXPath.eval(GenericUDFXPath.java:76)
            at org.apache.hadoop.hive.ql.udf.xml.GenericUDFXPath.evaluate(GenericUDFXPath.java:97)
            at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:166)
            at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
            at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
            at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:79)
            ... 13 more

I get no output.

However, when I run the following query, I do get output:

select xpath('<a><b id="foo">b1</b><b id="bar">b2</b></a>','//@id')

Output:

["foo","bar"]

It would be great if someone could explain what exactly is happening and where I went wrong.

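1 answer:

Answer 0 (score: 0)

Your table src most likely has 4 rows, because Hive's default text format treats every line of the file as a separate row. xpath is then evaluated once per row, and the first row, <a>, is not a well-formed XML document on its own. That is why the error is reported while processing row {"line":"<a>"}, even though the expression '//@id' itself is fine.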

+---------------------+--+
|      src.line       |
+---------------------+--+
| <a>                 |
| <b id="foo">b1</b>  |
| <b id="bar">b2</b>  |
| </a>                |
+---------------------+--+

Instead, it should look like this, with the whole document in a single row:

+----------------------------------------------+--+
|                   src.line                   |
+----------------------------------------------+--+
| <a><b id="foo">b1</b><b id="bar">b2</b></a>  |
+----------------------------------------------+--+
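
To make the row-by-row behaviour concrete, here is a rough Python analogy. It is only a sketch: it uses Python's xml.etree.ElementTree rather than the XPath engine Hive calls, and the helper name ids_in is made up for this illustration. It treats each line of the original file as its own document, the way xpath sees the four rows.

```python
import xml.etree.ElementTree as ET

def ids_in(text):
    """Return every id attribute found in `text`, or None if `text`
    is not a well-formed XML document. (Hypothetical helper.)"""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return None
    # iter() walks the root element and all of its descendants,
    # which approximates what '//@id' selects.
    return [el.get("id") for el in root.iter() if el.get("id") is not None]

# The four lines of the original file, one per Hive row.
rows = [
    '<a>',
    '        <b id="foo">b1</b>',
    '        <b id="bar">b2</b>',
    '</a>',
]
per_row = [ids_in(r) for r in rows]

# The same document stored on one line, i.e. in one row.
single = ids_in('<a><b id="foo">b1</b><b id="bar">b2</b></a>')

print(per_row)  # [None, ['foo'], ['bar'], None]
print(single)   # ['foo', 'bar']
```

The rows <a> and </a> cannot be parsed on their own. In Hive the failure on the first of them aborts the whole query, so the rows that would have parsed never produce output.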

Put the XML file on a single line:
[cloudera@quickstart ~]$ cat myxml.xml 
<a><b id="foo">b1</b><b id="bar">b2</b></a>
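
If the file is too large to edit by hand, a small script can do the flattening. The following is a minimal sketch, assuming the file holds exactly one XML document and that no text content depends on line breaks or indentation. The function names and the paths passed to flatten_file are placeholders, not anything Hive provides.

```python
from pathlib import Path

def flatten_xml(text):
    """Join all lines into one, dropping the indentation and
    line breaks between tags. (Hypothetical helper.)"""
    return "".join(line.strip() for line in text.splitlines())

def flatten_file(src_path, dst_path):
    """Write a single-line copy of src_path to dst_path.
    Both paths are placeholders chosen by the caller."""
    content = Path(src_path).read_text()
    Path(dst_path).write_text(flatten_xml(content) + "\n")

# The multi-line file from the question, as a string.
sample = '<a>\n        <b id="foo">b1</b>\n        <b id="bar">b2</b>\n</a>\n'
one_line = flatten_xml(sample)
print(one_line)  # <a><b id="foo">b1</b><b id="bar">b2</b></a>
```

Because each line is stripped before joining, text that spans several lines inside one element would lose its internal line breaks. For a file like the one in the question this makes no difference.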

Then load it into Hive, here by pointing the table at the directory that contains the file:

create table src(line string)
location '/your/xml/location';

Then run your query. It should give you the expected result:

+----------------+--+
|      _c0       |
+----------------+--+
| ["foo","bar"]  |
+----------------+--+