I am trying to read a simple XML file and extract data from it. Below is the content of the file
src:
<a>
<b id="foo">b1</b>
<b id="bar">b2</b>
</a>
I created the src table in Hive as follows:
Create table src(line string);
Then I loaded the table as follows:
load data local inpath '/home/hduser/Desktop/batch/hiveip/src' into table src;
I am trying to extract the data with the following query:
select xpath(line,'//@id') from src;
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"line":"<a>"}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:195)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"line":"<a>"}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating array ('line',''//@id'')
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
... 9 more
Caused by: java.lang.RuntimeException: Invalid expression '//@id'
at org.apache.hadoop.hive.ql.udf.xml.UDFXPathUtil.eval(UDFXPathUtil.java:74)
at org.apache.hadoop.hive.ql.udf.xml.UDFXPathUtil.evalNodeList(UDFXPathUtil.java:95)
at org.apache.hadoop.hive.ql.udf.xml.GenericUDFXPath.eval(GenericUDFXPath.java:76)
at org.apache.hadoop.hive.ql.udf.xml.GenericUDFXPath.evaluate(GenericUDFXPath.java:97)
at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:166)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:79)
... 13 more
I get no output.
However, when I run the following query, I do get output:
select xpath('<a><b id="foo">b1</b><b id="bar">b2</b></a>','//@id')
Output:
["foo","bar"]
It would be great if someone could explain what exactly is happening and where I am going wrong.
Answer 0 (score: 0)
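Your table src most likely has 4 rows, because Hive's default text format treats every line of the file as a separate row. xpath() is then evaluated on each row on its own, and the first row, <a>, is not a well-formed XML document; that is the row named in the error message.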
+---------------------+--+
| src.line |
+---------------------+--+
| <a> |
| <b id="foo">b1</b> |
| <b id="bar">b2</b> |
| </a> |
+---------------------+--+
Instead, it should look like this:
+----------------------------------------------+--+
| src.line |
+----------------------------------------------+--+
| <a><b id="foo">b1</b><b id="bar">b2</b></a> |
+----------------------------------------------+--+
Put the whole XML document on a single line in the file:
[cloudera@quickstart ~]$ cat myxml.xml
<a><b id="foo">b1</b><b id="bar">b2</b></a>
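If the file already exists in the multi-line form, one way to flatten it is to strip the line breaks with tr. The following is only a minimal sketch: the temporary directory and the file name src_oneline.xml are made up for illustration, and it assumes the line breaks are insignificant whitespace between elements (not part of any text content you need to keep).

```shell
# Hypothetical working directory and file names; adjust to your own paths.
workdir=$(mktemp -d)
multi="$workdir/src"
single="$workdir/src_oneline.xml"

# Recreate the multi-line file from the question.
cat > "$multi" <<'EOF'
<a>
<b id="foo">b1</b>
<b id="bar">b2</b>
</a>
EOF

# Remove every newline (and carriage return, in case of Windows line endings)
# so the whole document ends up on one line.
tr -d '\r\n' < "$multi" > "$single"

# Terminate the single line with one newline so it is a normal text file.
printf '\n' >> "$single"

cat "$single"
```

The resulting file holds one line, so Hive would see one row containing the complete document. With several XML documents, each one would need to be flattened to its own line in the same way.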
Then load it into Hive:
create table src(line string)
location '/your/xml/location';
Then run your query. It should give you the expected result:
+----------------+--+
| _c0 |
+----------------+--+
| ["foo","bar"] |
+----------------+--+