XPath query (in Hive) returns only the first instance of an attribute

Time: 2013-06-25 22:35:07

Tags: xml xpath hive

I have an XPath query (running in Hive) with which I want to display the id attribute value of every book node.

My XPath statement looks like this:

SELECT xpath_string(bookxml, '/catalog/book/@id') FROM bookxml;

When I run it in Hive, it returns only the first book id rather than all the values. Can you suggest how I can return all the book IDs?

4 Answers:

Answer 0: (score: 4)

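I don't know Hive at all, but the question interested me, so I googled "Hive xpath_string", and the summary of the first hit was:

  Given an XPath expression, each function returns a specific Hive type: xpath returns a Hive array of strings. xpath_string returns a string. xpath_boolean returns...

So it took me about 2 seconds to discover that you want the xpath function rather than the xpath_string function.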

I sometimes wonder whether anyone turns to the documentation before turning to StackOverflow...
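
Applied to the query in the question, that suggestion might look like the sketch below. It assumes the table and the column are both named bookxml, as in the original statement. The second query is only an illustrative extra showing that xpath_string can still be pointed at a specific book.

```sql
-- xpath returns an array<string> holding every matching @id value in the row.
SELECT xpath(bookxml, '/catalog/book/@id') FROM bookxml;

-- xpath_string still returns one string; a positional predicate selects
-- which book it comes from (here the second one).
SELECT xpath_string(bookxml, '/catalog/book[2]/@id') FROM bookxml;
```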

Answer 1: (score: 1)

I am running locally on a single-node standalone cluster.

I entered the following statements in Hive:

CREATE EXTERNAL TABLE books1 (books_xml string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
LOCATION '/user/David';

select xpath_string(books_xml, '/catalog/book/@id') from books1;

Here is the error message I pulled from the logs:

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"books_xml":"<?xml version=\"1.0\"?>"}
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:271)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1135)
    at org.apache.hadoop.mapred.Child.main(Child.java:265)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"books_xml":"<?xml version=\"1.0\"?>"}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:547)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
    ... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public org.apache.hadoop.io.Text org.apache.hadoop.hive.ql.udf.xml.UDFXPathString.evaluate(java.lang.String,java.lang.String)  on object org.apache.hadoop.hive.ql.udf.xml.UDFXPathString@510ebe18 of class org.apache.hadoop.hive.ql.udf.xml.UDFXPathString with arguments {<?xml version="1.0"?>:java.lang.String, /catalog/book/@id:java.lang.String} of size 2
    at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:880)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.evaluate(GenericUDFBridge.java:181)
    at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.evaluate(ExprNodeGenericFuncEvaluator.java:163)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:76)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:528)
    ... 9 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:856)
    ... 18 more
Caused by: java.lang.RuntimeException: Invalid expression '/catalog/book/@id'
    at org.apache.hadoop.hive.ql.udf.xml.UDFXPathUtil.eval(UDFXPathUtil.java:74)
    at org.apache.hadoop.hive.ql.udf.xml.UDFXPathUtil.evalString(UDFXPathUtil.java:83)
    at org.apache.hadoop.hive.ql.udf.xml.UDFXPathString.evaluate(UDFXPathString.java:43)
    ... 23 more
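
The row quoted in the trace contains only the XML declaration, which suggests that the file was split on newlines and each line of a multi-line document became its own row. If that is the case, "Invalid expression" is probably misleading: the XPath itself looks fine, but the input is not a complete XML document. A quick way to check, assuming the books1 table defined above:

```sql
-- Inspect what each row actually holds. If the rows are fragments such as
-- '<?xml version="1.0"?>' or '<catalog>', the XML was split line by line.
SELECT books_xml FROM books1 LIMIT 5;
```

If the rows do turn out to be fragments, one option is to rewrite the input so that each XML document sits on a single line before loading it.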

Answer 2: (score: 0)

As Michael said, the problem is the xpath_string function.

I created a small sample file like this:

<catalog><book id="1"></book><book id="2"></book></catalog>
<catalog><book id="3"></book><book id="5"></book></catalog>

I then created an external table over this file:

CREATE EXTERNAL TABLE books (books_xml string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
LOCATION '/home/dino/Downloads/books';

When you run the query with the xpath_string function, you get only the first ID from each row, as shown below:

hive> select xpath_string(books_xml, '/catalog/book/@id') from books;

Result:

Mapred Local Task Succeeded . Convert the Join into MapJoin
OK
1
3
Time taken: 3.765 seconds, Fetched: 2 row(s)

If you run the same query with the xpath function, you get the following result:

hive> select xpath(books_xml, '/catalog/book/@id') from books;

Result:

Mapred Local Task Succeeded . Convert the Join into MapJoin
OK
["1","2"]
["3","5"]
Time taken: 4.512 seconds, Fetched: 2 row(s)
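
If one row per ID is preferable to one array per document, the array can be flattened with LATERAL VIEW and explode. A minimal sketch against the books table above; the aliases ids and book_id are made up for illustration:

```sql
-- explode turns each element of the array returned by xpath into its own row.
SELECT book_id
FROM books
LATERAL VIEW explode(xpath(books_xml, '/catalog/book/@id')) ids AS book_id;
```

For the sample file this should yield four rows (1, 2, 3 and 5), though the row order is not guaranteed.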

Answer 3: (score: 0)

DESCRIBE FUNCTION EXTENDED makes life easier. It certainly makes things clear:

DESCRIBE FUNCTION EXTENDED xpath_string;

xpath_string(xml, xpath) - Returns the text contents of the first xml node that matches the xpath expression
Example:

SELECT xpath_string('<a><b>b</b><c>cc</c></a>','a/c') FROM src LIMIT 1;
>'cc'
SELECT xpath_string('<a><b>b1</b><b>b2</b></a>','a/b') FROM src LIMIT 1;
>'b1'
SELECT xpath_string('<a><b>b1</b><b>b2</b></a>','a/b[2]') FROM src LIMIT 1;
>'b2'
SELECT xpath_string('<a><b>b1</b><b>b2</b></a>','a') FROM src LIMIT 1;
>'b1b2'

DESCRIBE FUNCTION EXTENDED xpath;

xpath(xml, xpath) - Returns a string array of values within xml nodes that match the xpath expression
Example:

SELECT xpath('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>', 'a/text()') FROM src LIMIT 1;
>[]
SELECT xpath('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>', 'a/b/text()') FROM src LIMIT 1;
>["b1","b2","b3"]
SELECT xpath('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>', 'a/c/text()') FROM src LIMIT 1;
>["c1","c2"]