I have an XPath query (using Hive), and I want to display the id attribute value of every book node.
My XPath statement looks like this:
Select xpath_string (bookxml, '/catalog/book/@id') from bookxml;
When I run it in Hive, it returns only the first book id rather than all of the values. Can you suggest how I can return all of the book IDs?
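Answer 0 (score: 4)
I don't know Hive at all, but the question interested me, so I Googled "Hive xpath_string", and the summary of the first hit was:
Each function returns a specific Hive type given an XPath expression: xpath returns a Hive array of strings. xpath_string returns a string. xpath_boolean returns...
So it took me about 2 seconds to discover that you want the xpath function rather than the xpath_string function.
I sometimes wonder whether anyone turns to the documentation before turning to StackOverflow...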
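For the query in the question, that change amounts to swapping the function name. A minimal sketch, assuming the bookxml table and column from the question and that each row holds one complete XML document:

```sql
-- xpath returns an array<string> containing every match,
-- whereas xpath_string returns a single string.
SELECT xpath(bookxml, '/catalog/book/@id') FROM bookxml;
```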
Answer 1 (score: 1)
I am running locally on a single-node standalone cluster.
I entered the following lines in Hive:
CREATE EXTERNAL TABLE books1 (books_xml string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
LOCATION '/user/David';
select xpath_string(books_xml, '/catalog/book/@id') from books1;
Here is the error message I pulled from the log:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"books_xml":"<?xml version=\"1.0\"?>"}
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:271)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1135)
at org.apache.hadoop.mapred.Child.main(Child.java:265)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"books_xml":"<?xml version=\"1.0\"?>"}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:547)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public org.apache.hadoop.io.Text org.apache.hadoop.hive.ql.udf.xml.UDFXPathString.evaluate(java.lang.String,java.lang.String) on object org.apache.hadoop.hive.ql.udf.xml.UDFXPathString@510ebe18 of class org.apache.hadoop.hive.ql.udf.xml.UDFXPathString with arguments {<?xml version="1.0"?>:java.lang.String, /catalog/book/@id:java.lang.String} of size 2
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:880)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.evaluate(GenericUDFBridge.java:181)
at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.evaluate(ExprNodeGenericFuncEvaluator.java:163)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:76)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:528)
... 9 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:856)
... 18 more
Caused by: java.lang.RuntimeException: Invalid expression '/catalog/book/@id'
at org.apache.hadoop.hive.ql.udf.xml.UDFXPathUtil.eval(UDFXPathUtil.java:74)
at org.apache.hadoop.hive.ql.udf.xml.UDFXPathUtil.evalString(UDFXPathUtil.java:83)
at org.apache.hadoop.hive.ql.udf.xml.UDFXPathString.evaluate(UDFXPathString.java:43)
... 23 more
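A note on reading this trace: the row being processed is {"books_xml":"<?xml version=\"1.0\"?>"}, that is, only the XML declaration. That suggests the file spans several lines and each line became its own row, so no single row is a complete XML document. The XPath expression itself is the same one that runs without error in Answer 2, so the "Invalid expression" message is probably a symptom of an unparseable row rather than of a bad expression. A quick way to check, assuming the books1 table defined above:

```sql
-- Diagnostic sketch: inspect how Hive split the file into rows.
-- If each row holds one line of XML instead of a complete <catalog> document,
-- the XPath functions have nothing well-formed to evaluate.
SELECT books_xml FROM books1 LIMIT 5;
```

If the rows do turn out to be fragments, one option is to rewrite the file so that each complete document sits on a single line, as in the sample file in Answer 2.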
Answer 2 (score: 0)
As Michael said, the problem is the xpath_string function.
I created a small sample file like this:
<catalog><book id="1"></book><book id="2"></book></catalog>
<catalog><book id="3"></book><book id="5"></book></catalog>
I created an external table over this file:
CREATE EXTERNAL TABLE books (books_xml string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
LOCATION '/home/dino/Downloads/books';
When you run the query with the xpath_string function, you get only the first ID from each row, as shown here:
hive> select xpath_string(books_xml, '/catalog/book/@id') from books;
Result:
Mapred Local Task Succeeded . Convert the Join into MapJoin
OK
1
3
Time taken: 3.765 seconds, Fetched: 2 row(s)
If you run the same query with the xpath function, you get the following result:
hive> select xpath(books_xml, '/catalog/book/@id') from books;
Result:
Mapred Local Task Succeeded . Convert the Join into MapJoin
OK
["1","2"]
["3","5"]
Time taken: 4.512 seconds, Fetched: 2 row(s)
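The xpath result is one array per input row. If one output row per book id is wanted instead, the array can be flattened with LATERAL VIEW and explode. This is a sketch, assuming the books table and books_xml column defined above; the aliases ids and book_id are illustrative names, not part of the original answer:

```sql
-- Flatten each row's array of ids into one output row per id.
SELECT book_id
FROM books
LATERAL VIEW explode(xpath(books_xml, '/catalog/book/@id')) ids AS book_id;
```

With the sample file above, this would be expected to return each id on its own row rather than grouped per document.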
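Answer 3 (score: 0)
DESCRIBE FUNCTION EXTENDED makes life easier. It certainly makes things clear:
DESCRIBE FUNCTION EXTENDED xpath_string;
xpath_string(xml, xpath) - Returns the text contents of the first xml node that matches the xpath expression
Example: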
SELECT xpath_string('<a><b>b</b><c>cc</c></a>','a/c') FROM src LIMIT 1;
>'cc'
SELECT xpath_string('<a><b>b1</b><b>b2</b></a>','a/b') FROM src LIMIT 1;
>'b1'
SELECT xpath_string('<a><b>b1</b><b>b2</b></a>','a/b[2]') FROM src LIMIT 1;
>'b2'
SELECT xpath_string('<a><b>b1</b><b>b2</b></a>','a') FROM src LIMIT 1;
>'b1b2'
DESCRIBE FUNCTION EXTENDED xpath;
xpath(xml, xpath) - Returns a string array of values within xml nodes that match the xpath expression
Example:
SELECT xpath('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>', 'a/text()') FROM src LIMIT 1;
>[]
SELECT xpath('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>', 'a/b/text()') FROM src LIMIT 1;
>["b1","b2","b3"]
SELECT xpath('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>', 'a/c/text()') FROM src LIMIT 1;
>["c1","c2"]