Question

I have an XML file.

<?xml version="1.0" encoding="UTF-8"?>
<doc>
  <!-- A comment -->
  <a xmlns="http://www.tei-c.org/ns/1.0">
    <w>word
    </w>
    <w>wording
    </w>
  </a>
</doc>

I want to return the nodes that contain "word" but not "wording".

library(XML) # I have nothing against using library(xml2) or library(xml2r) instead
doc <- xmlParse("file.xml", encoding="UTF-8")
x <- c(x="http://www.tei-c.org/ns/1.0")

# starts-with seems to find the words just fine
test1 <- getNodeSet(doc, "//x:w[starts-with(., 'word')]", x)
# but R doesn't seem to allow "matches" to be included
# in the xpath query, hence none of the following work:
test1 <- getNodeSet(doc, "//x:w[[matches(., 'word')]]", x)
test1 <- getNodeSet(doc, "//x:w[@*[matches(., 'word')]]", x)
test1 <- getNodeSet(doc, "//x:w[matches(., '^word$')]", x)
test1 <- getNodeSet(doc, "//x:w[@*[matches(., '^word$')]]", x)

Update: if I use any of the combinations with matches, I get the following error and an empty list as the result.

xmlXPathCompOpEval: function matches not found
XPath error : Unregistered function
XPath error : Invalid expression
XPath error : Stack usage error
Error in xpathApply.XMLInternalDocument(doc, path, fun, ..., namespaces = namespaces,  : 
  error evaluating xpath expression //x:w[matches(., '^word$')]

If I look for "//x:w[@*[contains(., '^word$')]]" as suggested below, I get the following warning along with the list of results:

Warning message:
In xpathApply.XMLInternalDocument(doc, path, fun, ..., namespaces = namespaces,  :
  the XPath query has no namespace, but the target document has a default namespace. 
 This is often an error and may explain why you obtained no results

I guess I'm just using the wrong command. What should I change to make it work? Thanks!

Answer 1

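Thanks for updating the question to include the error messages. It's like going to the doctor and asking for a treatment to fix your problem: you definitely want to tell him the specific symptoms you've noticed!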

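The error message confirms that the matches() function is missing. This indicates that R (at least the version you're using) uses XPath 1.0, which has no matches() or any other regular expression functions. BaseX, on the other hand, supports XPath 2.0 (actually it supports XPath 3.0, IIRC), so it can handle matches().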
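If you really do want a regular expression, one workaround is to let XPath 1.0 do only the node selection and apply the regex on the R side. The following is a minimal sketch of that idea, not something the XML package does for you automatically. It assumes the document from the question, inlined here as a string (`xml_text`, `ns`, `w_nodes`, `w_text` and `hits` are just illustrative names), and it assumes that trimming surrounding whitespace with `trimws()` is acceptable for your data.

```r
library(XML)

# Inline copy of the question's document so the sketch runs on its own
# (assumption: the real file.xml has the same shape).
xml_text <- '<?xml version="1.0" encoding="UTF-8"?>
<doc>
  <!-- A comment -->
  <a xmlns="http://www.tei-c.org/ns/1.0">
    <w>word
    </w>
    <w>wording
    </w>
  </a>
</doc>'

doc <- xmlParse(xml_text, asText = TRUE)
ns <- c(x = "http://www.tei-c.org/ns/1.0")

# Step 1: plain XPath 1.0, no regex needed, selects every <w> element.
w_nodes <- getNodeSet(doc, "//x:w", namespaces = ns)

# Step 2: pull out the text, trim the trailing newline/indentation,
# and let R's own regex engine decide which nodes to keep.
w_text <- trimws(sapply(w_nodes, xmlValue))
hits <- w_nodes[grepl("^word$", w_text)]
```

Here `hits` should hold only the `<w>` element whose trimmed text is exactly `word`. The trade-off is that every `<w>` node is pulled into R before filtering, which is fine for small documents but may be slow for very large ones.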

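As for how to do what you want in XPath 1.0, it's not entirely clear what you're trying to do. You mentioned using word boundary markers, so you could try something like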

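# Pad with spaces *after* normalizing: normalize-space() strips leading
# and trailing whitespace, so padding added before it would be removed.
getNodeSet(doc, "//x:w[contains(concat(' ', normalize-space(.), ' '),
                                ' word ')]", x)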

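This will select <w> elements whose content contains word at the beginning and/or end of the text, or preceded/followed by whitespace. If you want to treat certain non-whitespace characters as word boundaries, you can use translate() to convert them to spaces.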
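To make the translate() idea concrete, here is a rough sketch in XPath 1.0 only. The sample document is a cut-down variant of the one in the question, and the third `<w>` element ("a word, perhaps") is hypothetical, added only to show punctuation acting as a boundary. The set of punctuation characters in `punct` is likewise an assumption, so adjust it to your data. If all you need is an exact match on the whole element, comparing `normalize-space(.)` with the target string is simpler still.

```r
library(XML)

# Hypothetical sample: the question's two <w> elements plus one with punctuation.
xml_text <- '<doc xmlns="http://www.tei-c.org/ns/1.0"><w>word
</w><w>wording
</w><w>a word, perhaps</w></doc>'

doc <- xmlParse(xml_text, asText = TRUE)
ns <- c(x = "http://www.tei-c.org/ns/1.0")

# Exact match on the whole element text, ignoring surrounding whitespace.
exact <- getNodeSet(doc, "//x:w[normalize-space(.) = 'word']", namespaces = ns)

# Word-boundary match: translate() maps each punctuation character to a
# space, normalize-space() collapses runs of whitespace, and concat() pads
# both ends so that ' word ' also matches at the start and end of the text.
punct <- ",.;:!?"                       # assumed boundary characters
spaces <- strrep(" ", nchar(punct))     # one space per punctuation character
xp <- sprintf(
  "//x:w[contains(concat(' ', normalize-space(translate(., '%s', '%s')), ' '), ' word ')]",
  punct, spaces
)
bounded <- getNodeSet(doc, xp, namespaces = ns)
```

With this sample, `exact` should contain only the first `<w>`, while `bounded` should also pick up "a word, perhaps" and still skip "wording".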

R xpath getnodeset "matches" command