Question

Using rvest, how can I select nodes that have no attributes?

For example:

<nodes>
    <node attribute1="aaaa"></node>
    <node attribute1="bbbb"></node>
    <node></node> <- FIND THIS
</nodes>

There is a related thread that does this with XPath, but when I try a similar approach with rvest:

wp %>% read_html() %>% html_nodes(xpath = "//node[not(@*)")

where wp is the URL of interest, I get the following warning:

Warning message:
In xpath_search(x$node, x$doc, xpath = xpath, nsMap = ns, num_results = Inf) :
  Invalid predicate [1206]

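When I view the page source, the content I want to scrape has no attributes.

Frankly, I just don't know enough about web development and HTML to see how to adapt that XPath example to rvest. Any help or resources would be greatly appreciated!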

Edit:

The correct code to do this in rvest is

wp %>% read_html() %>% html_nodes(xpath = "//node[not(@*)]")

Answer 1

It looks like you are just missing a closing square bracket:

library(rvest)

"<nodes>
    <node attribute1=\"aaaa\" attribute2=\"cccc\"></node>
    <node attribute1=\"bbbb\"></node>
    <node></node>
</nodes>" %>% 
  read_html() %>% 
  html_nodes(xpath = "//node[not(@*)]")

which gives

{xml_nodeset (1)}
[1] <node></node>
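
To see why the predicate works, it may help to compare it with a narrower one. The sketch below is only illustrative: the variable names (doc, no_attrs, no_attr1) and the second node, which carries only attribute2, are assumptions made up for the comparison and are not from the question.

```r
library(rvest)

# Hypothetical test document: the second node has attribute2 but no attribute1.
doc <- read_html("<nodes>
    <node attribute1=\"aaaa\" attribute2=\"cccc\"></node>
    <node attribute2=\"bbbb\"></node>
    <node></node>
</nodes>")

# @* matches any attribute, so not(@*) keeps only nodes with no attributes at all.
no_attrs <- html_nodes(doc, xpath = "//node[not(@*)]")

# not(@attribute1) only requires that attribute1 be absent,
# so the node carrying just attribute2 is kept as well.
no_attr1 <- html_nodes(doc, xpath = "//node[not(@attribute1)]")

length(no_attrs)
length(no_attr1)
```

Here `no_attrs` should contain only the bare node, while `no_attr1` should also include the node that has attribute2, so use `not(@*)` when the node must have no attributes of any kind.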
