Question

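I'm trying to scrape data from www.speedtest.net/awards/ca/ontario. The standard functions seem to work when I follow some paths but not others, and I'm not sure why.

For example, if I go into the head and look for a script, I can use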

library(rvest)
URL<-read_html("http://www.speedtest.net/awards/ca/ontario")
test1<-html_nodes(URL,xpath='/html/head/script[1]')
test1

This returns {xml_nodeset (1)}, as expected.

But if I go into the body and try something similar

test2<-html_nodes(URL,xpath='/html/body/script[1]')
test2

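I get {xml_nodeset (0)}.

Why can't I access nodes under the body?

What I'm actually trying to run is the code below, but I've traced my problem back to the issue above.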

real<-html_nodes(URL,xpath='/html/body/div[1]/div[3]/div/div[2]/div/div[3]/div[2]/table')
real

Any ideas?

Answer 1

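Thanks. By searching with a CSS tag selector I was able to come up with this, which works well for getting the table I want (the one at the bottom right).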

library(rvest)
URL<-read_html("http://www.speedtest.net/awards/ca/ontario")
# select every <table> element on the page
table<-html_nodes(URL, "table")
# parse them into data frames and keep the second one
table<-html_table(table)[[2]]
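
The same pattern can be tried offline. This is only a sketch: the HTML is made up for illustration and is not the speedtest.net markup, and the names page, tables and second are hypothetical.

```r
library(rvest)

# Hypothetical page with two tables; the second is the one we want
page <- read_html('<html><body>
<table><tr><th>x</th></tr><tr><td>1</td></tr></table>
<table><tr><th>provider</th><th>speed</th></tr><tr><td>A</td><td>10</td></tr></table>
</body></html>')

# The CSS selector "table" matches tables at any depth in the document
tables <- html_nodes(page, "table")
length(tables)

# html_table() on a node set returns a list with one data frame per table
second <- html_table(tables)[[2]]
second
```

Indexing with [[2]] depends on the order of tables in the page, so it may need adjusting if the page layout changes.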

Answer 2

Try this. It may not be complete, but it should give you a good start toward answering your question:

library(rvest)
URL<-read_html("http://www.speedtest.net/awards/ca/ontario")
#find the table rows in the page
table<-html_nodes(URL, "tbody tr")

#pull info from the table rows
num<-html_text(html_nodes(table, "td.u-align-right"))
provider<-html_text(html_nodes(table, "td.cell-provider-name"))

#final data.frame with a table of the results
df<-data.frame(provider, matrix(num, ncol=3, byrow=TRUE))
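
The reshaping step in the last line can be checked in isolation. A small sketch with made-up values standing in for the scraped text; it assumes, as the code above does, that every row contributes exactly three td.u-align-right cells.

```r
# Hypothetical values standing in for the scraped cell text
provider <- c("Provider A", "Provider B")
num <- c("10", "20", "30", "40", "50", "60")  # three cells per row, in page order

# byrow = TRUE fills the matrix row by row, so each provider
# is paired with its own three values
df <- data.frame(provider, matrix(num, ncol = 3, byrow = TRUE))
df
```

If the number of cells per row differs from three, the values would be paired with the wrong providers, so it is worth comparing length(num) with 3 * length(provider) first.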

With rvest, I find searching by CSS tags easier than using XPath.
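
One way an absolute path can come back empty while a CSS search succeeds is sketched below. The HTML is invented for illustration; whether this is what happens on the speedtest.net page is an assumption, not something confirmed here.

```r
library(rvest)

# Hypothetical page: the script inside <body> is nested in a <div>,
# so it is not a direct child of <body>
page <- read_html('<html><head><script>var a = 1;</script></head>
<body><div><script>var b = 2;</script></div></body></html>')

# Absolute path: matches only <script> elements directly under <body>
direct <- html_nodes(page, xpath = "/html/body/script[1]")

# Descendant search: matches <script> at any depth under <body>
nested_xpath <- html_nodes(page, xpath = "/html/body//script")

# CSS descendant selector: the same idea in CSS form
nested_css <- html_nodes(page, "body script")

length(direct)
length(nested_xpath)
length(nested_css)
```

A path copied from a browser's inspector describes the DOM after the browser has processed the page, which may differ from the HTML that read_html() parses, so selectors that do not depend on exact nesting tend to be more robust.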

html_nodes giving {xml_nodeset (0)}