Question

I am scraping data from 'https://www.gov.mb.ca/sd/fire/Fire-Situation/daily-firesituation.html':

library('rvest')
url_Manitoba <- 'https://www.gov.mb.ca/sd/fire/Fire-Situation/daily-firesituation.html'
webpage_Manitoba <- read_html(url_Manitoba)

population <- url_Manitoba %>%
  xml2::read_html() %>%
  html_nodes(xpath = '//*[@id="Fire_Program_Template_Stuff"]/div/table/tbody/tr[7]/td') %>%
  html_table()
population <- population[[1]]

Error in population[[1]] : subscript out of bounds

Answer 1

I'm not sure your XPath expression is correct. I find it easier to extract the requested information with CSS selectors.
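One common reason an XPath copied from browser developer tools matches nothing is that browsers insert a <tbody> element that may not exist in the raw HTML. Whether that is what happens on the Manitoba page is an assumption; the sketch below uses a small made-up HTML string (the id "demo" and the table contents are hypothetical) to show how an empty match leads to the "subscript out of bounds" error.

```r
library(rvest)

# Hypothetical stand-in for a real page: the source has no <tbody> element.
page <- read_html('<html><body>
  <div id="demo">
    <table>
      <tr><th>Region</th><th>Fires</th></tr>
      <tr><td>North</td><td>3</td></tr>
    </table>
  </div>
</body></html>')

# XPath that assumes a <tbody> step, as developer tools often report it.
with_tbody <- html_nodes(page, xpath = '//*[@id="demo"]/table/tbody/tr/td')

# The same path without the <tbody> step.
without_tbody <- html_nodes(page, xpath = '//*[@id="demo"]/table/tr/td')

length(with_tbody)              # 0: nothing matched
length(html_table(with_tbody))  # 0: an empty list, so [[1]] is out of bounds
length(without_tbody)           # 2: both <td> cells matched
```

Checking length() on the node set before indexing makes this kind of failure easier to diagnose than the bare error message.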

If you extract all of the table nodes, I believe the second table is the one you are interested in.

library('rvest')
url_Manitoba <- 'https://www.gov.mb.ca/sd/fire/Fire-Situation/daily-firesituation.html'
webpage_Manitoba <- read_html(url_Manitoba)

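# Collect every <table> element and convert each one to a data frame
population <- webpage_Manitoba %>%
  html_nodes("table") %>%
  html_table(fill = TRUE)
# The second element holds the main table
population[[2]]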

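From here, population[[2]] is a data frame containing the contents of the main table. Just query the appropriate rows and/or columns for the specific information you need.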
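As a rough illustration of that last step, here is a minimal sketch that checks how many tables were found before indexing and then pulls out a row and a single value. The HTML string and the column names Region and Fires are hypothetical stand-ins, not the structure of the real page.

```r
library(rvest)

# Hypothetical page with two tables; the second plays the role of the main table.
page <- read_html('<html><body>
  <table><tr><td>layout</td></tr></table>
  <table>
    <tr><th>Region</th><th>Fires</th></tr>
    <tr><td>North</td><td>3</td></tr>
    <tr><td>South</td><td>5</td></tr>
  </table>
</body></html>')

tables <- html_table(html_nodes(page, "table"))

# Guard against the index error before subsetting.
stopifnot(length(tables) >= 2)
main <- as.data.frame(tables[[2]])

# Select a whole row by position.
second_row <- main[2, ]

# Select a single value by column name and a condition on another column.
south_fires <- main$Fires[main$Region == "South"]
south_fires
```

Selecting by column name and condition tends to be more robust than a fixed position such as tr[7], since the row order of a daily report can change.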
