Question

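I can't scrape the table from the link below. I checked the page source and noticed that the table has the class name tablesaw-sortable.

I tested the following approach on a Wikipedia page and was able to extract the table there. Is there any way to read this particular table?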

library(rvest)

# Parse the page, then try to pick out the table by its class name
url <- read_html("https://www.wunderground.com/history/airport/KNYC/2015/01/01/DailyHistory.html?HideSpecis=0")

weather_hourly <- url %>% 
  html_nodes(xpath='//*[@class="tablesaw-sortable"]') %>% 
  html_table()

Answer 1

OK, something like this should get you very close to what you want.

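library("httr")
URL <- "https://www.timeanddate.com/weather/usa/new-york/historic?month=8&year=2018"
# Download the page to a temporary file, sending a browser-like user agent
temp <- tempfile(fileext = ".html")
GET(url = URL, user_agent("Mozilla/5.0"), write_disk(temp))

library("XML")
# readHTMLTable() returns a list with one element per table in the file
df <- readHTMLTable(temp)
# Keep the second table on the page
df <- df[[2]]

df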
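
If you would rather stay with rvest, as in the question, here is a minimal sketch of selecting a table by its class. The inline HTML is made up for illustration and is not the real page. It also assumes the table is present in the static HTML; a table that is built by JavaScript after the page loads will not be found this way. One thing to watch: an XPath test such as @class="tablesaw-sortable" only matches when the class attribute is exactly that string, while a CSS class selector matches any element whose class list contains it.

```r
library(rvest)

# Made-up HTML standing in for a downloaded page (an assumption, not the real site).
page <- read_html('<html><body>
  <table class="tablesaw-sortable responsive">
    <tr><th>Time</th><th>Temp</th></tr>
    <tr><td>12:51 AM</td><td>27.0</td></tr>
    <tr><td>1:51 AM</td><td>26.1</td></tr>
  </table>
</body></html>')

# CSS class selector: matches even though the element carries a second class.
by_css <- page %>%
  html_nodes("table.tablesaw-sortable") %>%
  html_table()

# Exact-match XPath, as in the question: finds nothing here because the
# class attribute is "tablesaw-sortable responsive", not "tablesaw-sortable".
by_xpath <- page %>%
  html_nodes(xpath = '//*[@class="tablesaw-sortable"]')

# html_table() returns one table per matched node; take the first.
weather <- by_css[[1]]
```

If the CSS selector also comes back empty on the real page, the table is probably not in the downloaded HTML at all, and a different source (as above) is the easier route.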

If you want to go through a bunch of URLs and import data from each one, write a small loop.
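
A minimal sketch of such a loop, reusing the httr and XML calls from above. The helper names (fetch_table, build_urls, fetch_all), the month range, and the assumption that the table you want is always the second one on the page are all illustrative, so check a page or two by hand before relying on the index.

```r
library("httr")
library("XML")

# Hypothetical helper: download one page to a temp file and return its n-th table.
fetch_table <- function(url, n = 2) {
  temp <- tempfile(fileext = ".html")
  on.exit(unlink(temp))  # remove the temp file when the function returns
  GET(url = url, user_agent("Mozilla/5.0"), write_disk(temp))
  tables <- readHTMLTable(temp, stringsAsFactors = FALSE)
  tables[[n]]
}

# Hypothetical helper: one URL per month, following the URL pattern used above.
build_urls <- function(months, year) {
  sprintf(
    "https://www.timeanddate.com/weather/usa/new-york/historic?month=%d&year=%d",
    months, year
  )
}

# Loop over the URLs and collect one result per URL in a named list.
# `fetch` is a parameter so the loop can be tried without network access.
fetch_all <- function(urls, fetch = fetch_table, pause = 1) {
  results <- vector("list", length(urls))
  names(results) <- urls
  for (i in seq_along(urls)) {
    results[i] <- list(fetch(urls[i]))
    Sys.sleep(pause)  # short pause between requests to be polite to the server
  }
  results
}

# Example call (commented out because it hits the network):
# summer <- fetch_all(build_urls(6:8, 2018))
```

The result is a list with one data frame per URL. If every table has the same columns, do.call(rbind, summer) should stack them into a single data frame.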
