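I'm trying to scrape some data, but I keep getting an error. My internet connection works fine and I've updated to the latest version of R, but I still haven't found a way around this. It happens with every URL I try.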
library(RCurl)
library(XML)
url <- "https://inciweb.nwcg.gov/"
content <- getURLContent(url)
Error in function (type, msg, asError = TRUE) :
Failed to connect to inciweb.nwcg.gov port 443: Timed out
Answer (score: 1)
You may need to set an explicit timeout on slower connections:
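library(httr)
library(rvest)
# Set an explicit 60-second limit for the request
pg <- GET("https://inciweb.nwcg.gov/", timeout(60))
# Parse the HTML response and take the first table on the page
incidents <- html_table(content(pg))[[1]]
str(incidents)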
## 'data.frame': 10 obs. of 7 variables:
## $ Incident: chr "Highline Fire" "Cottonwood Fire" "Rattlesnake Point Fire" "Coolwater Complex" ...
## $ Type : chr "Wildfire" "Wildfire" "Wildfire" "Wildfire" ...
## $ Unit : chr "Payette National Forest" "Elko District Office" "Nez Perce - Clearwater National Forests" "Nez Perce - Clearwater National Forests" ...
## $ State : chr "Idaho, USA" "Nevada, USA" "Idaho, USA" "Idaho, USA" ...
## $ Status : chr "Active" "Active" "Active" "Active" ...
## $ Acres : chr "83,630" "1,500" "4,843" "2,969" ...
## $ Updated : chr "1 min. ago" "1 min. ago" "3 min. ago" "5 min. ago" ...
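If you would rather keep the RCurl code from the question, the same idea can be sketched by passing libcurl timeout options to getURLContent() through its .opts argument. This is a sketch under assumptions: fetch_with_timeout() is a hypothetical helper, the 60-second value is arbitrary, and a longer timeout only helps if the connection is merely slow, not blocked (for example by a firewall or proxy).

```r
library(RCurl)

# Hypothetical helper: getURLContent() with longer libcurl timeouts (in seconds).
# connecttimeout limits the connection phase; timeout limits the whole request.
fetch_with_timeout <- function(url, seconds = 60) {
  opts <- curlOptions(connecttimeout = seconds, timeout = seconds)
  getURLContent(url, .opts = opts)
}

# Usage (requires network access):
# content <- fetch_with_timeout("https://inciweb.nwcg.gov/")
```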
Temporary workaround: read the page with base R's readLines() and parse the result with read_html():
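library(rvest)
# Fetch the page with base R's readLines() instead of RCurl/httr,
# then pass the raw bytes to read_html() and take the first table
l <- charToRaw(paste0(readLines("https://inciweb.nwcg.gov/"), collapse = "\n"))
pg <- read_html(l)
html_table(pg)[[1]]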