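Error: timed out on port 443 when scraping data

Asked: 2017-09-13 16:29:12

Tags: r xml

I'm trying to scrape some data, but I keep getting the error below. My internet connection works fine, and I have updated to the latest version of R, but nothing has fixed it so far. It happens with every URL I try.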

library(RCurl)
library(XML)

url = "https://inciweb.nwcg.gov/"
content <- getURLContent(url)
     Error in function (type, msg, asError = TRUE)  : 
       Failed to connect to inciweb.nwcg.gov port 443: Timed out

1 answer:

Answer 0 (score: 1)

On a slower connection you may need to set an explicit timeout:

library(httr)
library(rvest)

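# Allow up to 60 seconds for the request before giving up
pg <- GET("https://inciweb.nwcg.gov/", timeout(60))

# Parse the response body as HTML and take the first table on the page
incidents <- html_table(content(pg))[[1]]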

str(incidents)
## 'data.frame': 10 obs. of  7 variables:
##  $ Incident: chr  "Highline Fire" "Cottonwood Fire" "Rattlesnake Point Fire" "Coolwater Complex" ...
##  $ Type    : chr  "Wildfire" "Wildfire" "Wildfire" "Wildfire" ...
##  $ Unit    : chr  "Payette National Forest" "Elko District Office" "Nez Perce - Clearwater National Forests" "Nez Perce - Clearwater National Forests" ...
##  $ State   : chr  "Idaho, USA" "Nevada, USA" "Idaho, USA" "Idaho, USA" ...
##  $ Status  : chr  "Active" "Active" "Active" "Active" ...
##  $ Acres   : chr  "83,630" "1,500" "4,843" "2,969" ...
##  $ Updated : chr  "1 min. ago" "1 min. ago" "3 min. ago" "5 min. ago" ...
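
If a single 60-second attempt still times out now and then, httr's RETRY() can repeat the request a few times before giving up. This is a minimal sketch rather than part of the original answer: the helper name fetch_incidents and its default values are illustrative assumptions.

```r
library(httr)
library(rvest)

# Hypothetical helper (not from the original answer): fetch a page with a
# timeout and a few retries, then return the first HTML table on it.
fetch_incidents <- function(url, seconds = 60, attempts = 3) {
  # RETRY() re-issues the request, pausing between attempts, if it fails
  pg <- RETRY("GET", url, timeout(seconds), times = attempts)
  # Turn HTTP 4xx/5xx responses into an R error
  stop_for_status(pg)
  html_table(content(pg))[[1]]
}

# Usage (requires network access):
# incidents <- fetch_incidents("https://inciweb.nwcg.gov/")
```

Whether retrying helps depends on why the connection times out; a proxy or firewall blocking port 443 would fail on every attempt.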

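A temporary workaround:

# Download the page with base R's readLines() instead of RCurl/httr,
# then convert the text to a raw vector that read_html() can parse
l <- charToRaw(paste0(readLines("https://inciweb.nwcg.gov/"), collapse="\n"))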

pg <- read_html(l)

html_table(pg)[[1]]
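
The parsing half of this workaround can be tried without a network connection. The sketch below assumes a small made-up HTML snippet (the table contents are placeholders, not real InciWeb data) in place of the lines that readLines() would return, and then applies the same collapse, charToRaw() and read_html() steps.

```r
library(rvest)

# Hypothetical stand-in for the character vector readLines() would return
lines <- c(
  "<html><body><table>",
  "<tr><th>Incident</th><th>Acres</th></tr>",
  "<tr><td>Example Fire A</td><td>100</td></tr>",
  "<tr><td>Example Fire B</td><td>200</td></tr>",
  "</table></body></html>"
)

# Same steps as the workaround: collapse to one string, convert to raw, parse
raw_html <- charToRaw(paste0(lines, collapse = "\n"))
pg <- read_html(raw_html)

# Extract the first table on the page
tbl <- html_table(pg)[[1]]
print(tbl)
```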