Timeout error when downloading large files from a URL

Asked: 2017-07-11 14:10:39

Tags: json r database web-scraping timeout

When I try to download datasets in a for loop, I keep getting this error:

Error in file(file, "rt"): cannot open connection
In addition: Warning message:
In file(file, "rt"): InternetOpenUrl failed: 'The operation timed out'

I have tried changing the timeout to different values (100, 200, 500, 1000), but it does not seem to make any difference: the error occurs after the same amount of time regardless of the value I set.
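The "InternetOpenUrl failed" wording suggests `download.file()` is using the Windows "wininet" method, whose connection timeout is controlled by the operating system rather than by R's `options(timeout = )`; that would explain why changing the value has no visible effect. A minimal sketch of the distinction, assuming a Windows setup (the commented URL is a placeholder, not from the post):

```r
## options(timeout = ) sets an R-level limit in seconds; it is honoured by
## the "libcurl" download method but, as far as I know, not by "wininet".
options(timeout = 1000)
print(getOption("timeout"))  # confirms the R-level setting took effect

## Hypothetical call: forcing method = "libcurl" makes the timeout above apply.
# download.file("https://example.com/big.json", tempfile(),
#               method = "libcurl", mode = "wb")
```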

## Get the catalog and select dataset ids, filtering out data sets whose
## theme does not contain the key words
library(dplyr)     # tbl_df, select, filter, %>%
library(jsonlite)  # fromJSON

catalog <- tbl_df(read.csv2("https://datasource.kapsarc.org/explore/download/"))
select(catalog, datasetid, country, data.classification, title, theme, keyword) %>%
  filter(grepl("Demo|Econo|Trade|Industry|Policies|Residential", theme)) %>%
  select(datasetid) -> datasets

data_kapsarc <- list()
base_url <- "https://datasource.kapsarc.org/api/v2/catalog/datasets/population-by-sex-and-age-group/exports/json?rows=-1&start=0&pretty=true&timezone=UTC"

options(timeout = 1000)

## Download the data sets and store them in a list of data frames
for (i in 1:length(datasets$datasetid)) {
  try({
    url <- gsub("population-by-sex-and-age-group", datasets[i, 1]$datasetid, base_url)
    temp <- tempfile()
    download.file(url, temp, mode = "wb")
    data_kapsarc[[i]] <- fromJSON(temp)
    unlink(temp)
  }, silent = TRUE)
}
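Since `try(..., silent = TRUE)` swallows each failure, slow or flaky downloads simply leave holes in the list. One possible workaround is to retry each download a few times before giving up; `download_with_retry`, the attempt count, and the backoff below are illustrative choices, not part of the original code:

```r
## Hedged sketch: retry a single download with exponential backoff.
## method = "libcurl" is assumed so that options(timeout = ) is respected.
download_with_retry <- function(url, dest, attempts = 3) {
  for (k in seq_len(attempts)) {
    ok <- tryCatch({
      download.file(url, dest, mode = "wb", method = "libcurl", quiet = TRUE)
      TRUE
    }, error = function(e) FALSE, warning = function(w) FALSE)
    if (ok) return(TRUE)
    if (k < attempts) Sys.sleep(2 ^ k)  # wait 2s, 4s, ... between attempts
  }
  FALSE  # all attempts failed
}
```

Inside the loop, `download.file(url, temp, mode = 'wb')` would be replaced by a call to this helper, and datasets that still return `FALSE` can be logged for a later pass instead of being silently skipped.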

0 Answers

No answers yet.