I have been trying to skip iterations of download.file
that take too long and never complete, so far without success, even though I have tried the answers to similar questions. I have set up an example below with the code I have been using. My main problem is that some of the IDs I use to fetch the .csv files (from the vec
object below) have no associated .csv file, so their URLs never resolve properly. I believe R keeps waiting on the URL for a response that never comes, and the loop ends up taking far too long. How can I skip an ID when download.file
starts taking too long?
library(stringr)
library(R.utils)
vec <- c("05231992000181","00628708000191","05816554000185", "01309949000130","07098414000144", "07299568000102", "12665438000178", "63599658000181", "12755123000111", "12376766000154",
"11890564000163", "04401095000106", "11543768000128", "10695634000160", "34931022000197", "10422225000190",
"09478854000152", "12682106000100", "11581441000140", "10545688000149", "10875891000183", "13095498000165",
"10809607000170", "07976466000176", "11422211000139", "41205907000174", "08326720000153", "06910908000119",
"04196935000227", "02323120000155", "96560701000154")
for (i in seq_along(vec)) {
  url <- paste0("http://compras.dados.gov.br/licitacoes/v1/licitacoes.csv?cnpj_vencedor=", vec[i])
  # withTimeout() (evalWithTimeout() is deprecated in R.utils) is meant to
  # abort the download after 3 seconds; tryCatch() turns that abort into a
  # message instead of stopping the loop.
  tryCatch(
    expr = withTimeout(
      download.file(url,
                    destfile = paste0("C:/Users/Username/Desktop/example_file/", vec[i], ".csv"),
                    mode = "wb"),
      timeout = 3),
    error = function(ex) cat("Timeout. Skipping.\n")
  )
  print(i)
}
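One way to sidestep the hang entirely is to probe each URL before calling download.file. A minimal sketch, assuming the httr package is installed and that the server answers HEAD requests (both assumptions, not part of the original code):

library(httr)

for (i in seq_along(vec)) {
  url <- paste0("http://compras.dados.gov.br/licitacoes/v1/licitacoes.csv?cnpj_vencedor=", vec[i])
  # Probe with a 3-second cap; a timeout raises an error, which becomes NULL
  probe <- tryCatch(HEAD(url, timeout(3)), error = function(e) NULL)
  if (!is.null(probe) && status_code(probe) == 200) {
    download.file(url,
                  destfile = paste0("C:/Users/Username/Desktop/example_file/", vec[i], ".csv"),
                  mode = "wb")
  } else {
    cat("No usable response for", vec[i], "- skipping.\n")
  }
}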
Answer 0 (score: 1)
Checking the HTTP status is an effective way to handle this case where possible, but when the server never responds at all you can set a request timeout with httr::timeout
and pass it to httr::GET
. Keeping everything in a tidy data frame of list-columns via the tidyverse:
library(dplyr)
library(purrr)
base_url <- "http://compras.dados.gov.br/licitacoes/v1/licitacoes.csv?cnpj_vencedor="
df <- tibble(  # tibble() replaces the now-deprecated data_frame()
  cnpj_vencedor = c("05231992000181", "00628708000191", "05816554000185", "01309949000130",
                    "07098414000144", "07299568000102", "12665438000178", "63599658000181",
                    "12755123000111", "12376766000154", "11890564000163", "04401095000106",
                    "11543768000128", "10695634000160", "34931022000197", "10422225000190",
                    "09478854000152", "12682106000100", "11581441000140", "10545688000149",
                    "10875891000183", "13095498000165", "10809607000170", "07976466000176",
                    "11422211000139", "41205907000174", "08326720000153", "06910908000119",
                    "04196935000227", "02323120000155", "96560701000154"))
df <- df %>%
  # iterate GET over the URLs, wrapped in purrr::safely() so each element is a
  # list of the result and the error (NULL where appropriate), with the
  # timeout passed through to httr::GET
  mutate(response = map(paste0(base_url, cnpj_vencedor),
                        safely(httr::GET), httr::timeout(3)))
df <- df %>%
  # extract the result element (dropping the errors)
  mutate(response = map(response, "result"),
         # where there is a response, parse its body into a data frame
         data = map_if(response, negate(is.null), httr::content))
df
#> # A tibble: 31 x 3
#> cnpj_vencedor response data
#> <chr> <list> <list>
#> 1 05231992000181 <S3: response> <tibble [49 × 18]>
#> 2 00628708000191 <S3: response> <NULL>
#> 3 05816554000185 <S3: response> <tibble [1 × 18]>
#> 4 01309949000130 <S3: response> <NULL>
#> 5 07098414000144 <NULL> <NULL>
#> 6 07299568000102 <NULL> <NULL>
#> 7 12665438000178 <NULL> <NULL>
#> 8 63599658000181 <NULL> <NULL>
#> 9 12755123000111 <NULL> <NULL>
#> 10 12376766000154 <NULL> <NULL>
#> # ... with 21 more rows
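If you still want the .csv files on disk, a short follow-up sketch (assuming the readr package is installed, and reusing the destination folder from the question) writes out each successfully parsed tibble and skips the NULLs:

library(readr)

fetched <- df %>% filter(!map_lgl(data, is.null))
# write one .csv per ID that actually returned data
walk2(fetched$data, fetched$cnpj_vencedor,
      ~ write_csv(.x, paste0("C:/Users/Username/Desktop/example_file/", .y, ".csv")))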