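I am trying to download some zip files from a large directory on an FTP server. Currently my code loads the directory listing, searches it for zip files, and then downloads every file with a .zip extension.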
library(RCurl)
url <- "ftp://ftp.zakupki.gov.ru/fcs_regions/Adygeja_Resp/protocols/"
userpw <- "free:free"
protocol <- getURL(url, userpwd=userpw, ftp.use.epsv=TRUE, dirlistonly=TRUE)
filenames <- protocol <- strsplit(protocol, "\r*\n")[[1]]
write.table(filenames, "names.txt", sep="\t")
zips <- sapply(filenames,function(x) substr(x,nchar(x)-2,nchar(x)))== "zip"
downloads <- filenames[zips]
con <- getCurlHandle(ftp.use.epsv = TRUE, userpwd=userpw)
mapply(function(x,y) writeBin(getBinaryURL(x, curl = con, dirlistonly = FALSE), y), x = downloads, y = paste("C://temp//",downloads, sep = ""))
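I ran the code last night and was able to download the files without any problems, but when I tried running it again today I got the following error: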
Error in function (type, msg, asError = TRUE) :
Failed to connect to protocol_Adygeja_Resp_2014030100_2014040100_20140710102838_001.xml.zip port 80: Connection refused
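I tried turning off the internet2 setting in R and changing the ftp.use.epsv setting. I am sure the code listed above ran fine the first time, but none of the setting changes I have tried has helped.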
Answer 0 (score: 2)
Your code works for me, but you may want to try the more modern curl package:
library(curl)
# `url` and `userpw` are the values defined in the question
# Get dir listing ---------------------------------------------------------
list_h <- new_handle()
handle_setopt(list_h, userpwd=userpw, ftp_use_epsv=TRUE, dirlistonly=TRUE)
con <- curl(url, "r", handle=list_h)
protocol <- readLines(con)
close(con)
# Save off a list of the filenames ----------------------------------------
writeLines(protocol, con="names.txt")
# Filter out only .zip files ----------------------------------------------
just_zips <- grep("\\.zip$", protocol, value=TRUE)
# Download the files ------------------------------------------------------
dl_h <- new_handle()
handle_setopt(dl_h, userpwd=userpw, ftp_use_epsv=TRUE)
for (i in seq_along(just_zips)) {
  curl_fetch_disk(url=sprintf("%s%s", url, just_zips[i]),
                  path=sprintf("/tmp/%s", just_zips[i]),
                  handle=dl_h)
}
You will need to change /tmp, but this runs fine on my Mac. I don't have a Windows system handy to try it on.
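One observation about the error in the question, offered as a possible explanation rather than a confirmed diagnosis: "Failed to connect to protocol_Adygeja_Resp_...xml.zip port 80" suggests the bare file name was treated as a host name, which is what happens when a string without the ftp:// prefix is passed as the URL. The loop above avoids that by prepending `url` to each name. The sketch below does the same and also makes the destination directory a parameter, so /tmp is not hard-coded. `build_download_plan` is a hypothetical helper name, and the sample listing is made up for illustration.

```r
# Hypothetical helper (not part of curl or RCurl): pairs each file name
# with its full FTP URL and a local destination path.
build_download_plan <- function(base_url, filenames, dest_dir) {
  # Normalise the base URL so it ends in exactly one slash
  base_url <- sub("/*$", "/", base_url)
  data.frame(
    url  = paste0(base_url, filenames),      # full URL, scheme included
    path = file.path(dest_dir, filenames),   # portable local path
    stringsAsFactors = FALSE
  )
}

url <- "ftp://ftp.zakupki.gov.ru/fcs_regions/Adygeja_Resp/protocols/"

# Made-up directory listing, standing in for the real one
listing <- c("a.xml.zip", "readme.txt", "b.xml.zip")

# Same filter as in the answer: keep only names ending in .zip
just_zips <- grep("\\.zip$", listing, value = TRUE)

# tempdir() is used here only so the sketch runs anywhere;
# on Windows something like "C:/temp" could be passed instead
plan <- build_download_plan(url, just_zips, dest_dir = tempdir())
print(plan$url)
```

Each row of `plan` can then be handed to the download call, for example `curl_fetch_disk(plan$url[i], plan$path[i], handle = dl_h)` inside the same `for` loop.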