我正在尝试从此链接下载所有.gz文件: ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/BED/
到目前为止,我已经尝试过了,但没有得到任何结果:
require(RCurl)
url= "ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/BED/"
filenames = getURL(url, ftp.use.epsv = FALSE, dirlistonly = TRUE)
filenames <- strsplit(filenames, "\r\n")
filenames = unlist(filenames)
我收到此错误:
Error in function (type, msg, asError = TRUE) :
Operation timed out after 300552 milliseconds with 0 out of 0 bytes received
有人可以帮忙吗?
谢谢
编辑: 我尝试使用下面提供的文件名运行,因此在我的r脚本中,我有:
require(RCurl)
my_url <-"ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/BED/"
my_filenames= c("bed_chr_11.bed.gz", ..."bed_chr_9.bed.gz.md5")
my_filenames <- strsplit(my_filenames, "\r\n")
my_filenames = unlist(my_filenames)
for(my_file in my_filenames){
download.file(paste0(my_url, my_file), destfile = file.path('/mydir', my_file))
}
当我运行脚本时,会收到以下警告:
trying URL 'ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/BED/bed_chr_11.bed.gz'
download.file(paste0(my_url,my_file),destfile = file.path(“ / mydir”,)中的错误: 无法打开URL“ ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/BED/bed_chr_11.bed.gz” 另外:警告消息: 在download.file(paste0(my_url,my_file)中,destfile = file.path(“ / mydir”,: 网址“ ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/BED/bed_chr_11.bed.gz”:状态为“已达到超时” 执行停止
答案 0 :(得分:0)
您要访问的文件名是
filenames <- c("bed_chr_11.bed.gz", "bed_chr_11.bed.gz.md5", "bed_chr_12.bed.gz",
"bed_chr_12.bed.gz.md5", "bed_chr_13.bed.gz", "bed_chr_13.bed.gz.md5",
"bed_chr_14.bed.gz", "bed_chr_14.bed.gz.md5", "bed_chr_15.bed.gz",
"bed_chr_15.bed.gz.md5", "bed_chr_16.bed.gz", "bed_chr_16.bed.gz.md5",
"bed_chr_17.bed.gz", "bed_chr_17.bed.gz.md5", "bed_chr_18.bed.gz",
"bed_chr_18.bed.gz.md5", "bed_chr_19.bed.gz", "bed_chr_19.bed.gz.md5",
"bed_chr_20.bed.gz", "bed_chr_20.bed.gz.md5", "bed_chr_21.bed.gz",
"bed_chr_21.bed.gz.md5", "bed_chr_22.bed.gz", "bed_chr_22.bed.gz.md5",
"bed_chr_AltOnly.bed.gz", "bed_chr_AltOnly.bed.gz.md5", "bed_chr_MT.bed.gz",
"bed_chr_MT.bed.gz.md5", "bed_chr_Multi.bed.gz", "bed_chr_Multi.bed.gz.md5",
"bed_chr_NotOn.bed.gz", "bed_chr_NotOn.bed.gz.md5", "bed_chr_PAR.bed.gz",
"bed_chr_PAR.bed.gz.md5", "bed_chr_Un.bed.gz", "bed_chr_Un.bed.gz.md5",
"bed_chr_X.bed.gz", "bed_chr_X.bed.gz.md5", "bed_chr_Y.bed.gz",
"bed_chr_Y.bed.gz.md5", "bed_chr_1.bed.gz", "bed_chr_1.bed.gz.md5",
"bed_chr_10.bed.gz", "bed_chr_10.bed.gz.md5", "bed_chr_2.bed.gz",
"bed_chr_2.bed.gz.md5", "bed_chr_3.bed.gz", "bed_chr_3.bed.gz.md5",
"bed_chr_4.bed.gz", "bed_chr_4.bed.gz.md5", "bed_chr_5.bed.gz",
"bed_chr_5.bed.gz.md5", "bed_chr_6.bed.gz", "bed_chr_6.bed.gz.md5",
"bed_chr_7.bed.gz", "bed_chr_7.bed.gz.md5", "bed_chr_8.bed.gz",
"bed_chr_8.bed.gz.md5", "bed_chr_9.bed.gz", "bed_chr_9.bed.gz.md5"
)
文件很大,所以我没有检查整个循环,但这至少对第一个文件有效。将此添加到代码末尾。
my_url <- 'ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/BED/'
for(my_file in filenames){ # loop over the files
# download each file, saving in a directory that you need to create on your own computer
download.file(paste0(my_url, my_file), destfile = file.path('c:/users/josep/Documents/', my_file))
}