I am trying to extract some information from a number of links.
I am applying the following function:
walk(filinginfohref, function(x) {
download.file(x, destfile = paste0("D:/deleteme/",x), quiet = FALSE)
})
But it returns the following error:
Error in download.file(x, destfile = paste0("D:/deleteme/", x), quiet = FALSE) :
cannot open destfile 'D:/deleteme/https://www.sec.gov/Archives/edgar/data/1750/000104746918004978/0001047469-18-004978-index.htm', reason 'Invalid argument'
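I think this is because the link itself cannot be used as the destination file name.
I need some way to keep track of the link each file was downloaded from.
How can I get around this?
Data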
filinginfohref <- c("https://www.sec.gov/Archives/edgar/data/1750/000104746918004978/0001047469-18-004978-index.htm",
"https://www.sec.gov/Archives/edgar/data/1750/000104746917004528/0001047469-17-004528-index.htm",
"https://www.sec.gov/Archives/edgar/data/1750/000104746916014299/0001047469-16-014299-index.htm",
"https://www.sec.gov/Archives/edgar/data/1750/000104746915006136/0001047469-15-006136-index.htm",
"https://www.sec.gov/Archives/edgar/data/1750/000104746914006243/0001047469-14-006243-index.htm",
"https://www.sec.gov/Archives/edgar/data/1750/000104746913007797/0001047469-13-007797-index.htm",
"https://www.sec.gov/Archives/edgar/data/1750/000104746912007300/0001047469-12-007300-index.htm",
"https://www.sec.gov/Archives/edgar/data/1750/000104746911006302/0001047469-11-006302-index.htm",
"https://www.sec.gov/Archives/edgar/data/1750/000104746910006500/0001047469-10-006500-index.htm",
"https://www.sec.gov/Archives/edgar/data/1750/000104746909006783/0001047469-09-006783-index.htm"
)
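Answer 0: (score: 1)
Every / in the link is interpreted as a folder separator, so the constructed path does not exist.
Replace destfile = paste0("D:/deleteme/",x)
with destfile = paste0("D:/deleteme/", gsub("/", "_", x, fixed = TRUE))
This turns each / into the character _, which avoids the problem.
There may be a way to keep the link intact.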
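One possible way to keep the link recoverable from the file name is percent-encoding. The sketch below assumes that "%" sequences are acceptable in your file names; url_to_filename is a hypothetical helper name, not something from the question. utils::URLencode() with reserved = TRUE encodes both "/" and ":" (replacing only "/" still leaves the ":" from "https:" in the name, which Windows does not accept as an ordinary file-name character), and utils::URLdecode() turns the name back into the original URL.

```r
# Hypothetical helper: turn a URL into a file name that can be decoded back.
url_to_filename <- function(url) {
  # reserved = TRUE also encodes ":" and "/", so neither ends up in the name
  vapply(url, utils::URLencode, character(1), reserved = TRUE, USE.NAMES = FALSE)
}

url <- "https://www.sec.gov/Archives/edgar/data/1750/000104746918004978/0001047469-18-004978-index.htm"
fname <- url_to_filename(url)

# The encoded name contains neither "/" nor ":" ...
has_bad_chars <- grepl("[/:]", fname)
# ... and decoding it gives back the original link.
round_trip_ok <- identical(utils::URLdecode(fname), url)
```

Inside the walk() call this would be used as destfile = file.path("D:/deleteme", url_to_filename(x)). The encoded names are longer than the originals, so very long URLs could run into path-length limits.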
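Answer 1: (score: 1)
As you already found out, Windows does not let you save files whose names contain those special characters. Add a function that strips the common part of the URL from the file name and replaces each "/" with "_".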
library(purrr)
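htmName <- function (x) {
# fixed = TRUE treats the pattern literally, so "." is not a regex wildcard
x <- gsub("https://www.sec.gov/Archives/edgar/data/", "", x, fixed = TRUE)
# the remaining "/" would be read as folder separators, so replace them
x <- gsub("/", "_", x, fixed = TRUE)
return(x)
}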
walk(filinginfohref, function(x) {
download.file(x, destfile = paste0("output/", htmName(x)), quiet = FALSE)
})
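If the goal is to know afterwards which file came from which link, another option is to keep a lookup table alongside the files. The following is a minimal base R sketch, assuming the same "output" folder and the same prefix stripping as above; make_manifest and download_all are hypothetical names introduced only for illustration.

```r
# Hypothetical helper: build a url -> destfile lookup table.
make_manifest <- function(urls, out_dir = "output") {
  # Strip the common prefix literally, then replace "/" with "_"
  short <- sub("https://www.sec.gov/Archives/edgar/data/", "", urls, fixed = TRUE)
  data.frame(
    url = urls,
    destfile = file.path(out_dir, gsub("/", "_", short, fixed = TRUE)),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: download every row and record failures instead of stopping.
download_all <- function(manifest) {
  # Make sure the target folder(s) exist before writing into them
  for (d in unique(dirname(manifest$destfile))) {
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  manifest$ok <- vapply(seq_len(nrow(manifest)), function(i) {
    tryCatch(
      # download.file() returns 0 on success
      download.file(manifest$url[i], manifest$destfile[i],
                    mode = "wb", quiet = TRUE) == 0,
      error = function(e) FALSE
    )
  }, logical(1))
  manifest
}

# Building the table does not touch the network:
manifest <- make_manifest(
  "https://www.sec.gov/Archives/edgar/data/1750/000104746918004978/0001047469-18-004978-index.htm"
)
```

With the vector from the question, something like result <- download_all(make_manifest(filinginfohref)) would perform the downloads, and write.csv(result, "output/manifest.csv", row.names = FALSE) would keep the link-to-file mapping on disk.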