R: Scrape a site, incrementing the date in the URL, and save as CSV

Asked: 2016-04-04 00:31:32

Tags: r csv web-scraping

I'm relatively new to R and web scraping, so apologies in advance for any obvious mistakes.

I want to scrape a CSV file from URL 1, increment the date up to URL 2, and save each CSV file along the way.

startdate <- as.Date("2007-07-01")
enddate <- as.Date(Sys.Date())

for(startdate in enddate){ # Loop through dates on each URL
    read.csv(url("http://api.foo.com/charts/data?output=csv&data=close&startdate=",startdate,"&enddate=",startdate,"&exchanges=bpi&dev=1"))
    startdate = startdate + 1
    startdate <- startdate[-c(1441,1442),] # Irrelevant to question at hand. Removes unwanted information auto-inserted into CSV.
    write.csv(startdate[-c(1441,1442),], startdate, 'csv', row.names = FALSE)
}

This produces the following errors:

read.csv(url("http://api.foo.com/charts/data?output=csv&data=close&startdate=",startdate,"&enddate=",startdate,"&exchanges=bpi&dev=1"))
# Error in match.arg(method, c("default", "internal", "libcurl", "wininet")) :
#   'arg' should be one of "default", "internal", "libcurl", "wininet"

write.csv(startdate[c(1441,1442),], startdate, 'csv', row.names = FALSE)
# Error in charToDate(x) : character string is not in a standard unambiguous format

Any suggestions on how to resolve these errors?

1 answer:

Answer 0 (score: 1)

Given your goal, "I want to scrape a CSV file from URL 1, increment the date up to URL 2, and save each CSV file," here is some sample code:

startdate <- as.Date("2016-01-01")
enddate <- as.Date(Sys.Date())

geturl <- function(sdt, edt) {
    paste0("http://api.foo.com/charts/data?output=csv&data=close",
        "&startdate=",sdt,"&enddate=",edt,"&exchanges=bpi&dev=1")
} #geturl

dir.create("data", showWarnings = FALSE) # no warning if the folder already exists
garbage <- lapply(seq.Date(startdate, enddate, by="1 day"), function(dt) {
    dt <- as.Date(dt, origin = "1970-01-01") # lapply() strips the Date class, so dt arrives as a number
    dat <- read.csv(url(geturl(dt, dt)))
    write.csv(dat, paste0("data/dat-",format(dt, "%Y%m%d"),".csv"), row.names=FALSE)
})
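If any single day's download can fail (a missing file, a network hiccup), one error will abort the whole loop. A slightly more defensive sketch, reusing the same hypothetical `geturl()` helper and endpoint from above, wraps each fetch in `tryCatch()`; the `reader` argument is only there so the download step can be swapped out or stubbed:

```r
geturl <- function(sdt, edt) { # same hypothetical endpoint as above
    paste0("http://api.foo.com/charts/data?output=csv&data=close",
           "&startdate=", sdt, "&enddate=", edt, "&exchanges=bpi&dev=1")
}

fetch_day <- function(dt, reader = read.csv) {
    dt <- as.Date(dt, origin = "1970-01-01") # restore the Date class lapply() strips
    tryCatch(
        reader(geturl(dt, dt)),
        error = function(e) {
            message("skipping ", format(dt), ": ", conditionMessage(e))
            NULL # signal "no data for this day" and keep looping
        }
    )
}
```

Then `results <- lapply(seq.Date(startdate, enddate, by = "1 day"), fetch_day)` collects one data frame (or `NULL`) per day instead of stopping at the first failure.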

Is this what you're looking for? Could you provide a sample link and some sample dates?
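As for the two errors themselves: `url()` does not concatenate strings, so the arguments after the first string are matched to its `open` and `method` parameters, which is why `match.arg()` complains about `"default", "internal", "libcurl", "wininet"`. And `write.csv()`'s second argument is the file name, so passing the Date variable `startdate` there (after it has also been reused to hold the data) is what triggers `charToDate()`. A minimal illustration, with the download itself commented out since `api.foo.com` is a placeholder host:

```r
startdate <- as.Date("2007-07-01")

# Build the full URL with paste0() first; read.csv() accepts a URL string directly
u <- paste0("http://api.foo.com/charts/data?output=csv&data=close",
            "&startdate=", startdate, "&enddate=", startdate,
            "&exchanges=bpi&dev=1")
# dat <- read.csv(u)

# Give write.csv() a file path, not a Date, as its second argument
# write.csv(dat, paste0(startdate, ".csv"), row.names = FALSE)
```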