I have a for loop that I use for some web scraping. For example, suppose it is collecting historical stock data.
start <- 1533103200
end <- 1549004400
company <- c("fb","amzn","f")
for (i in company){
print(paste('https://finance.yahoo.com/quote/', i, '/history?period1=', start, '&period2=', end, '&interval=1d&filter=history&frequency=1d', sep=""))
}
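start and end are date codes. Now I have a data.frame of start and end date codes (100-day intervals) that I also want to feed into the list of printed links. That means I need three links for each row of the following data.frame, rather than just three links. In this example, that would be 6 links...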
start <- c(1533193200,1541833200)
end <- c(1541746800,1549004400)
dates <- as.data.frame(cbind(start,end))
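The list is dynamic and long, so I probably have to nest the for loop inside another for loop, but I don't have much experience doing this with two variables. Any help would be great!
The expected result would be...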
[1] "https://finance.yahoo.com/quote/fb/history?period1=1533193200&period2=1541746800&interval=1d&filter=history&frequency=1d"
[1] "https://finance.yahoo.com/quote/amzn/history?period1=1533193200&period2=1541746800&interval=1d&filter=history&frequency=1d"
[1] "https://finance.yahoo.com/quote/f/history?period1=1533193200&period2=1541746800&interval=1d&filter=history&frequency=1d"
[1] "https://finance.yahoo.com/quote/fb/history?period1=1541833200&period2=1549004400&interval=1d&filter=history&frequency=1d"
[1] "https://finance.yahoo.com/quote/amzn/history?period1=1541833200&period2=1549004400&interval=1d&filter=history&frequency=1d"
[1] "https://finance.yahoo.com/quote/f/history?period1=1541833200&period2=1549004400&interval=1d&filter=history&frequency=1d"
...instead of the result of the first loop...
[1] "https://finance.yahoo.com/quote/fb/history?period1=1533103200&period2=1549004400&interval=1d&filter=history&frequency=1d"
[1] "https://finance.yahoo.com/quote/amzn/history?period1=1533103200&period2=1549004400&interval=1d&filter=history&frequency=1d"
[1] "https://finance.yahoo.com/quote/f/history?period1=1533103200&period2=1549004400&interval=1d&filter=history&frequency=1d"
Answer 0: (score: 0)
You need to loop over both the companies AND the dates.
start <- c(1533193200,1541833200)
end <- c(1541746800,1549004400)
dates <- as.data.frame(cbind(start,end))
companies <- c("fb","amzn","f")
string <- 'https://finance.yahoo.com/quote/%s/history?period1=%s&period2=%s&interval=1d&filter=history&frequency=1d'
for (company in companies) {
# Walk the date ranges by row index so each start/end pair stays together.
for (i in seq_len(nrow(dates))) {
print(sprintf(string, company, dates$start[i], dates$end[i]))
}
}
[1] "https://finance.yahoo.com/quote/fb/history?period1=1533193200&period2=1541746800&interval=1d&filter=history&frequency=1d"
[1] "https://finance.yahoo.com/quote/fb/history?period1=1541833200&period2=1549004400&interval=1d&filter=history&frequency=1d"
[1] "https://finance.yahoo.com/quote/amzn/history?period1=1533193200&period2=1541746800&interval=1d&filter=history&frequency=1d"
[1] "https://finance.yahoo.com/quote/amzn/history?period1=1541833200&period2=1549004400&interval=1d&filter=history&frequency=1d"
[1] "https://finance.yahoo.com/quote/f/history?period1=1533193200&period2=1541746800&interval=1d&filter=history&frequency=1d"
[1] "https://finance.yahoo.com/quote/f/history?period1=1541833200&period2=1549004400&interval=1d&filter=history&frequency=1d"
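Note that the nested loop above prints both date ranges for one company before moving on to the next, whereas the question lists all companies for the first date range and then all companies for the second. A minimal sketch of one way to get the question's ordering without an explicit loop is shown here; it assumes the same inputs as above, and the names `template`, `grid` and `urls` are illustrative only. It uses `%.0f` rather than `%s` so the numeric date codes are always written without scientific notation.

```r
# Same inputs as in the answer above.
dates <- data.frame(
  start = c(1533193200, 1541833200),
  end   = c(1541746800, 1549004400)
)
companies <- c("fb", "amzn", "f")
template <- "https://finance.yahoo.com/quote/%s/history?period1=%.0f&period2=%.0f&interval=1d&filter=history&frequency=1d"

# expand.grid() varies its first argument fastest, so listing the companies
# first gives every company for date range 1, then every company for range 2.
grid <- expand.grid(
  company = companies,
  row = seq_len(nrow(dates)),
  stringsAsFactors = FALSE
)

# sprintf() is vectorised over its arguments, so one call builds all links.
urls <- sprintf(template, grid$company, dates$start[grid$row], dates$end[grid$row])
print(urls)
```

Swapping the order of the two `for` loops in the answer would give the same ordering; the vectorised form mainly helps when the list of dates is long, because the result is a character vector that can be passed on directly.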
Answer 1: (score: 0)
I simplified the structure of your data.frame:
df <- data.frame(
start = c(1533193200, 1541833200),
end = c(1541746800, 1549004400)
)
Then I would allocate a new column in the data.frame for each company:
companies <- c("fb", "amzn", "f")
df[, companies] <- ""
Now you can loop over the new columns and fill them with the links:
for (i in companies) {
df[, i] <- paste0(
'https://finance.yahoo.com/quote/',
i, '/history?period1=',
df$start,
'&period2=',
df$end,
'&interval=1d&filter=history&frequency=1d')
}
This gives a data.frame with each company's links in a separate column:
> df
start end
1 1533193200 1541746800
2 1541833200 1549004400
fb
1 https://finance.yahoo.com/quote/fb/history?period1=1533193200&period2=1541746800&interval=1d&filter=history&frequency=1d
2 https://finance.yahoo.com/quote/fb/history?period1=1541833200&period2=1549004400&interval=1d&filter=history&frequency=1d
amzn
1 https://finance.yahoo.com/quote/amzn/history?period1=1533193200&period2=1541746800&interval=1d&filter=history&frequency=1d
2 https://finance.yahoo.com/quote/amzn/history?period1=1541833200&period2=1549004400&interval=1d&filter=history&frequency=1d
f
1 https://finance.yahoo.com/quote/f/history?period1=1533193200&period2=1541746800&interval=1d&filter=history&frequency=1d
2 https://finance.yahoo.com/quote/f/history?period1=1541833200&period2=1549004400&interval=1d&filter=history&frequency=1d
You can "tidy" this if you like, so that the links are in one column and the meta information about each link is in the other columns:
df_tidy <- tidyr::gather(df, company, url, -start, -end)
> df_tidy$url
[1] "https://finance.yahoo.com/quote/fb/history?period1=1533193200&period2=1541746800&interval=1d&filter=history&frequency=1d"
[2] "https://finance.yahoo.com/quote/fb/history?period1=1541833200&period2=1549004400&interval=1d&filter=history&frequency=1d"
[3] "https://finance.yahoo.com/quote/amzn/history?period1=1533193200&period2=1541746800&interval=1d&filter=history&frequency=1d"
[4] "https://finance.yahoo.com/quote/amzn/history?period1=1541833200&period2=1549004400&interval=1d&filter=history&frequency=1d"
[5] "https://finance.yahoo.com/quote/f/history?period1=1533193200&period2=1541746800&interval=1d&filter=history&frequency=1d"
[6] "https://finance.yahoo.com/quote/f/history?period1=1541833200&period2=1549004400&interval=1d&filter=history&frequency=1d"
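If you would rather not depend on tidyr for this last step, the same long layout can be assembled in base R. The following is only a sketch under the assumptions of this answer (a `df` with `start` and `end` columns plus one link column per company); the name `df_long` is illustrative and not part of the original answer.

```r
# Rebuild the wide data.frame from this answer: one link column per company.
df <- data.frame(
  start = c(1533193200, 1541833200),
  end   = c(1541746800, 1549004400)
)
companies <- c("fb", "amzn", "f")
for (i in companies) {
  df[[i]] <- paste0(
    "https://finance.yahoo.com/quote/", i,
    "/history?period1=", df$start,
    "&period2=", df$end,
    "&interval=1d&filter=history&frequency=1d"
  )
}

# unlist() concatenates the company columns one after another, so the
# start/end pairs are repeated once per company and each company name is
# repeated once per date range to stay aligned with the links.
df_long <- data.frame(
  start   = rep(df$start, times = length(companies)),
  end     = rep(df$end, times = length(companies)),
  company = rep(companies, each = nrow(df)),
  url     = unlist(df[companies], use.names = FALSE),
  stringsAsFactors = FALSE
)
print(df_long$url)
```

The rows come out in the same order as the `tidyr::gather()` call above: both date ranges for fb, then amzn, then f.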