Question

I'm building a simple scraper in R that handles pagination. I'm trying to use paste0 in a loop to walk through the paginated URL structure.

#a vector of the urls to scrape
a <- 1:5


URLs <- function(pages) {
out <- matrix(ncol = 1, nrow = 5)
for (i in seq_along(a)) {
    fdata <- paste0("https://foo.bar", i, "/")
    out[, i] <- apply(fdata)
}}

df <- lapply(URLs, function(u){

  html.obj <- read_html(u)
  title <- html.obj %>% html_nodes('a.storylink') %>% html_text()
  score <- html.obj %>% html_nodes('span.score') %>% html_text()

 data.frame(title = title, score = score)
})


library(reshape)
data <- merge_recurse(df)

View(data)

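However, when I run this, the URLs variable is not populated correctly, so the rest of the loop that collects the data never executes.

I couldn't find any other question here that deals with looping over concatenated items like this.

Can anyone give me an idea of where I'm going wrong?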

Answer 1

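The problem with your URLs function is that it ends with a for loop. In R, a for loop evaluates to NULL once it finishes, so NULL is what the function returns.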

x <- for(i in 1:5){
  #do something
}
print(x)
NULL

If you end the function with a return(out) statement, that will most likely solve your problem.
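
A minimal sketch of what that could look like, assuming the goal is a character vector with one URL per page. The function name make_urls and the character vector used for out are illustrative choices, not part of the original code, and the apply(fdata) call is dropped here because it was given no function to apply.

```r
# Hypothetical rewrite of the URLs function; the name make_urls is an assumption
make_urls <- function(pages) {
  out <- character(length(pages))  # preallocate one slot per page
  for (i in seq_along(pages)) {
    out[i] <- paste0("https://foo.bar", pages[i], "/")
  }
  return(out)  # without this line the function would return the loop's NULL
}

urls <- make_urls(1:5)
print(urls)
```

The resulting vector can then be passed to lapply in place of the function object URLs.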

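Edit: Although minem's solution probably solves the problem more concisely, I'll leave this answer here as a reminder not to end a function with a for loop.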

Answer 2

I think URLs should be a list/vector of links, for example:

URLs
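
One way such a vector might be built, sketched under the assumption that the pages are numbered 1 to 5 as in the question: paste0 is vectorized, so it can produce every page URL in a single call with no explicit loop. This is an illustration of the idea, not necessarily the exact code the answer intended.

```r
# paste0 recycles the fixed strings across the vector 1:5,
# giving one URL per page number
URLs <- paste0("https://foo.bar", 1:5, "/")
print(URLs)
```

Because URLs is now a plain character vector, lapply(URLs, function(u) ...) receives one URL string per iteration.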

