Question

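This question was answered here (Web scraping pdf files from HTML), but that solution does not work for my target URL or for the OP's target URL. I shouldn't post my question as an answer to the earlier post, so I'm starting a new question.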

My code is exactly the same as the OP's, and the error message I receive is "Error in download.file(links[i], destfile = save_names[i]) : invalid 'url' argument".

The code I'm using is:

install.packages("RCurl")
install.packages("XML")
library(XML)
library(RCurl)
url <- "https://www.bot.or.th/English/MonetaryPolicy/Northern/EconomicReport/Pages/Releass_Economic_north.aspx"
page   <- getURL(url)
parsed <- htmlParse(page)
links  <- xpathSApply(parsed, path="//a", xmlGetAttr, "href")
inds   <- grep("*.pdf", links)
links  <- links[inds]


regex_match <- regexpr("[^/]+$", links)
save_names <- regmatches(links, regex_match)

for(i in seq_along(links)){
  download.file(links[i], destfile=save_names[i])
  Sys.sleep(runif(1, 1, 5))

}

Any help is much appreciated! Thanks.

Answer 1

Solved! I don't know why this works, but it does. I replaced the for loop with the following code, and it now runs correctly:

Map(function(u, d) download.file(u, d, mode = "wb"), links, save_names)
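
A likely explanation, offered as a sketch rather than a confirmed diagnosis: when some <a> tags on the page have no href attribute, xmlGetAttr returns NULL for them, and xpathSApply then gives back a list instead of a character vector. Indexing a list with single brackets, as in links[i], yields a one-element list, which download.file rejects as an invalid 'url' argument. Map hands each element to the function directly, so every u is a plain string. The example below uses a made-up links list as a stand-in for the scraped result (the file names are hypothetical) and needs no network access.

```r
# Hypothetical stand-in for the xpathSApply() result when some <a> tags
# lack an href: the NULL entry forces a list, not a character vector.
links <- list("a/report1.pdf", NULL, "b/report2.pdf")

# grep() coerces the list with as.character(), so filtering still works,
# but subsetting with [ ] keeps the result a list.
links <- links[grep("\\.pdf$", links)]

is.list(links)            # the filtered object is still a list
is.character(links[1])    # single brackets keep the list wrapper
is.character(links[[1]])  # double brackets extract the string itself

# Flattening once up front gives a character vector, so links_chr[i]
# is a plain string that a for loop can pass to download.file().
links_chr <- unlist(links)
is.character(links_chr[1])
```

Under that assumption, adding links <- unlist(links) after the grep step should let the original for loop run as well. The mode='wb' argument is still worth keeping, since PDFs are binary files and can be corrupted by a text-mode download on Windows.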
