I have a data frame that looks like this:
urls <- data.frame(c("https://ec.europa.eu/consumers/consumers_safety/safety_products/rapex/alerts/?event=viewProduct&reference=1212/08",
"https://ec.europa.eu/consumers/consumers_safety/safety_products/rapex/alerts/?event=viewProduct&reference=1212/09",
"https://ec.europa.eu/consumers/consumers_safety/safety_products/rapex/alerts/?event=viewProduct&reference=1213/07",
"https://ec.europa.eu/consumers/consumers_safety/safety_products/rapex/alerts/?event=viewProduct&reference=1213/08",
"https://ec.europa.eu/consumers/consumers_safety/safety_products/rapex/alerts/?event=viewProduct&reference=1213/09",
"https://ec.europa.eu/consumers/consumers_safety/safety_products/rapex/alerts/?event=viewProduct&reference=1214/07",
"https://ec.europa.eu/consumers/consumers_safety/safety_products/rapex/alerts/?event=viewProduct&reference=1214/08",
"https://ec.europa.eu/consumers/consumers_safety/safety_products/rapex/alerts/?event=viewProduct&reference=1214/09"))
To download every image from each page, I put together this code with help from some people on Stack Overflow:
library(rvest)
library(dplyr)
for (url in urls) {
webpage <- html_session(url)
link.titles <- webpage %>% html_nodes("img")
img.url <- link.titles %>% html_attr("src")
download.file(img.url, url, ".jpg", mode = "wb")
}
However, it returns this error:
Error: is.character(url) is not TRUE
Strangely, running the same code without the loop works fine:
url <- "https://ec.europa.eu/consumers/consumers_safety/safety_products/rapex/alerts/?event=viewProduct&reference=1692/09"
webpage <- html_session(url)
link.titles <- webpage %>% html_nodes("img")
img.url <- link.titles %>% html_attr("src")
download.file(img.url, "test.jpg", mode = "wb")
I want to download every image from every page.
Answer 0 (score: 1)
I think the URLs in your data frame are being read as factors; you need to {{1}}, like this:
{{1}}
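To see why the loop in the question fails, it may help to look at what `for` iterates over when it is given a data frame. The sketch below uses shortened, made-up URLs and a hypothetical column name `url`. Setting `stringsAsFactors = FALSE` is one common way to keep the column as character; it is an assumption about the kind of fix meant above, not a reconstruction of the elided snippet.

```r
# Hypothetical, shortened URLs for illustration only.
urls_df <- data.frame(
  url = c("https://example.com/a", "https://example.com/b"),
  stringsAsFactors = TRUE   # this was the default before R 4.0.0
)

# A data frame is a list of columns, so `for` visits columns, not rows.
seen <- list()
for (x in urls_df) {
  seen[[length(seen) + 1]] <- x
}
length(seen)             # a single iteration: the whole column at once
is.character(seen[[1]])  # the column is a factor, not a character vector

# Keeping the column as character and looping over the column itself
# gives one URL string per iteration.
urls_chr <- data.frame(
  url = c("https://example.com/a", "https://example.com/b"),
  stringsAsFactors = FALSE
)
visited <- character(0)
for (u in urls_chr$url) {
  visited <- c(visited, u)
}
```

Looping over `urls_chr$url` rather than `urls_chr` is the important part: even with a character column, looping over the data frame would still pass the whole column in one go.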
Answer 1 (score: 1)
This works, but it looks like every downloaded image is the same; I'm not sure whether that is intended.
urls <- c("https://ec.europa.eu/consumers/consumers_safety/safety_products/rapex/alerts/?event=viewProduct&reference=1212/08",
"https://ec.europa.eu/consumers/consumers_safety/safety_products/rapex/alerts/?event=viewProduct&reference=1212/09",
"https://ec.europa.eu/consumers/consumers_safety/safety_products/rapex/alerts/?event=viewProduct&reference=1213/07",
"https://ec.europa.eu/consumers/consumers_safety/safety_products/rapex/alerts/?event=viewProduct&reference=1213/08",
"https://ec.europa.eu/consumers/consumers_safety/safety_products/rapex/alerts/?event=viewProduct&reference=1213/09",
"https://ec.europa.eu/consumers/consumers_safety/safety_products/rapex/alerts/?event=viewProduct&reference=1214/07",
"https://ec.europa.eu/consumers/consumers_safety/safety_products/rapex/alerts/?event=viewProduct&reference=1214/08",
"https://ec.europa.eu/consumers/consumers_safety/safety_products/rapex/alerts/?event=viewProduct&reference=1214/09")
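# Here `url` is the index of the page, not the URL string itself.
for (url in seq_along(urls)) {
  print(url)
  webpage <- html_session(urls[url])
  link.titles <- webpage %>% html_nodes("img")
  # src attribute(s) of the <img> nodes found on the page
  img.url <- link.titles %>% html_attr("src")
  # The second argument is the destination file, e.g. "1.jpg"
  download.file(img.url, paste0(url, ".jpg"), mode = "wb")
}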
I changed the URLs from a data frame to a character vector. If you want to keep them in a data frame, loop over its rows like this:
for(i in 1:nrow(urls_df)){...}
Then, in the loop body, you have to refer to the URL like this:
webpage <- html_session(as.character(urls_df[i, 1])) # Row i, column 1; as.character() guards against a factor column
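As a minimal sketch of the row-wise variant, the loop below walks a data frame by row index and pulls out one URL string at a time. The shortened URLs and the column name `url` are made up for illustration, and the network call is left as a comment so the snippet runs offline.

```r
# Hypothetical data frame with made-up, shortened URLs.
urls_df <- data.frame(
  url = c("https://example.com/a", "https://example.com/b")
)

pages <- character(0)
# seq_len(nrow(...)) yields no iterations for an empty data frame,
# whereas 1:nrow(...) would count 1, 0.
for (i in seq_len(nrow(urls_df))) {
  # as.character() keeps this working if the column is a factor.
  page_url <- as.character(urls_df[i, 1])
  pages <- c(pages, page_url)  # html_session(page_url) would go here
}
```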
I also changed the arguments to download.file; that is where your loop differed from your single-URL version.
To download all the images:
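The argument mix-up is easier to see with named arguments. In `download.file()`, the second positional argument is `destfile` and the third is `method`, so the call in the question passed the page URL as the destination and `".jpg"` as the method. The sketch below is one way to keep these apart; `image_dest` and `save_image` are hypothetical helper names, and the `images` directory is an assumption.

```r
# Hypothetical helper: build a destination path such as "images/3.2.jpg"
# for the 2nd image found on the 3rd page.
image_dest <- function(page_index, image_index, dir = "images") {
  file.path(dir, paste0(page_index, ".", image_index, ".jpg"))
}

# Hypothetical wrapper: naming the arguments makes it harder to pass
# the wrong value as the destination file.
save_image <- function(src, dest) {
  dir.create(dirname(dest), showWarnings = FALSE, recursive = TRUE)
  download.file(url = src, destfile = dest, mode = "wb")
}

image_dest(3, 2)
```

A call would then look like `save_image(img.url[j], image_dest(url, j))` inside the loops shown in this answer.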
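# Outer loop: `url` is the page index; inner loop: `j` is the image index.
for (url in seq_along(urls)) {
  print(url)
  webpage <- html_session(urls[url])
  link.titles <- webpage %>% html_nodes("img")
  img.url <- link.titles %>% html_attr("src")
  for (j in seq_along(img.url)) {
    # Files are named <page>.<image>.jpg, e.g. "1.2.jpg"
    download.file(img.url[j], paste0(url, ".", j, ".jpg"), mode = "wb")
  }
}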
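If you only want the images in the page body, look at the page structure; you can then add an if condition so that the download only runs when length(img.url) > 1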
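A minimal sketch of such a condition is shown below. The helper name `has_body_images` and the `src` values are made up for illustration, and the threshold of more than one image follows the suggestion above rather than anything verified against the actual pages.

```r
# Hypothetical helper: TRUE only when a page yielded more than one image URL.
has_body_images <- function(img.url) {
  length(img.url) > 1
}

# Made-up src values standing in for html_attr("src") results.
logo_only  <- c("/img/logo.png")
with_photo <- c("/img/logo.png", "/photos/product.jpg")

for (srcs in list(logo_only, with_photo)) {
  if (has_body_images(srcs)) {
    # The download.file() calls would go here.
    message("would download ", length(srcs), " images")
  }
}
```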