I am trying to scrape images from a website. Thanks to the excellent work of R package developers, I was able to extract the image URL. However, I cannot download the actual image with download.file() — all I get is a garbage file. I searched Stack Overflow and believe the problem lies in the site's anti-scraping (hotlink protection) mechanism; perhaps I need to set a referer for the download request. Can anyone suggest how to solve this? It has been bothering me for a while. Thanks in advance!
library(rvest)
library(RCurl)
library(XML)
library(httr)
library(stringr)
myheader <- c(
  "User-Agent" = "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.1.6)",
  "Accept" = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
  "Accept-Language" = "en-us",
  "Connection" = "keep-alive",
  "Accept-Charset" = "GB2312,utf-8;q=0.7,*;q=0.7",
  "Referer" = "http://www.mm131.com"
)
url <- "http://www.mm131.com/mingxing/2016.html"
imgsrc <- html_session(url, add_headers(.headers = myheader)) %>%
  html_node(".content-pic img") %>%
  html_attr("src")
download.file(imgsrc, "test.jpg", mode = "wb")
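A likely cause is that download.file() opens a fresh connection without the Referer header, so the site's hotlink protection serves a placeholder instead of the real image. One workaround (a sketch, assuming the httr package and the myheader vector defined above) is to fetch the bytes with httr::GET(), passing the same headers as the browsing session, and write them to disk yourself:

```r
library(httr)

# Request the image with the same headers the session used; many sites
# check the Referer header before serving image content.
resp <- GET(imgsrc, add_headers(.headers = myheader))
stop_for_status(resp)  # abort if the server returned an error status

# content(resp, "raw") returns the response body as raw bytes,
# which writeBin() saves unmodified to a binary file.
writeBin(content(resp, "raw"), "test.jpg")
```

Alternatively, if you are on R 3.6.0 or later, download.file() itself accepts a headers argument, so download.file(imgsrc, "test.jpg", mode = "wb", headers = myheader) may also work.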