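I'm trying to use R to catalog and track all types of coupons (title, image, description, expiration date, and the filters each coupon belongs to). I think the page is rendered with JavaScript, so basic scraping tools don't work.
Is there a way to stay in R and do this? (I'm not proficient in other systems.)
I tried following the article below, but can't seem to get it to work: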
https://datascienceplus.com/scraping-javascript-rendered-web-content-using-r/
Edit:
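library(rvest)
# read_html() needs the full URL, including the scheme
coupon <- read_html("https://www.kroger.com/cl/coupons/")
# Select the bold text nodes and extract their text
coupon <- coupon %>%
  html_nodes(".Text--bold") %>%
  html_text()
coupon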
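As a rough illustration of why this first attempt probably returns nothing: `read_html()` only parses the HTML the server sends and never runs the page's scripts. The sketch below uses a made-up inline page, not Kroger's real markup (the `app` id and the coupon text are invented), in which a script would create the `.Text--bold` element only once it runs in a browser.

```r
library(rvest)

# Hypothetical page: the coupon element exists only after the script runs in a browser.
page_src <- paste0(
  "<html><body><div id='app'></div>",
  "<script>",
  "var s = document.createElement('span');",
  "s.className = 'Text--bold';",
  "s.textContent = 'Save on cereal';",
  "document.getElementById('app').appendChild(s);",
  "</script></body></html>"
)

page <- read_html(page_src)

# The static parse finds no matching element, because the script was never executed.
static_hits <- page %>% html_nodes(".Text--bold") %>% html_text()
length(static_hits)

# The script's source is still present in the document, but only as plain text.
script_src <- page %>% html_nodes("script") %>% html_text()
grepl("Text--bold", script_src, fixed = TRUE)
```

If the real page behaves the same way, an empty result from `html_nodes()` would be expected rather than a sign of a wrong selector.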
I also tried this:
# Load both required libraries
library(rvest)
library(V8)
# URL with JS-rendered content to be scraped
link <- "https://www.kroger.com/cl/coupons/"
# Read the page and extract the source of every <script> nested inside an <li>
emailjs <- read_html(link) %>%
  html_nodes("li") %>%
  html_nodes("script") %>%
  html_text()
# Create a new V8 context
ct <- v8()
# Strip document.write, evaluate the remaining JS, then parse the output as HTML and print its text
read_html(ct$eval(gsub("document.write", "", emailjs, fixed = TRUE))) %>% html_text()
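For context on this second attempt, here is a small sketch of the one case where the `document.write` trick from the linked article can work, using an invented script string rather than anything taken from the real page. A bare V8 context has no `document` object, so the trick depends on the markup being passed to `document.write` as a literal string. Scripts that build the page through DOM calls, which may be what the coupon page does, would fail at the evaluation step.

```r
library(rvest)
library(V8)

# Hypothetical script text, as it might come back from html_text() on a <script> node.
js <- "document.write('<b>Save on cereal</b>')"

ct <- v8()

# Stripping the call leaves a parenthesised string literal, which V8 can evaluate.
html_out <- ct$eval(gsub("document.write", "", js, fixed = TRUE))
coupon_text <- read_html(html_out) %>% html_text()
coupon_text

# A bare V8 context has no DOM, so DOM calls raise an error instead of producing HTML.
dom_result <- tryCatch(
  ct$eval("document.getElementById('app')"),
  error = function(e) "error: document is not available in V8"
)
dom_result
```

The `tryCatch()` is only there to show the failure without stopping the script; the replacement message is my own wording, not V8's.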