I am trying to scrape the size details from the following web page:
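library(httr)  # GET, timeout, content
library(XML)   # htmlTreeParse
parenturl <- "http://www.newlook.com/shop/womens/jackets-and-coats/navy-aztec-faux-shearling-collar-parka_286764649?tmcampid=UK_AFF_AffiliateWindow"
resp <- GET(parenturl, timeout(10))  # fetch the page, giving up after 10 seconds
srcpage <- content(resp, as = "text", encoding = "UTF-8")
page <- htmlTreeParse(srcpage, asText = TRUE, useInternalNodes = TRUE, encoding = "UTF-8")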
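Looking at the page structure, I believe it runs JavaScript in the background and fetches the size data from the server, so the data is not in the static HTML. I don't know how to scrape a page like this. Any help would be appreciated.
Many thanks, Savi
Answer 0 (score: 3)
You can use Selenium for this. It drives a real browser, so the page's JavaScript runs and its variables can be read directly: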
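library(RSelenium)
# Start a local Selenium server (newer RSelenium releases may need rsDriver() instead)
RSelenium::startServer()
appURL <- "http://www.newlook.com/shop/womens/jackets-and-coats/navy-aztec-faux-shearling-collar-parka_286764649?tmcampid=UK_AFF_AffiliateWindow"
remDr <- remoteDriver()
remDr$open()             # open a browser session
remDr$navigate(appURL)   # load the page so its JavaScript runs
# Read the page's JavaScript variable `list`, which holds one record per size
inventory <- remDr$executeScript("return list;")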
> do.call(rbind.data.frame, inventory)
   color listPrice popupImage   skuID
2                0            2867684
21               0            2867685
   swatchImage largeImage salePrice
2                                 0
21                                0
   detailImage stockLevel size
2                      75   12
21                    133   14
remDr$close()
remDr$closeServer()
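Once the inventory has been retrieved, the size details can be pulled out without a browser session. The following is a minimal sketch that runs on a mock list. The field names and values are copied from the output above, but the exact shape of what `executeScript` returns (a list of named lists, one per SKU) is an assumption, as are the names `sizes` and `in_stock`.

```r
# Mock of what remDr$executeScript("return list;") might return:
# a list of named lists, one per SKU (assumed structure).
inventory <- list(
  list(skuID = "2867684", stockLevel = 75, size = "12", salePrice = 0),
  list(skuID = "2867685", stockLevel = 133, size = "14", salePrice = 0)
)

# Turn each record into a one-row data frame, then bind the rows together.
sizes <- do.call(rbind, lapply(inventory, function(x) {
  as.data.frame(x, stringsAsFactors = FALSE)
}))

# Make sure the stock level is numeric before comparing it.
sizes$stockLevel <- as.numeric(sizes$stockLevel)

# Keep only the sizes that are in stock.
in_stock <- sizes[sizes$stockLevel > 0, c("size", "stockLevel")]
print(in_stock)
```

With real data, replace the mock `inventory` with the object returned by `remDr$executeScript`, and check its structure with `str(inventory)` first, since the fields may differ from this sketch.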