Scraping problem - advice needed

Time: 2014-04-06 20:58:29

Tags: r curl

I am trying to scrape a web page (specifically the size details):

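library(httr)   # GET
library(RCurl)  # getURLContent
library(XML)    # htmlTreeParse

parenturl <- "http://www.newlook.com/shop/womens/jackets-and-coats/navy-aztec-faux-shearling-collar-parka_286764649?tmcampid=UK_AFF_AffiliateWindow"

# Resolve any redirects with httr, then download the final URL with a 10-second timeout
srcpage <- getURLContent(GET(parenturl)$url, timeout = 10)
page <- htmlTreeParse(srcpage, useInternalNodes = TRUE, encoding = "UTF-8")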

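Looking at the page structure, I believe it runs JavaScript in the background and fetches the size data from the server, so the data is not in the HTML I download. I don't know how to scrape this page. Any help would be much appreciated.

Many thanks, Savi

1 answer:

Answer 0: (score: 3)

You can use Selenium to do this: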

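library(RSelenium)

# Start a local Selenium server. Note: startServer() was deprecated in later
# RSelenium releases, which use rsDriver() or an externally started server.
RSelenium::startServer()
appURL <- "http://www.newlook.com/shop/womens/jackets-and-coats/navy-aztec-faux-shearling-collar-parka_286764649?tmcampid=UK_AFF_AffiliateWindow"
remDr <- remoteDriver()
remDr$open()
remDr$navigate(appURL)

# The page keeps its per-size inventory in a JavaScript variable named `list`;
# return it to R as a list of records.
inventory <- remDr$executeScript("return list;")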
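# Bind the records into a data frame, one row per size (console output shown as comments):
do.call(rbind.data.frame, inventory)
#    color listPrice popupImage   skuID
# 2                0            2867684
# 21               0            2867685
#    swatchImage largeImage salePrice
# 2                                 0
# 21                                0
#    detailImage stockLevel size
# 2                      75   12
# 21                    133   14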

remDr$close()
remDr$closeServer()
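
The conversion step can be tried without a browser. The sketch below is only an illustration: `inventory` here is a hand-written mock that mimics the shape of the list returned by executeScript above (the field names and values are taken from the printed output, but the column types are assumptions), and `sizes` and `in_stock` are hypothetical names. Instead of calling rbind.data.frame directly on the raw lists, it converts each record to a one-row data frame first, which is a slightly more defensive variant that gives plain row names.

```r
# Mock of the list returned by executeScript("return list;").
# Assumption: one named record per size; types here are guesses.
inventory <- list(
  list(color = "", listPrice = 0, skuID = "2867684", stockLevel = 75,  size = "12"),
  list(color = "", listPrice = 0, skuID = "2867685", stockLevel = 133, size = "14")
)

# Turn each record into a one-row data frame, then stack the rows.
rows  <- lapply(inventory, as.data.frame, stringsAsFactors = FALSE)
sizes <- do.call(rbind, rows)

# Keep only the sizes that are currently in stock.
in_stock <- sizes[sizes$stockLevel > 0, c("size", "stockLevel")]
print(in_stock)
```

With the real `inventory` object from the Selenium session, only the two lines that build `rows` and `sizes` would be needed; the mock list is there purely so the example runs on its own.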