Simulating a click on a link on a web page

Posted: 2014-03-01 23:07:26

Tags: r selenium xml-parsing

I am trying to scrape the web page

http://www.houseoffraser.co.uk/Eliza+J+3/4+sleeve+ruched+waist+dress/165288648,default,pd.html

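The stock data for each colour/size combination only appears once a colour or size is selected. Is it possible to simulate this in R to get at the data?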

So far I have been able to capture the colours and sizes:

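library(XML)
# page is the parsed product page (http URL, so htmlParse can fetch it directly)
page <- htmlParse("http://www.houseoffraser.co.uk/Eliza+J+3/4+sleeve+ruched+waist+dress/165288648,default,pd.html")

# colour names from the title attribute of each colour swatch, collapsed into one string
mcolour = toString(xpathSApply(page,'//ul[@class="colour-swatches-list toggle-panel"]//li[@title]',xmlGetAttr,"title"))

# size labels from the data-size attribute of each size swatch
size = xpathSApply(page,'//ul[@class="size-swatches-list toggle-panel"]//li[@data-size]',xmlGetAttr,"data-size")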

but I'm not sure how to capture the stock level for each colour/size combination.

Please advise!

============================================================
EDIT: I can't find the new() method. Am I missing something?

firefoxClass
Generator for class "firefoxClass":

Class fields:

Name:  exceptionTable     javaWarMes     javaDriver   javaNavigate
Class:         matrix            ANY            ANY            ANY

Class Methods:
 "back", "callSuper", "close", "copy", "export", "field", "findElementByClassName",
 "findElementByCssSelector", "findElementById", "findElementByLinkText", "findElementByName",
 "findElementByPartialLinkText", "findElementByTagName", "findElementByXPath",
 "findElementsByClassName", "findElementsByCssSelector", "findElementsById",
 "findElementsByLinkText", "findElementsByName", "findElementsByPartialLinkText",
 "findElementsByTagName", "findElementsByXPath", "forward", "get", "getCapabilities",
 "getClass", "getCurrentUrl", "getPageSource", "getRefClass", "getTitle", "getVersion",
 "import", "initFields", "initialize", "initialize#exceptionClass", "printHtml", "refresh",
 "show", "show#envRefClass", "trace", "tryExc", "untrace", "usingMethods"

Reference Superclasses:
 "exceptionClass", "envRefClass"

2 Answers:

Answer 0 (score: 1)

Here is an example using relenium that you can easily extend to query product colours as well:

require(relenium) # More info: https://github.com/LluisRamon/relenium
require(XML)
firefox <- firefoxClass$new() # start a Firefox session
firefox$get("http://www.houseoffraser.co.uk/Eliza+J+3/4+sleeve+ruched+waist+dress/165288648,default,pd.html") # open url
sizes <- xpathSApply(htmlParse(firefox$getPageSource()), "//ul[@class='size-swatches-list toggle-panel']/li/a", xmlValue) # read available sizes

stockMsg <- character(0) # init stock message vector
for (size in sizes) { # for each available size
  sizeLink <- firefox$findElementByXPath(sprintf("//ul[@class='size-swatches-list toggle-panel']/li[@data-size='%s']", size)) # locate size swatch
  sizeLink$click() # click size swatch
  Sys.sleep(1) # assumption: give the page's Ajax update a moment before reading the message
  # append stock message; this absolute XPath matches the page layout at the time and breaks if it changes
  stockMsg <- c(stockMsg,
                firefox$findElementByXPath("/html/body/div/div[3]/div/div/div[4]/div/div/div/div/form/div[4]/div[4]/div")$getText()
                )
}
setNames(stockMsg, sizes) # name the stock messages by size and print them
# 8                       10 
# "in stock"               "in stock" 
# 12                       14 
# "in stock"               "in stock" 
# 16                       18 
# "in stock" "in stock, only 17 left" 
# 20                       22 
# "in stock, only 2 left"  "in stock, only 2 left" 
# 24                       26 
# "Out of stock"           "Out of stock" 
# 28 
# "Out of stock" 
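The loop above covers sizes only. A minimal sketch of a first step towards extending it to colours, assuming the colour titles and size labels have already been read into character vectors (the values below are hypothetical placeholders, not data from the page): expand.grid() enumerates every colour/size pair, and sprintf() builds the swatch XPaths to click for each pair, reusing the class names from the question's code.

```r
# Hypothetical placeholder values; on the real page these would come from
# xpathSApply() calls like the ones in the question (kept as vectors).
colours <- c("Black", "Navy")
sizes <- c("8", "10", "12")

# One row per colour/size pair (the first column varies fastest)
combos <- expand.grid(colour = colours, size = sizes,
                      stringsAsFactors = FALSE)

# XPaths of the swatches to click for each pair, built from the
# class names seen in the question's code
combos$colour_xpath <- sprintf(
  "//ul[@class='colour-swatches-list toggle-panel']//li[@title='%s']",
  combos$colour)
combos$size_xpath <- sprintf(
  "//ul[@class='size-swatches-list toggle-panel']/li[@data-size='%s']",
  combos$size)
```

Each row's two XPaths could then be passed to firefox$findElementByXPath() and clicked in turn before reading the stock message, as in the loop above. Whether selecting a colour resets the size selection on this page is not something I can confirm.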

Answer 1 (score: 0)

For a given product ID pid, which you can scrape from the page, you can get the stock availability by querying:

http://www.houseoffraser.co.uk/on/demandware.store/Sites-hof-Site/default/Product-UpdateQuantityList?pid=165288698&quantity=1
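A small sketch of building that query URL for several product IDs at once. The endpoint and its pid/quantity parameters are copied from the URL above; stock_url is a hypothetical helper name, and the pid values are simply the two that appear in this answer's example URLs (how to scrape pids from the page is left open, as in the answer).

```r
# Endpoint copied from the answer; stock_url is a hypothetical helper
stock_url <- function(pid, quantity = 1) {
  base <- paste0("http://www.houseoffraser.co.uk/on/demandware.store/",
                 "Sites-hof-Site/default/Product-UpdateQuantityList")
  # sprintf() is vectorised, so a vector of pids gives a vector of URLs
  sprintf("%s?pid=%s&quantity=%d", base, pid, as.integer(quantity))
}

urls <- stock_url(c("165288698", "165288648"))
```

Each URL could then be fetched, for example with readLines(urls[1]), and the returned HTML/JavaScript fragment parsed.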

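You don't even need to set any cookies for that query. It returns a block of HTML and JavaScript that sets up the controls on the page. Here is an example with limited stock (2 left at the moment, unless I end up buying them all by accident):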

http://www.houseoffraser.co.uk/on/demandware.store/Sites-hof-Site/default/Product-UpdateQuantityList?pid=165288648&quantity=1

You can get the stock quantity by parsing either the availabilityMessage string or the <select> control.
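A minimal sketch of the string-parsing route, assuming the messages look like the ones printed in the relenium answer ("in stock", "in stock, only N left", "Out of stock"); the exact wording of availabilityMessage in this endpoint's response is an assumption, and parse_stock is a hypothetical helper name. It flags availability and extracts N from "only N left", leaving NA where no count is given.

```r
# Example messages copied from the relenium answer's printed output
msgs <- c("8" = "in stock", "18" = "in stock, only 17 left",
          "20" = "in stock, only 2 left", "24" = "Out of stock")

parse_stock <- function(msg) {
  msg <- unname(msg)
  # Anything not saying "out of stock" is treated as available
  in_stock <- !grepl("out of stock", msg, ignore.case = TRUE)
  # Extract N from "only N left"; NA when the message gives no count
  has_count <- grepl("only [0-9]+ left", msg)
  left <- rep(NA_integer_, length(msg))
  left[has_count] <- as.integer(sub(".*only ([0-9]+) left.*", "\\1",
                                    msg[has_count]))
  data.frame(message = msg, in_stock = in_stock, left = left,
             stringsAsFactors = FALSE)
}

stock <- data.frame(size = names(msgs), parse_stock(msgs),
                    stringsAsFactors = FALSE)
```

An NA in left together with in_stock == TRUE means the message did not state a quantity, not that the quantity is zero.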

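The only step I haven't worked out is getting the pid values and mapping them to descriptions. Unless they are downloaded via Ajax (which is how the stock data arrives), they should all be in the original page request.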

You are using the Chrome debugger/inspector, aren't you?