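I am trying to scrape the web page
http://www.houseoffraser.co.uk/Eliza+J+3/4+sleeve+ruched+waist+dress/165288648,default,pd.html
The stock data for each colour/size combination only appears once a colour or size is selected. Is it possible to simulate this in R to get the data?
So far I have been able to capture the colours and sizes: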
mcolour = toString(xpathSApply(page,'//ul[@class="colour-swatches-list toggle-panel"]//li[@title]',xmlGetAttr,"title"))
size = xpathSApply(page,'//ul[@class="size-swatches-list toggle-panel"]//li[@data-size]',xmlGetAttr,"data-size")
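But I am not sure how to capture the stock level for each colour/size combination.
Please advise!
=============================================== =============
I can't find the new method. Am I missing something?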
firefoxClass
Generator for class "firefoxClass":
Class fields:
Name:  exceptionTable javaWarMes javaDriver javaNavigate
Class:         matrix        ANY        ANY          ANY
Class Methods:
"back", "callSuper", "close", "copy", "export", "field", "findElementByClassName",
"findElementByCssSelector", "findElementById", "findElementByLinkText", "findElementByName",
"findElementByPartialLinkText", "findElementByTagName", "findElementByXPath",
"findElementsByClassName", "findElementsByCssSelector", "findElementsById",
"findElementsByLinkText", "findElementsByName", "findElementsByPartialLinkText",
"findElementsByTagName", "findElementsByXPath", "forward", "get", "getCapabilities",
"getClass", "getCurrentUrl", "getPageSource", "getRefClass", "getTitle", "getVersion",
"import", "initFields", "initialize", "initialize#exceptionClass", "printHtml", "refresh",
"show", "show#envRefClass", "trace", "tryExc", "untrace", "usingMethods"
Reference Superclasses:
"exceptionClass", "envRefClass"
Answer 0: (score: 1)
Here is an example using relenium, which you can easily extend to query the product colours as well:
require(relenium) # More info: https://github.com/LluisRamon/relenium
require(XML)
firefox <- firefoxClass$new() # init browser
firefox$get("http://www.houseoffraser.co.uk/Eliza+J+3/4+sleeve+ruched+waist+dress/165288648,default,pd.html") # open url
sizes <- xpathSApply(htmlParse(firefox$getPageSource()), "//ul[@class='size-swatches-list toggle-panel']/li/a", xmlValue) # read available sizes
stockMsg <- vector() # init stock message vector
for (size in sizes) { # for each available size
  sizeLink <- firefox$findElementByXPath(sprintf("//ul[@class='size-swatches-list toggle-panel']/li[@data-size='%s']", size)) # focus size link
  sizeLink$click() # click size link
  stockMsg <- c(stockMsg, # and append stock message to stock message vector
    firefox$findElementByXPath("/html/body/div/div[3]/div/div/div[4]/div/div/div/div/form/div[4]/div[4]/div")$getText()
  )
}
setNames(stockMsg, sizes) # name stock msg vector and print it
# 8 10
# "in stock" "in stock"
# 12 14
# "in stock" "in stock"
# 16 18
# "in stock" "in stock, only 17 left"
# 20 22
# "in stock, only 2 left" "in stock, only 2 left"
# 24 26
# "Out of stock" "Out of stock"
# 28
# "Out of stock"
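If you need numbers rather than message strings, the messages can be post-processed with a regular expression. The following is only a minimal sketch: parseStock is a hypothetical helper name, and it assumes the messages always follow the three patterns seen in the output above ("in stock", "in stock, only N left", "Out of stock").

```r
# A few stock messages copied from the output above
stockMsg <- c("8"  = "in stock",
              "18" = "in stock, only 17 left",
              "20" = "in stock, only 2 left",
              "24" = "Out of stock")

# Hypothetical helper (not part of relenium or XML):
#   "in stock, only N left" -> N
#   "Out of stock"          -> 0
#   plain "in stock"        -> NA (the page gives no count)
parseStock <- function(msg) {
  # sub() returns the input unchanged when the pattern does not match,
  # so as.integer() yields NA for messages without a count
  left <- suppressWarnings(as.integer(sub(".*only ([0-9]+) left.*", "\\1", msg)))
  out <- ifelse(grepl("out of stock", msg, ignore.case = TRUE), 0L, left)
  setNames(out, names(msg))
}

qty <- parseStock(stockMsg)
print(qty)
```

An NA here means the size is in stock but the page does not say how many units are left, which is worth keeping distinct from 0.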
Answer 1: (score: 0)
For a given product ID pid, which you can scrape from the page, you can get the stock availability by querying:
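You don't even need to set any cookies for that query. It returns a block of HTML and JavaScript that sets up the controls on the page. Here is an example with limited stock (currently 2 left, although I may have accidentally bought them all):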
You can get the stock quantity by parsing the availabilityMessage string or the <select> control.
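As a rough illustration of the <select> route, the sketch below reads the option values out of a quantity drop-down with the XML package. The HTML fragment is made up for the example (the real response is not reproduced here), and treating the largest selectable quantity as the stock level is an assumption, not something the site documents.

```r
require(XML)

# Hypothetical fragment standing in for the <select> control in the response;
# the real markup (names, attributes) may differ
fragment <- '<select name="Quantity">
  <option value="1">1</option>
  <option value="2">2</option>
</select>'

doc <- htmlParse(fragment, asText = TRUE)

# Collect the value attribute of every option in the drop-down
qtyOptions <- as.integer(xpathSApply(doc, "//select/option", xmlGetAttr, "value"))

# Assumption: the largest quantity offered equals the units left in stock
maxQty <- max(qtyOptions)
print(maxQty)
```

If the drop-down is capped at some maximum order quantity, this only gives a lower bound on the stock, so the availabilityMessage string is probably the safer source when it contains a count.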
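The only step I haven't worked out is how to get the pid values and how to map them to descriptions, but if they aren't downloaded via Ajax then they should all be in the page that makes the request (the one the stock data is loaded into).
You are using the Chrome debugger/inspector, aren't you?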