Excel VBA: Scrape Amazon stock levels

Time: 2018-02-19 09:37:08

Tags: vba excel-vba amazon screen-scraping excel

I'm looking to scrape Amazon stock levels. This is the link I'm using: https://www.amazon.com/Stratford-Pharmaceuticals-Omega-Fatty-Strength/dp/B006JCU54Y/ref=sr_1_2?s=pet-supplies&ie=UTF8&qid=1518816130&sr=1-2&keywords=stratford

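There is a section titled "Compare with similar items", and I need to extract the prices (which I have already done) as well as the stock quantity for each seller. The stock quantity cannot be read directly from the page.

Manually, I have to click "Add to Cart", then click "Cart" on the next page, then on the following page open the quantity drop-down, choose "10+", type any large number (say 999) and click "Update". An alert message then shows the remaining stock, e.g. "This seller has only 35 of these available. To see if more are available from another seller, ...", so 35 is the number I'm after.

Here are the Excel file and snapshots illustrating the manual steps. I'm using IE, but if this can be done with XMLHTTP that would of course be great.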

Here is the code I have written so far:

Sub Test()
    ' Scrapes product details, seller prices and cart quantities into Sheet2
    Dim ws          As Worksheet
    Dim ie          As Object
    Dim liElem      As Object
    Dim prElem      As Object
    Dim crtElem     As Object
    Dim elem        As Object
    Dim cnt         As Integer
    
    Set ws = ThisWorkbook.Worksheets("Sheet2")
    Set ie = CreateObject("InternetExplorer.Application")

    With ie
        .Visible = True
        .navigate ("https://www.amazon.com/Stratford-Pharmaceuticals-Omega-Fatty-Strength/dp/B006JCU54Y/ref=sr_1_2?s=pet-supplies&ie=UTF8&qid=1518816130&sr=1-2&keywords=stratford")

        ' Wait until the page has finished loading
        Do While .Busy Or .readyState <> 4: DoEvents: Loop
        
        ws.Range("B2").Value = Format(Now(), "dd/mm/yyyy - hh:mm:ss")
        Set liElem = .document.getelementbyid("detail-bullets").getelementsbytagname("table")(0).getelementsbytagname("ul")(0)
        
        For Each elem In liElem.getelementsbytagname("li")
            If InStr(elem.innerText, "ASIN") > 0 Then ws.Range("B1").Value = Replace(elem.innerText, "ASIN: ", "")
            If InStr(elem.innerText, "Rank:") > 0 Then ws.Range("B3").Value = MyUDF(elem.innerText, "Rank: ", "(")
            If InStr(elem.innerText, "Review:") > 0 Then ws.Range("B4").Value = Replace(Split(Trim(Split(elem.innerText, "Review: ")(1)), vbLf)(1), Chr(13), "")
        Next elem
        
        Set prElem = .document.getelementbyid("comparison_price_row")
        For Each elem In prElem.getelementsbytagname("td")
            cnt = cnt + 1
            ws.Range("A" & cnt + 4).Value = "Seller " & cnt
            ws.Range("B" & cnt + 4).Value = elem.getElementsByClassName("a-offscreen")(0).innerText
        Next elem
        
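        ' Collect the add-to-cart links first: navigating away replaces the
        ' document, so elements from the product page cannot be used afterwards
        Dim hrefs As New Collection, href As Variant
        cnt = 0
        Set crtElem = .document.getelementbyid("HLCXComparisonTable").getElementsByClassName("a-button-inner")
        For Each elem In crtElem
            hrefs.Add elem.getelementsbytagname("a")(0).href
        Next elem

        For Each href In hrefs
            .navigate href
            Do While .Busy Or .readyState <> 4: DoEvents: Loop
            ' On the next page, follow the first button link (the "Cart" button)
            .navigate .document.getElementsByClassName("a-button-inner")(0).getelementsbytagname("a")(0).href
            Do While .Busy Or .readyState <> 4: DoEvents: Loop

            cnt = cnt + 1
            ' Pull the maxlength value out of the quantity control's HTML
            ws.Range("C" & cnt + 4).Value = Replace(Split(Split(MyUDF(.document.getElementsByClassName("a-row a-spacing-base sc-action-quantity sc-action-quantity-right")(0).innerHTML, "maxlength=", "quantity="), "autocomplete")(0), "=")(1), """", "")
        Next href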
        
        Stop        ' pause here to inspect the sheet; uncomment .Quit to close IE
        '.Quit
    End With
End Sub

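' Returns the trimmed text of s that follows the first occurrence of b,
' cut at the next occurrence of a (or of b); returns "" if b is not found
Function MyUDF(s As String, b As String, a As String) As String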
    Dim arr()       As String
    Dim r           As String

    arr = Split(s, b)

    If UBound(arr) > 0 Then
        r = arr(1)
        arr = Split(r, a)

        If UBound(arr) > 0 Then
            r = arr(0)
        End If
    End If

    MyUDF = Trim(r)
End Function

Here are some snapshots that may help:

[Snapshot: Prices]

[Snapshot: Click Cart]

[Snapshot: Enter 999 in quantity]

2 Answers:

Answer 0 (score: 1)

CSS selector to get the stock information

Taking the following example from your code:

[Image: web page]

You can use a CSS selector to target the text about stock levels:

.sc-product-availability

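Example CSS query using the cart view page (generated by your code):

E.g. the relevant cart view HTML:

[Image: cart view HTML]

CSS query:

[Image: CSS query]

The "." is the selector for a class name.

VBA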

You can use the .document.querySelectorAll method to retrieve a nodeList of the matching items (2 in the example):

Dim nodeList As Object
Set nodeList = .document.querySelectorAll(".sc-product-availability")

You would then loop over its length to retrieve the items (untested, but this is the general approach):

Dim i As Long
For i = 0 To nodeList.Length - 1   ' nodeList is zero-based
    Debug.Print nodeList.Item(i).innerText
Next i
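To try the selector without a live browser session, here is a minimal sketch that runs the same .sc-product-availability query over an HTML string. It assumes a reference to Microsoft HTML Object Library; the function name GetAvailabilityTexts and the sample markup are hypothetical, and on the real cart page you would query .document instead.

```vba
' Assumes: Tools > References > Microsoft HTML Object Library.
' GetAvailabilityTexts is a hypothetical helper; pass it the cart page HTML
' (e.g. from a saved page or an XMLHTTP response).
Function GetAvailabilityTexts(ByVal html As String) As Collection
    Dim doc As New HTMLDocument
    Dim nodeList As Object
    Dim i As Long
    Dim results As New Collection

    ' Load the markup into a detached document so it can be queried
    doc.body.innerHTML = html

    ' "." selects by class name, as described above
    Set nodeList = doc.querySelectorAll(".sc-product-availability")

    ' Copy each match's visible text into a 1-based Collection
    For i = 0 To nodeList.Length - 1
        results.Add Trim$(nodeList.Item(i).innerText)
    Next i

    Set GetAvailabilityTexts = results
End Function
```

Returning a Collection keeps the parsing separate from where the results are written, so the same helper can feed a worksheet or Debug.Print.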

Hope this works for you.

Answer 1 (score: 1)

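Give this a try. It should fetch the number you're after. I combined XMLHTTP and Selenium to make the script run a bit faster. I couldn't use XMLHTTP requests for the second part because those links are driven by JavaScript.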

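After running the script below, you can see how many of these items each seller has. The script won't break even if a seller doesn't have the item, because I have handled that case.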

Here it is:

Sub GetInfo()
    Const base As String = "https://www.amazon.com"
    Const mainurl As String = "https://www.amazon.com/Stratford-Pharmaceuticals-Omega-Fatty-Strength/dp/B006JCU54Y/ref=sr_1_2?s=pet-supplies&ie=UTF8&qid=1518816130&sr=1-2&keywords=stratford"
    Dim Http As New XMLHTTP60, Htmldoc As New HTMLDocument, itext As Object
    Dim driver As New ChromeDriver, idic As New Scripting.Dictionary
    Dim post As Object, oinput As Object, posts As Object, elem As Object
    Dim idrop As Object, oclick As Object, I&, R&, key As Variant

    ' Step 1: fetch the product page with XMLHTTP (no browser needed)
    With Http
        .Open "GET", mainurl, False
        .send
        Htmldoc.body.innerHTML = .responseText
    End With

    ' Collect each seller's add-to-cart link; the dictionary drops duplicates
    With Htmldoc.querySelectorAll("[id^='comparison_add_to_cart_'].a-button-text")
        For I = 0 To .Length - 1
            idic(base & Replace(.item(I).getAttribute("href"), "about:", "")) = 1
        Next I
    End With

    ' Step 2: use Selenium for the JavaScript-driven cart steps
    For Each key In idic.keys
        driver.get key
        Set post = driver.FindElementByCss("input[value='addToCart']", Raise:=False, timeout:=10000)
        If Not post Is Nothing Then
            post.Click
        End If

        Set posts = driver.FindElementById("hlb-view-cart-announce", timeout:=10000)
        posts.Click

        Set elem = driver.FindElementByCss("span#a-autoid-0-announce", timeout:=10000)
        elem.Click

        Set idrop = driver.FindElementById("dropdown1_9", timeout:=10000)
        idrop.Click

        Set oinput = driver.FindElementByCss("input[name='quantityBox']", timeout:=10000)
        oinput.SendKeys "100"

        Set oclick = driver.FindElementByCss("#a-autoid-1", timeout:=10000)
        oclick.Click

        Set itext = driver.FindElementByCss(".sc-quantity-update-message span.a-size-base", Raise:=False, timeout:=5000)
        If Not itext Is Nothing Then
            R = R + 1: Cells(R, 1) = itext.Text
        Else
            R = R + 1: Cells(R, 1) = "Sorry dear nothing found"
        End If
    Next key
    driver.Quit
End Sub

References to add (Tools > References):

Selenium Type Library
Microsoft HTML Object Library
Microsoft XML, v6.0
Microsoft Scripting Runtime

The output you get may look like the following. You can then use a regular expression to parse out the number 48:

This seller has only 48 of these available. To see if more are available from another seller, go to the product detail page.
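A minimal sketch of that regular-expression step, using late-bound VBScript.RegExp so no extra reference is needed. The function name ParseStockCount and the pattern are assumptions based on the sample message above; if the message is worded differently, the pattern will need adjusting.

```vba
' Hypothetical helper: extracts the stock count from a message such as
' "This seller has only 48 of these available. ..."
' Returns -1 when the message does not match the assumed wording.
Function ParseStockCount(ByVal msg As String) As Long
    Dim re As Object
    Dim matches As Object

    ' Late binding: VBScript regular expressions, no reference required
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "only\s+(\d+)\s+of these available"
    re.IgnoreCase = True
    re.Global = False

    Set matches = re.Execute(msg)
    If matches.Count > 0 Then
        ' SubMatches(0) holds the digits captured by (\d+)
        ParseStockCount = CLng(matches(0).SubMatches(0))
    Else
        ParseStockCount = -1
    End If
End Function
```

Inside the answer's loop it could be called as, for example, Cells(R, 2) = ParseStockCount(itext.Text) (the target column is an arbitrary choice).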