How can I modify web-scraping code to loop through product bullet points until it finds the right one and extracts the information?

Time: 2016-07-13 15:21:32

Tags: excel vba excel-vba internet-explorer web-scraping

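Here is my code. It pulls the "Item model number" off Amazon detail pages. It is meant to find the "Item model number" bullet point on the detail page and extract the number next to it.

The problem is that it sometimes fails to extract the "Item model number" from pages that clearly have one.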

Here is the code, which works correctly on some pages:

Sub Get_ITEM_CODE(ie As Object)
    Dim WS As Worksheet
    Dim y As String
    Dim AmUrl As String

    ' The product URL is read from the currently selected cell.
    AmUrl = ActiveCell.Value

    Set WS = Sheets("Extract Item COde")
    ie.Navigate AmUrl
    Application.Wait (Now + TimeValue("00:00:02"))
    ' Wait until the page has finished loading (READYSTATE_COMPLETE = 4).
    Do While ie.readyState <> 4: DoEvents: Loop
    ' Errors are suppressed from here on, so a missing element leaves y empty.
    On Error Resume Next

    y = ie.document.getElementById("productDetails_detailBullets_sections1").innerText

    WS.Range("A1").Value = y

    ' Helper routines (not shown).
    SplitTextItemCode
    AddtoListItemCode

End Sub

URL to webpage

Here is an HTML snippet from a page where the code does not work:

<div id="detailBullets" class="feature" data-feature-name="detailBullets">

<div id="detailBulletsWrapper_feature_div" data-feature-name="detailBullets" data-template-name="detailBullets" class="a-section a-spacing-none feature">
    <div id="detailBullets_feature_div">

URL to Web Page
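
The code above looks up the id "productDetails_detailBullets_sections1", while the snippet from the failing page uses "detailBulletsWrapper_feature_div". A minimal sketch of a lookup that tries both ids in turn is shown below. It assumes these two ids are the only layouts involved, which the post does not confirm, and FindDetailsContainer is a hypothetical name, not part of the original code.

```vba
' Hypothetical helper (not from the original post): returns the first
' details container present in the document, or Nothing if none is found.
Function FindDetailsContainer(ByVal doc As Object) As Object
    Dim candidateIds As Variant
    Dim candidateId As Variant
    Dim el As Object

    ' Both ids appear in the question; other layouts may need more entries.
    candidateIds = Array("productDetails_detailBullets_sections1", _
                         "detailBulletsWrapper_feature_div")

    For Each candidateId In candidateIds
        Set el = doc.getElementById(CStr(candidateId))
        If Not el Is Nothing Then
            Set FindDetailsContainer = el
            Exit Function
        End If
    Next candidateId
    ' Falls through with the default return value, Nothing.
End Function
```

A caller could check the result with "Is Nothing" and log the URL when no container is found, rather than relying on On Error Resume Next, which hides a missing element.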

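1 answer:

Answer 0: (score: 0)

Try this. It loops through every "li" item under that element id, then removes the "Item model number:" text, leaving you with a clean model number for whichever product you are looking at.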

' Place this inside Get_ITEM_CODE, after the page has finished loading.
Dim ModelNum As String
Dim oCell As Object

With ie.Document.body.all.Item("detailBulletsWrapper_feature_div").all
    For Each oCell In .tags("li")
        If InStr(oCell.innerText, "Item model number:") > 0 Then
            ' Strip the label and surrounding spaces, leaving only the model number.
            ModelNum = Trim$(Replace(oCell.innerText, "Item model number:", ""))
            Debug.Print ModelNum
            Exit For
        End If
    Next oCell
End With
Set oCell = Nothing
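
The label-stripping step can also be kept separate from the browser code, which makes it easier to check on its own. The sketch below is one way to do that. ExtractModelNumber is a hypothetical name, and it assumes it is given the text of a single bullet in the form "Item model number: XYZ", as in the loop above.

```vba
' Hypothetical helper (not from the original answer): returns the text that
' follows "Item model number:" in one bullet, or "" if the label is absent.
Function ExtractModelNumber(ByVal bulletText As String) As String
    Const LABEL_TEXT As String = "Item model number:"
    Dim cleaned As String
    Dim pos As Long

    ' Turn line breaks and non-breaking spaces into plain spaces so Trim$ can remove them.
    cleaned = Replace(bulletText, vbCr, " ")
    cleaned = Replace(cleaned, vbLf, " ")
    cleaned = Replace(cleaned, ChrW$(160), " ")

    ' Case-insensitive search for the label.
    pos = InStr(1, cleaned, LABEL_TEXT, vbTextCompare)
    If pos = 0 Then Exit Function

    ' Keep everything after the label, without surrounding spaces.
    ExtractModelNumber = Trim$(Mid$(cleaned, pos + Len(LABEL_TEXT)))
End Function
```

Inside the loop, this would be called as ModelNum = ExtractModelNumber(oCell.innerText), with an empty result meaning the bullet is not the one being looked for.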