How to extract data from a website using Excel VBA

Date: 2014-07-31 04:41:37

Tags: excel-vba vba excel

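I have read some sample code for extracting data from a website using VBA in Excel, such as Stackoverflow's Example. I understood some of it, but I can't work out how to adapt it to my problem.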

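Problem: I need to extract information from the TE website; in this case the link is http://www.te.com/catalog/products/en?q=917695-1. The data I need are the fields and the table shown in the picture.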

From some comments I gathered that I need to know some HTML to do this, but I don't know how.

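I tried looking at the page source and found the information hidden in it, but I still find it hard to come up with a way to extract these fields: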

`<span class="te-search-gallery-family">2.5mm Signal Double Lock Connector</span>`
`<span class="te-search-gallery-desc">2.5 SIGNAL D/LOCK PLUG HSG 11P</span>`

                <th class="first-cell"> Product Type</th>
                <th> Connector Type</th>
                <th> Connector Style</th>
                <th> Product Line</th>
                <th> Centerline</th>
                <th> Application Use</th>
                <th> Applies To</th>
                <th> Wire/Cable Type</th>
                <th> Contact Type</th>
                <th class="last-cell"> Number of Positions</th>
            </tr>
        </thead>
        <tbody>
            <tr>
                <td class="first-cell"> Connector</td>
                <td> Housing</td>
                <td> Plug</td>
                <td> 2.5mm Signal Double Lock</td>
                <td> 2.50 mm [0.098 in]</td>
                <td> Wire-to-Wire</td>
                <td> Wire/Cable</td>
                <td> Discrete Wire</td>
                <td> Socket</td>
                <td class="last-cell"> 11</td>
            </tr>

1 answer:

Answer 0 (score: 1)

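Take a look at the code below; I tested it successfully on my end. It prints all the information you are looking for to the Immediate (debug) window. Just adapt the code to write the values wherever you need them in the spreadsheet.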

You also need to tick references to both of the following (in the VBA editor under Tools > References):

  • Microsoft HTML Object Library
  • Microsoft XML, v6.0
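If ticking references is not convenient, late binding is a common alternative. The following is only a minimal sketch under that assumption, not part of the original answer: `FetchDocument` is a hypothetical helper name, and it creates the same kinds of objects through `CreateObject` instead of typed declarations. It only covers fetching the page; whether every method used in the answer behaves the same on a late-bound document is not something the answer confirms.

```vba
Option Explicit

' Hypothetical helper (assumption, not from the original answer):
' fetches a page without needing any references ticked.
' Returns Nothing when the server does not answer with HTTP 200.
Function FetchDocument(ByVal url As String) As Object
    Dim xhr As Object
    Dim doc As Object

    ' Late-bound equivalent of New MSXML2.XMLHTTP60
    Set xhr = CreateObject("MSXML2.XMLHTTP.6.0")
    xhr.Open "GET", url, False   ' False = synchronous request
    xhr.send

    If xhr.Status = 200 Then
        ' Late-bound equivalent of New MSHTML.HTMLDocument
        Set doc = CreateObject("htmlfile")
        doc.body.innerHTML = xhr.responseText
        Set FetchDocument = doc
    End If
End Function
```

A caller would check the result before using it, for example `If Not FetchDocument(url) Is Nothing Then ...`.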

Code to use

Also, this code should work fine when the page returns a single product, but some extra work is needed for cases where it returns more than one.

    Sub xhrsub()

    Dim xhr As MSXML2.XMLHTTP60
    Dim doc As MSHTML.HTMLDocument
    Dim results As MSHTML.HTMLDivElement
    Dim Family As String
    Dim desc As String
    Dim elt As MSHTML.HTMLTableCell
    Dim imgs As MSHTML.IHTMLElementCollection
    Dim img As MSHTML.HTMLImg
    Dim myurl As String

    Set xhr = New MSXML2.XMLHTTP60

    With xhr

        .Open "GET", "http://www.te.com/catalog/products/en?q=917695-1", False
        .send

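        If .ReadyState = 4 And .Status = 200 Then
            Set doc = New MSHTML.HTMLDocument
            doc.body.innerHTML = .responseText
        Else
            ' Without a successful response doc stays Nothing, so stop here
            Debug.Print "Request failed with status " & .Status
            Exit Sub
        End If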

    End With

    With doc
        Family = .getElementsByClassName("te-search-gallery-family").Item(0).innerText
        desc = .getElementsByClassName("te-search-gallery-desc").Item(0).innerText
        Set results = .getElementById("te-search-gallery")
    End With

    Debug.Print Family
    Debug.Print desc
    Debug.Print vbNewLine

    With results.getElementsByTagName("table").Item(0)

        For Each elt In .getElementsByTagName("th")
            Debug.Print elt.innerText
        Next elt

        Debug.Print vbNewLine

        For Each elt In .getElementsByTagName("td")
            Debug.Print elt.innerText
        Next elt

    End With

    Set imgs = doc.getElementsByTagName("img")

    For Each img In imgs
        If InStr(img.getAttribute("alt"), "Click here for product details") <> 0 Then
            myurl = img.getAttribute("src")
        End If
    Next img

    Debug.Print myurl

End Sub
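The answer leaves writing the values into the spreadsheet to the reader. One possible way to do that is sketched below. `WriteTableToSheet` and `DemoWriteTable` are hypothetical names, not part of the original answer, and the sketch assumes the table has a single data row, as in the excerpt from the question. The demo builds a tiny table from a string so it can run without the website.

```vba
Option Explicit

' Hypothetical helper (assumption, not from the original answer):
' writes the <th> texts into one row and the <td> texts into the row below,
' starting at topLeft. Assumes the table has only one data row.
Sub WriteTableToSheet(ByVal tbl As Object, ByVal topLeft As Range)
    Dim elts As Object
    Dim i As Long

    ' Header cells go in the first row
    Set elts = tbl.getElementsByTagName("th")
    For i = 0 To elts.Length - 1
        topLeft.Offset(0, i).Value = Trim$(elts.Item(i).innerText)
    Next i

    ' Data cells go in the row directly below the headers
    Set elts = tbl.getElementsByTagName("td")
    For i = 0 To elts.Length - 1
        topLeft.Offset(1, i).Value = Trim$(elts.Item(i).innerText)
    Next i
End Sub

' Small offline demo using a two-column table built from a string
Sub DemoWriteTable()
    Dim doc As Object

    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = _
        "<table><thead><tr><th> Product Type</th><th> Number of Positions</th></tr></thead>" & _
        "<tbody><tr><td> Connector</td><td> 11</td></tr></tbody></table>"

    WriteTableToSheet doc.getElementsByTagName("table").Item(0), ActiveSheet.Range("A1")
End Sub
```

Inside `xhrsub`, the same helper could be called as `WriteTableToSheet results.getElementsByTagName("table").Item(0), ActiveSheet.Range("A1")` in place of the two `Debug.Print` loops.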