VBA scraping src instead of href

Date: 2017-08-30 21:19:48

Tags: vba excel-vba href scrape excel

I am using the code below, but for some reason it returns the 'src' rather than the 'href'. Can anyone help?

Sub bringfox(txt As String)

Dim oHtml       As HTMLDocument
Dim oElement    As Object
Set oHtml = New HTMLDocument

maintext2 = "https://www.jjfox.co.uk/cigars/show/all.html"

With CreateObject("WINHTTP.WinHTTPRequest.5.1")
    .Open "GET", maintext2 & gr, False
    .send
    oHtml.body.innerHTML = .responseText
End With



counter = cnt
'oElement(i).Children(0).getAttribute ("href")
Set oElement = oHtml.getElementsByClassName("products-grid products-grid--max-3-col")(0).getElementsByTagName("a")
i = 0
While i < oElement.Length
    Debug.Print oElement(i).Children(0).getAttribute("href")

    i = i + 1

Wend


End Sub

1 answer:

Answer 0 (score: 1)

You could try using a CSS selector:

#wrapper div.category-products > ul a

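This is a shortened version of the full selector; it targets the a tags inside the product category list. You then parse each element's outerHTML for the href, since that is where the information sits.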

Website image (sample view)

[Image: Sample view]

Code output (sample view)

[Image: Sample immediate window output]

Code

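Option Explicit
' Requires a reference to "Microsoft HTML Object Library" (Tools > References).
Public Sub GetInfo()
    Dim oHtml As HTMLDocument, nodeList As Object, currentItem As Long
    Const URL As String = "https://www.jjfox.co.uk/cigars/show/all.html"
    Set oHtml = New HTMLDocument
    ' Fetch the page synchronously and load the response into the document.
    With CreateObject("WINHTTP.WinHTTPRequest.5.1")
        .Open "GET", URL, False
        .send
        oHtml.body.innerHTML = .responseText
    End With

    ' Select every anchor inside the product list.
    Set nodeList = oHtml.querySelectorAll("#wrapper div.category-products > ul a")
    For currentItem = 0 To nodeList.Length - 1
        ' Skip anchors whose outerHTML does not contain "<A href=";
        ' without this, Split(...)(1) raises a subscript error.
        On Error Resume Next
        Debug.Print Split(Split(nodeList.item(currentItem).outerHTML, "<A href=")(1), ">")(0)
        On Error GoTo 0
    Next currentItem
End Sub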
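
The Split-based parsing in GetInfo depends on the outerHTML containing the exact text "<A href=", and it leaves any surrounding quotes in the result. A minimal sketch of a more tolerant alternative is shown here; ExtractHref is a hypothetical helper name, not part of the original answer, and it simply takes the first "href=" it finds (so an attribute such as data-href would also match).

```vba
Option Explicit

' Hypothetical helper: returns the value of the first href attribute found in
' an outerHTML string, or "" when there is none. The search is case-insensitive.
Public Function ExtractHref(ByVal outerHtml As String) As String
    Dim startPos As Long, endPos As Long, quoteChar As String

    startPos = InStr(1, outerHtml, "href=", vbTextCompare)
    If startPos = 0 Then Exit Function
    startPos = startPos + Len("href=")

    quoteChar = Mid$(outerHtml, startPos, 1)
    If quoteChar = """" Or quoteChar = "'" Then
        ' Quoted value: read up to the matching closing quote.
        startPos = startPos + 1
        endPos = InStr(startPos, outerHtml, quoteChar)
    Else
        ' Unquoted value: read up to the next space or ">".
        endPos = startPos
        Do While endPos <= Len(outerHtml)
            If Mid$(outerHtml, endPos, 1) = " " Or Mid$(outerHtml, endPos, 1) = ">" Then Exit Do
            endPos = endPos + 1
        Loop
    End If

    ' No terminator found: take the rest of the string.
    If endPos = 0 Then endPos = Len(outerHtml) + 1
    ExtractHref = Mid$(outerHtml, startPos, endPos - startPos)
End Function
```

Inside the loop it could be called as Debug.Print ExtractHref(nodeList.item(currentItem).outerHTML), which would also make the On Error Resume Next guard unnecessary for that line.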

Or, more simply, use the following:

For currentItem = 0 To nodeList.Length - 1
    On Error Resume Next
    Debug.Print nodeList.item(currentItem).href
    On Error GoTo 0
Next currentItem
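
For comparison with the loop in the question, here is a minimal sketch that reads the attribute from the anchor itself instead of from Children(0). It assumes, as the src output suggests, that the first child of each a element on this page is an img, so the question's call was reading from the image and not from the link. The Sub name PrintAnchorHrefs is made up for illustration, and the selector is the one from this answer.

```vba
Option Explicit

' Requires a reference to "Microsoft HTML Object Library" (Tools > References).
Public Sub PrintAnchorHrefs()
    Const URL As String = "https://www.jjfox.co.uk/cigars/show/all.html"
    Dim oHtml As HTMLDocument
    Dim anchors As Object
    Dim i As Long

    Set oHtml = New HTMLDocument
    With CreateObject("WinHttp.WinHttpRequest.5.1")
        .Open "GET", URL, False
        .send
        oHtml.body.innerHTML = .responseText
    End With

    Set anchors = oHtml.querySelectorAll("#wrapper div.category-products > ul a")
    For i = 0 To anchors.Length - 1
        ' Read href from the anchor element, not from its first child.
        Debug.Print anchors.item(i).getAttribute("href")
    Next i
End Sub
```

Because nothing here is expected to fail for an ordinary anchor, the sketch leaves out On Error Resume Next so that real problems (for example a failed request) are not silently hidden.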