在需要对象交互的站点上使用XML HTTP请求

时间:2015-07-09 20:18:56

标签: json xml vba http web-scraping

我正在开展一个从多个网站上抓取信息的项目。我有许多网站没有任何问题,主要是通过修改URL来通过相关标准或发布AJAX请求来处理它们。我对此很新,所以我正在寻求一些帮助。

我遇到过一个网站,我需要与网页上的对象进行互动才能获得更多信息。这方面的一个例子是以下网站:

Example Site

如果您访问该网站并进入底部,则会有更多品牌并点击"查看"将显示其他产品。这些HTML仅在点击后返回。

我从其他网站获取信息,我使用了以下方法。 有没有办法在执行页面对象操作后通过XML HTTP方法处理页面?

非常感谢任何帮助。目前我假设我将不得不坚持使用Internet Explorer对象来抓取这些网站。

Option Explicit
Public Sub sbKF()

Dim conn As ADODB.Connection
Dim rsIn As ADODB.Recordset
Dim HTMLDoc As HTMLDocument

Dim strUrl As String
Dim strPost As String

Set conn = CurrentProject.Connection

Set rsIn = New ADODB.Recordset

Set HTMLDoc = New MSHTML.HTMLDocument

rsIn.Open pcstrInput, conn, adOpenStatic, adLockReadOnly

rsIn.MoveLast: rsIn.MoveFirst

Do While Not rsIn.EOF

    ' Create the URL and Post submission for input size.
    strUrl = "http://www.[Site].com"
    strPost = "Stage=2&sop=TyreSize&ssq=1&vnp=&vmk=&vch=&vmo=&drd="

    ' Return the Document body results
    HTMLDoc.body.innerHTML = fnPostXmlHttp(strUrl, strPost)

    rsIn.MoveNext

Loop
End Sub

Public Function fnPostXmlHttp(ByVal strUrl As String, ByVal strScript As String)

Dim XMLHttpRequest As Object
Dim strOut As String

Set XMLHttpRequest = CreateObject("MSXML2.XMLHTTP")

XMLHttpRequest.Open "POST", strUrl, False
XMLHttpRequest.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
XMLHttpRequest.send (strScript)

While XMLHttpRequest.ReadyState <> 4
    DoEvents
Wend

fnPostXmlHttp = XMLHttpRequest.responseText
End Function

1 个答案:

答案 0 :(得分:1)

如果您查看www.blackcircles.com HTML回复,您会看到

JsonObject = {...};

实际上该行代表一个JSON对象,其中包含网页数据上显示的所有内容。因此,您可以通过拆分,解析它,例如转换为数组并输出到工作表,从HTML内容中提取该JSON字符串,如下面的示例代码所示。 JSON.bas模块导入VBA项目以进行JSON处理。

Option Explicit

Sub Test_blackcircles()

    Dim sResp As String
    Dim vJSON As Variant
    Dim sState As String
    Dim i As Long
    Dim vItem
    Dim aData()
    Dim aHeader()

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.blackcircles.com/order/tyres/search?width=205&profile=55&rim=R16&speed=V&vehicle-make=&postcode=&delivery=1&findTyre=", False
        .send
        sResp = .responseText
    End With
    sResp = Split(sResp, "JsonObject = {", 2)(1)
    sResp = Split(sResp, "};", 2)(0)
    sResp = "{" & sResp & "}"
    JSON.Parse sResp, vJSON, sState
    i = 1
    With Sheets(1)
        .Cells.Delete
        .Cells.WrapText = False
        For Each vItem In Array( _
                "Manufacturers", _
                "CarManufacturers", _
                "All", _
                "Deals", _
                "Best", _
                "Rest", _
                "SearchParams" _
                )
            .Cells(i, 1).Value = vItem
            JSON.ToArray vJSON(vItem), aData, aHeader
            OutputArray .Cells(i + 2, 1), aHeader
            Output2DArray .Cells(i + 3, 1), aData
            .Columns.AutoFit
            i = i + UBound(aData, 1) + 5

        Next
    End With

End Sub

Sub OutputArray(oDstRng As Range, aCells As Variant)

    With oDstRng
        .Parent.Select
        With .Resize(1, UBound(aCells) - LBound(aCells) + 1)
            .NumberFormat = "@"
            .Value = aCells
        End With
    End With

End Sub

Sub Output2DArray(oDstRng As Range, aCells As Variant)

    With oDstRng
        .Parent.Select
        With .Resize( _
                UBound(aCells, 1) - LBound(aCells, 1) + 1, _
                UBound(aCells, 2) - LBound(aCells, 2) + 1)
            .NumberFormat = "@"
            .Value = aCells
        End With
    End With

End Sub