我正在开展一个从多个网站上抓取信息的项目。我有许多网站没有任何问题,主要是通过修改URL来通过相关标准或发布AJAX
请求来处理它们。我对此很新,所以我正在寻求一些帮助。
我遇到过一个网站,我需要与网页上的对象进行互动才能获得更多信息。这方面的一个例子是以下网站:
如果您访问该网站并进入底部,则会有更多品牌并点击"查看"将显示其他产品。这些HTML
仅在点击后返回。
我从其他网站获取信息,我使用了以下方法。 有没有办法在执行页面对象操作后通过XML HTTP方法处理页面?
非常感谢任何帮助。目前我假设我将不得不坚持使用Internet Explorer对象来抓取这些网站。
Option Explicit
Public Sub sbKF()
Dim conn As ADODB.Connection
Dim rsIn As ADODB.Recordset
Dim HTMLDoc As HTMLDocument
Dim strUrl As String
Dim strPost As String
Set conn = CurrentProject.Connection
Set rsIn = New ADODB.Recordset
Set HTMLDoc = New MSHTML.HTMLDocument
rsIn.Open pcstrInput, conn, adOpenStatic, adLockReadOnly
rsIn.MoveLast: rsIn.MoveFirst
Do While Not rsIn.EOF
' Create the URL and Post submission for input size.
strUrl = "http://www.[Site].com"
strPost = "Stage=2&sop=TyreSize&ssq=1&vnp=&vmk=&vch=&vmo=&drd="
' Return the Document body results
HTMLDoc.body.innerHTML = fnPostXmlHttp(strUrl, strPost)
rsIn.MoveNext
Loop
End Sub
Public Function fnPostXmlHttp(ByVal strUrl As String, ByVal strScript As String)
Dim XMLHttpRequest As Object
Dim strOut As String
Set XMLHttpRequest = CreateObject("MSXML2.XMLHTTP")
XMLHttpRequest.Open "POST", strUrl, False
XMLHttpRequest.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
XMLHttpRequest.send (strScript)
While XMLHttpRequest.ReadyState <> 4
DoEvents
Wend
fnPostXmlHttp = XMLHttpRequest.responseText
End Function
答案 0 :(得分:1)
如果您查看www.blackcircles.com HTML回复,您会看到
行JsonObject = {...};
实际上该行代表一个JSON对象,其中包含网页数据上显示的所有内容。因此,您可以通过拆分,解析它,例如转换为数组并输出到工作表,从HTML内容中提取该JSON字符串,如下面的示例代码所示。 将JSON.bas模块导入VBA项目以进行JSON处理。
Option Explicit
Sub Test_blackcircles()
Dim sResp As String
Dim vJSON As Variant
Dim sState As String
Dim i As Long
Dim vItem
Dim aData()
Dim aHeader()
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.blackcircles.com/order/tyres/search?width=205&profile=55&rim=R16&speed=V&vehicle-make=&postcode=&delivery=1&findTyre=", False
.send
sResp = .responseText
End With
sResp = Split(sResp, "JsonObject = {", 2)(1)
sResp = Split(sResp, "};", 2)(0)
sResp = "{" & sResp & "}"
JSON.Parse sResp, vJSON, sState
i = 1
With Sheets(1)
.Cells.Delete
.Cells.WrapText = False
For Each vItem In Array( _
"Manufacturers", _
"CarManufacturers", _
"All", _
"Deals", _
"Best", _
"Rest", _
"SearchParams" _
)
.Cells(i, 1).Value = vItem
JSON.ToArray vJSON(vItem), aData, aHeader
OutputArray .Cells(i + 2, 1), aHeader
Output2DArray .Cells(i + 3, 1), aData
.Columns.AutoFit
i = i + UBound(aData, 1) + 5
Next
End With
End Sub
Sub OutputArray(oDstRng As Range, aCells As Variant)
With oDstRng
.Parent.Select
With .Resize(1, UBound(aCells) - LBound(aCells) + 1)
.NumberFormat = "@"
.Value = aCells
End With
End With
End Sub
Sub Output2DArray(oDstRng As Range, aCells As Variant)
With oDstRng
.Parent.Select
With .Resize( _
UBound(aCells, 1) - LBound(aCells, 1) + 1, _
UBound(aCells, 2) - LBound(aCells, 2) + 1)
.NumberFormat = "@"
.Value = aCells
End With
End With
End Sub