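I have been trying to scrape data from a website, but it keeps giving me a "Subscript out of range" error and I don't know why. I used exactly the same code for another website and it worked fine.
And yes, I have already changed the divs to match the new website I want to scrape.
Can anyone help me? Thanks in advance.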
Option Explicit
Public Sub Loiça()
    Dim data As Object, i As Long, html As HTMLDocument, r As Long, c As Long, item As Object, div As Object
    Set html = New HTMLDocument '<== VBE > Tools > References > Microsoft HTML Object Library
    Const START_URL As String = "https://mediamarkt.pt/pages/search-results-page?q=maquina+roupa&page=1"
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", START_URL, False
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .send
        html.body.innerHTML = .responseText
        Dim numPages As Long, numResults As Long, arr() As String
        arr = Split(html.querySelector(".snize-search-results-header").innerText, Chr$(32))
        numResults = arr(UBound(arr))
        numPages = 1
        For i = 1 To numPages
            If i > 1 Then
                .Open "GET", Replace$("https://mediamarkt.pt/pages/search-results-page?q=maquina+roupa&page=1", "page=1", "page=" & i), False
                .setRequestHeader "User-Agent", "Mozilla/5.0"
                .send
                html.body.innerHTML = .responseText
            End If
            Set data = html.getElementsByClassName("snize-four-columns")
            For Each item In data
                r = r + 1: c = 1
                For Each div In item.getElementsByTagName("div")
                    With ThisWorkbook.Worksheets("Loiça")
                        .Cells(r, c) = div.innerText
                    End With
                    c = c + 1
                Next
            Next
        Next
    End With
End Sub
Answer 0 (score: 3)
This
html.querySelector(".snize-search-results-header").innerText
is returning an empty string. Split on an empty string returns an empty array whose UBound is -1, so arr(UBound(arr)) raises "Subscript out of range".
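A minimal sketch of the failure and one way to guard against it. The names LastToken and DemoEmptySplit are hypothetical helpers for illustration, not part of the original code.

```vba
Option Explicit

' Hypothetical helper: returns the last space-separated token of source,
' or an empty string when source itself is empty.
Public Function LastToken(ByVal source As String) As String
    Dim parts() As String
    parts = Split(source, Chr$(32))
    ' Split on a zero-length string yields an empty array: LBound = 0, UBound = -1
    If UBound(parts) >= LBound(parts) Then
        LastToken = parts(UBound(parts))
    End If
End Function

Public Sub DemoEmptySplit()
    Dim parts() As String
    parts = Split(vbNullString, Chr$(32))
    Debug.Print UBound(parts)                  ' -1, so parts(UBound(parts)) would fail
    Debug.Print LastToken("12 results found")  ' found
    Debug.Print Len(LastToken(vbNullString))   ' 0, no error raised
End Sub
```

Checking the bounds before indexing turns the runtime error into an empty result that the calling code can test for.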
It may be that this value is only filled in once JavaScript has run on the page. Inspect the returned HTML; I think no value comes back for that element. If so, use something like Selenium or Internet Explorer automation, which lets JavaScript run on the page and update the content with the value.
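One way to check this is to look at what the XMLHTTP response actually contains for that selector. This is a sketch only: the Sub name InspectHeader is hypothetical, and it assumes the same Microsoft HTML Object Library reference as the question's code.

```vba
Option Explicit

' Hypothetical diagnostic: reports whether the header element exists in the
' static HTML returned by XMLHTTP and whether it contains any text.
Public Sub InspectHeader()
    Const URL As String = "https://mediamarkt.pt/pages/search-results-page?q=maquina+roupa&page=1"
    Dim html As HTMLDocument, header As Object
    Set html = New HTMLDocument

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", URL, False
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .send
        html.body.innerHTML = .responseText
    End With

    Set header = html.querySelector(".snize-search-results-header")
    If header Is Nothing Then
        Debug.Print "Header element is not present in the static HTML"
    ElseIf Len(Trim$(header.innerText)) = 0 Then
        Debug.Print "Header element is present but empty (probably filled in by JavaScript)"
    Else
        Debug.Print "Header text: " & header.innerText
    End If
End Sub
```

If the first or second message is printed, the value is not available to a plain XMLHTTP request and a browser-based approach is needed.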
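In this case the value you want is the first token of the header text, so index with LBound rather than UBound. You could move this into a function that uses IE and returns the value, then call it from your sub: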
numPages = GetNumberOfPages
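' Requires VBE > Tools > References > Microsoft Internet Controls
Public Function GetNumberOfPages() As Long
    Dim IE As New InternetExplorer
    Dim numResults As Long, arr() As String
    With IE
        .Visible = False
        .Navigate2 "https://mediamarkt.pt/pages/search-results-page?q=maquina+roupa&page=1"
        ' Wait until the page has finished loading
        While .Busy Or .readyState < 4: DoEvents: Wend
        ' Read the header from the rendered document, after JavaScript has run
        arr = Split(.document.querySelector(".snize-search-results-header").innerText, Chr$(32))
        ' Take the first space-separated token of the header text
        numResults = arr(LBound(arr))
        ' Note: the value is returned as-is; it is not divided by a page size
        GetNumberOfPages = numResults
        .Quit
    End With
End Function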
For the next part, I think you need a different class name:
Set data = html.getElementsByClassName("snize-product")
Inspect the HTML to verify.