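I have written a script in VBA that scrapes data from a certain site quite smoothly. I am now trying to do the same thing in an unconventional way, but the loop in my script runs forever. I need some logic here so that the loop stops once "y" has no more valid values. Here y is the page number. I could do this with a For loop, but I am trying it this way so that I can run my crawler without knowing in advance how many pages there are to crawl. Thanks in advance.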
Sub Aoty_Data()
    Dim http As New XMLHTTP60
    Dim html As New HTMLDocument, topic As HTMLHtmlElement
    y = 1
    Do
        With http
            .Open "GET", "http://www.albumoftheyear.org/ratings/6-highest-rated/2000/" & y, False
            .send
            html.body.innerHTML = .responseText
        End With
        For Each topic In html.getElementsByClassName("albumListRow")
            x = x + 1
            With topic.getElementsByClassName("listLargeTitle")(0).getElementsByTagName("a")
                If .Length Then Cells(x, 1) = Split(.Item(0).innerText, "-")(0)
            End With
            With topic.getElementsByClassName("listLargeTitle")(0).getElementsByTagName("a")
                If .Length Then Cells(x, 2) = Split(.Item(0).innerText, "-")(1)
            End With
        Next topic
        y = y + 1
    Loop Until y = "" 'I used y="" cause the editor did not let me leave it blank.
End Sub
Answer 0 (score: 1)
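Did you come up with a solution of your own? One answer lies in another question: how does this "certain site" behave when the GET route goes past the number of valid pages? It appears to return the first page of results again. Note that I have barely changed your existing code; I only refactored it a little and added a test for whether the first artist/album comes back a second time.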
Sub Aoty_Data()
    Dim http As New XMLHTTP60
    Dim html As New HTMLDocument, topic As HTMLHtmlElement
    Dim x As Long, y As Long, numberOneAlbumForYear As String

    y = 1
    numberOneAlbumForYear = ""
    Do
        http.Open "GET", "http://www.albumoftheyear.org/ratings/6-highest-rated/2000/" & y, False
        http.send
        html.body.innerHTML = http.responseText

        For Each topic In html.getElementsByClassName("albumListRow")
            x = x + 1
            With topic.getElementsByClassName("listLargeTitle")(0).getElementsByTagName("a")
                If .Length Then
                    ' Print only after the length check so an empty row cannot raise an error.
                    Debug.Print .Item(0).innerText
                    Cells(x, 1) = Split(.Item(0).innerText, "-")(0)
                    Cells(x, 2) = Split(.Item(0).innerText, "-")(1)
                End If
            End With

            If y = 1 And numberOneAlbumForYear = "" Then
                ' Remember the very first artist/album on page 1.
                numberOneAlbumForYear = Cells(x, 1) & Cells(x, 2)
            ElseIf (Cells(x, 1) & Cells(x, 2)) = numberOneAlbumForYear Then
                ' The first entry came back again, so we ran past the last page.
                Rows(x).ClearContents
                Exit Do
            End If
        Next topic
        y = y + 1
    Loop ' No Until condition is needed; the Exit Do above ends the loop.
End Sub
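The stop test above relies on the site wrapping back to page 1. If the site ever returned an empty page instead, that loop would not end. As a rough sketch only, the stop decision could be pulled into a small function that also checks for an empty page and a page cap. The names `ShouldStopCrawl`, `DemoShouldStopCrawl`, the `maxPages` parameter and the sample rows are all hypothetical and not part of the original code; the demo uses made-up strings in place of real HTTP responses.

```vba
Option Explicit

' Hypothetical helper (an assumption, not from the original answer).
' Decides whether to stop after seeing the first entry of a page.
Function ShouldStopCrawl(ByVal firstEntryOverall As String, _
                         ByVal firstEntryOnPage As String, _
                         ByVal pageNumber As Long, _
                         ByVal maxPages As Long) As Boolean
    If pageNumber > maxPages Then
        ShouldStopCrawl = True      ' safety cap reached
    ElseIf Len(firstEntryOnPage) = 0 Then
        ShouldStopCrawl = True      ' page had no rows at all
    ElseIf pageNumber > 1 And firstEntryOnPage = firstEntryOverall Then
        ShouldStopCrawl = True      ' site wrapped around to page 1
    Else
        ShouldStopCrawl = False
    End If
End Function

' Demo with made-up data standing in for the first row of each fetched page.
Sub DemoShouldStopCrawl()
    Dim fakeFirstRows As Variant
    Dim page As Long, firstOverall As String, current As String

    ' VBA.Array is always zero-based, regardless of Option Base.
    fakeFirstRows = VBA.Array("A - One", "B - Two", "C - Three", "A - One")
    firstOverall = fakeFirstRows(0)
    page = 1
    Do
        current = fakeFirstRows(page - 1)
        If ShouldStopCrawl(firstOverall, current, page, 50) Then Exit Do
        Debug.Print "page " & page & ": " & current
        page = page + 1
    Loop
    Debug.Print "stopped at page " & page
End Sub
```

Running `DemoShouldStopCrawl` prints the three distinct pages to the Immediate window and then reports that it stopped at page 4, where the first entry repeats. In a real crawl the `current` value would come from the first `albumListRow` of each response, and the cap of 50 is an arbitrary placeholder.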