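I have a script in VBA that parses two categories from each container on a webpage. The scraper parses them correctly. The problem I'm facing is that I can't place the items in successive columns. If one column contains views, the next column should contain votes, and so on. The result I expect looks more like this: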
column1 column2 column3 column4
9 views 0 vote 10 views -2
This is my script so far:
Sub CollectInfo()
    Const URL As String = "https://stackoverflow.com/questions/tagged/web-scraping"
    Dim Http As New XMLHTTP60, Html As New HTMLDocument
    Dim post As HTMLHtmlElement, R&, C&

    With Http
        .Open "GET", URL, False
        .send
        Html.body.innerHTML = .responseText
    End With

    R = 1
    For Each post In Html.getElementsByClassName("question-summary")
        C = C + 1: Cells(R, C) = post.getElementsByClassName("views")(0).innerText
        Cells(R, C + 1) = post.getElementsByClassName("votes")(0).innerText
    Next post
End Sub
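The way I tried it clearly puts the values in the wrong places. How can I fix it so the output comes out as shown above? By the way, I don't want to use Offset in the loop (I mean Range("A1").Offset(, 1)); I'd rather stick to the approach I tried above. Thanks.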
Answer 0 (score: 1)
This writes views and votes into alternating columns. I changed XMLHTTP60 to MSXML2.XMLHTTP60 because the unqualified name caused an automation error on my end.
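' Early binding: needs references to Microsoft XML, v6.0 and Microsoft HTML Object Library
Sub CollectInfo()
    Const URL As String = "https://stackoverflow.com/questions/tagged/web-scraping"
    Dim Http As New MSXML2.XMLHTTP60, Html As New HTMLDocument
    Dim post As HTMLHtmlElement, R&, C&

    ' Fetch the page synchronously and load the response into the HTML document
    With Http
        .Open "GET", URL, False
        .send
        Html.body.innerHTML = .responseText
    End With

    R = 1 ' all values go into row 1
    For Each post In Html.getElementsByClassName("question-summary")
        ' Advance the column before every write so no cell gets overwritten
        C = C + 1
        Cells(R, C) = post.getElementsByClassName("views")(0).innerText
        C = C + 1
        Cells(R, C) = post.getElementsByClassName("votes")(0).innerText
    Next post
End Sub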
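
The reason the original loop misplaced things is that C advanced by only one per post while two cells were written, so each post's votes cell was overwritten by the next post's views. As a rough illustration only, here is a minimal sketch of the column arithmetic with no web request involved. The procedure name DemoAlternatingColumns and the sample values are made up for the demonstration (modeled on the example in the question), and Debug.Print stands in for the writes to Cells.

```vba
' Hypothetical demo: shows which column each value lands in
' when the counter is advanced before every write.
Sub DemoAlternatingColumns()
    Dim views As Variant, votes As Variant
    Dim i As Long, C As Long

    ' Made-up sample values standing in for the scraped innerText
    views = Array("9 views", "10 views")
    votes = Array("0 vote", "-2")

    For i = LBound(views) To UBound(views)
        C = C + 1: Debug.Print "column " & C & ": " & views(i)
        C = C + 1: Debug.Print "column " & C & ": " & votes(i)
    Next i
End Sub
```

Running it from the VBA editor should print columns 1 to 4 in the Immediate window in the order views, votes, views, votes, which is the layout asked for in the question.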