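I'm trying to scrape some data from a website. I've stepped through the code, and for some reason Question.className doesn't match what I have in the code or what I see in the inspector on the web page. When I say it doesn't match, I mean that nothing appears to be assigned to it at all. I've only just started looking into data scraping and would appreciate any tips. I haven't included all of the code, because the code below is where the error occurs. Thanks in advance, and apologies if this has been asked before; I couldn't find anything after searching, so I thought I'd post.
The URL is https://stackoverflow.com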
Range("A3").Value = "Question id" 'put heading across the top of row 3
Range("B3").Value = "Votes"
Range("C3").Value = "Views"
Range("D3").Value = "Person"
Dim QuestionList As IHTMLElement
Dim Questions As IHTMLElementCollection
Dim Question As IHTMLElement
Dim RowNumber As Long
Dim QuestionId As String
Dim QuestionFields As IHTMLElementCollection
Dim QuestionField As IHTMLElement
Dim votes As String
Dim views As String
Dim QuestionFieldLinks As IHTMLElementCollection
Set QuestionList = html.getElementById("question-mini-list")
Set Questions = QuestionList.Children
RowNumber = 4
For Each Question In Questions
    'if this is the tag containing the question details, process it
    If Question.className = "question-summary narrow" Then
        'first get and store the question id in first column
        QuestionId = Replace(Question.ID, "question-summary-", "")
        Cells(RowNumber, 1).Value = CLng(QuestionId)
        'get a list of all of the parts of this question, and loop over them
        Set QuestionFields = Question.all
        For Each QuestionField In QuestionFields
            'if this is the question's votes, store it (get rid of any surrounding text)
            If QuestionField.className = "votes" Then
                votes = Replace(QuestionField.innerText, "votes", "")
                votes = Replace(votes, "vote", "")
                Cells(RowNumber, 2).Value = Trim(votes)
            End If
            'likewise for views (getting rid of any text)
            If QuestionField.className = "views" Then
                views = QuestionField.innerText
                views = Replace(views, "views", "")
                views = Replace(views, "view", "")
                Cells(RowNumber, 3).Value = Trim(views)
            End If
            'if this is the bit where author's name is ...
            If QuestionField.className = "started" Then
                'get a list of all elements within, and store the text in the second one
                Set QuestionFieldLinks = QuestionField.all
                Cells(RowNumber, 4).Value = QuestionFieldLinks(2).innerHTML
            End If
        Next QuestionField
        'go on to next row of worksheet
        RowNumber = RowNumber + 1
    End If
Next Question
Answer 0 (score: 2)
Try this. It should fetch the required fields:
Sub GetInformation()
    Dim Http As New XMLHTTP60, Html As New HTMLDocument
    Dim post As HTMLDivElement, URL$, R&

    URL = "https://stackoverflow.com/"

    'request the page synchronously and load the markup into an HTML document
    With Http
        .Open "GET", URL, False
        .send
        Html.body.innerHTML = .responseText
    End With

    'one worksheet row per question: id, votes, views
    For Each post In Html.getElementsByClassName("question-summary")
        'the id is the third dash-separated part of the element's id attribute
        R = R + 1: Cells(R, 1) = Split(post.getAttribute("id"), "-")(2)
        'keep only the first word of each title attribute
        Cells(R, 2) = Split(post.querySelector(".votes span").getAttribute("title"), " ")(0)
        Cells(R, 3) = Split(post.querySelector(".views span").getAttribute("title"), " ")(0)
    Next post
End Sub
Add references to the following libraries:
Microsoft XML, v6.0
Microsoft HTML Object Library
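
A possible reason the comparison in the question never succeeds: className returns the element's whole class attribute as one string, so an exact test against "question-summary narrow" fails whenever the page uses a different set or order of classes. The code above avoids this by selecting on the single class "question-summary". If you would rather keep the original loop, a token-based check is one option. The sketch below is only an illustration; HasClass is a hypothetical helper, not part of MSHTML, and it assumes class names are separated by plain spaces.

```vba
Option Explicit

'Hypothetical helper (an assumption for illustration, not an MSHTML function).
'Returns True when classAttr, a space-separated class string such as
'Question.className, contains className as one whole token.
Public Function HasClass(ByVal classAttr As String, ByVal className As String) As Boolean
    Dim token As Variant
    'Split yields an empty array for an empty string, so the loop is skipped
    For Each token In Split(Trim$(classAttr), " ")
        'case-sensitive comparison of one whole class name
        If StrComp(CStr(token), className, vbBinaryCompare) = 0 Then
            HasClass = True
            Exit Function
        End If
    Next token
End Function
```

In the question's loop, the test would then read `If HasClass(Question.className, "question-summary") Then`, which still matches if the element carries additional classes.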