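I want to extract the title of every news item shown at "http://pib.nic.in/newsite/erelease.aspx?relid=58313" using Excel VBA. I wrote code that uses getElementsByClassName("contentdiv"), but the debugger reports an error along the lines of "object doesn't support...". I want to extract the news item for each relid in the URL.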
Answer 0 (score: 0)
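Web scrapes like this are usually handled more efficiently with an XMLHTTP pull. This requires adding a few libraries under Tools ► References in the VBE. The code below needs Microsoft XML, v6.0, the Microsoft HTML Object Library and Microsoft Internet Controls. The last one may not be needed, but you may want it if you extend the code beyond what is provided here.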
    Public Const csURL As String = "http://pib.nic.in/newsite/erelease.aspx?relid=×ID×"

    Sub scrape_PIBNIC()
        Dim htmlBDY As HTMLDocument, xmlHTTP As MSXML2.ServerXMLHTTP60
        Dim i As Long, u As String, iDIV As Long

        On Error GoTo CleanUp

        Set xmlHTTP = New MSXML2.ServerXMLHTTP60
        Set htmlBDY = New HTMLDocument

        'widen this range to collect more than one relid
        For i = 58313 To 58313
            'clear the previous page before loading the next one
            htmlBDY.body.innerHTML = vbNullString
            With xmlHTTP
                'swap the ×ID× placeholder for the current relid
                u = Replace(csURL, "×ID×", i)
                'Debug.Print u
                .Open "GET", u, False
                .setRequestHeader "Content-Type", "application/x-www-form-urlencoded; charset=UTF-8"
                .send

                If .Status <> 200 Then GoTo CleanUp

                htmlBDY.body.innerHTML = .responseText
                For iDIV = 0 To (htmlBDY.getElementsByClassName("contentdiv").Length - 1)
                    'only use contentdiv blocks that contain at least one <span>
                    If CBool(htmlBDY.getElementsByClassName("contentdiv")(iDIV).getElementsByTagName("span").Length) Then
                        'write the first span's text to the next empty row in column A
                        Sheets("Sheet1").Cells(Rows.Count, 1).End(xlUp).Offset(1, 0) = _
                            htmlBDY.getElementsByClassName("contentdiv")(iDIV).getElementsByTagName("span")(0).innerText
                    End If
                Next iDIV
            End With
        Next i

    CleanUp:
        Set htmlBDY = Nothing
        Set xmlHTTP = Nothing
    End Sub
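If adding the Microsoft XML, v6.0 reference is not an option, the request part can be sketched with late binding instead. This is only an illustration under assumptions: the function name FetchRelease and the constant URL_TEMPLATE are hypothetical and not part of the original code, and the header is simply copied from the routine above.

```vba
' Hypothetical helper: fetches one press release page and returns its HTML,
' or an empty string when the server does not answer with HTTP 200.
Public Function FetchRelease(ByVal relid As Long) As String
    Const URL_TEMPLATE As String = "http://pib.nic.in/newsite/erelease.aspx?relid="
    Dim req As Object

    ' Late binding: created by ProgID, so no reference to Microsoft XML, v6.0 is needed
    Set req = CreateObject("MSXML2.ServerXMLHTTP.6.0")
    With req
        .Open "GET", URL_TEMPLATE & CStr(relid), False
        ' Same header as in the routine above
        .setRequestHeader "Content-Type", "application/x-www-form-urlencoded; charset=UTF-8"
        .send
        If .Status = 200 Then FetchRelease = .responseText
    End With
    Set req = Nothing
End Function
```

Calling `Debug.Print Len(FetchRelease(58313))` from the Immediate window is a quick way to check whether anything came back before parsing it.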
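This should be enough to get you started. The site you are targeting requires charset=UTF-8 to be added to the request; I had no success without it. I strongly suspect this may be the source of your "object doesn't support" error.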
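If the "object doesn't support" error still appears after adding the header, another possible cause (an assumption, not something confirmed above) is that getElementsByClassName is not available on the document object in use. A minimal fallback sketch is to walk the div elements and compare className by hand. The function name SpanTextsByClass is hypothetical, and the match is on the exact class string only.

```vba
' Hypothetical helper: returns the inner text of the first <span> inside
' each <div> whose class attribute equals clsName (exact match, case-insensitive).
Public Function SpanTextsByClass(ByVal doc As Object, ByVal clsName As String) As Collection
    Dim result As New Collection
    Dim divs As Object, spans As Object
    Dim i As Long

    ' getElementsByTagName is available even where getElementsByClassName is not
    Set divs = doc.getElementsByTagName("div")
    For i = 0 To divs.Length - 1
        If StrComp(divs.Item(i).className, clsName, vbTextCompare) = 0 Then
            Set spans = divs.Item(i).getElementsByTagName("span")
            ' Skip matching divs that contain no <span>
            If spans.Length > 0 Then result.Add spans.Item(0).innerText
        End If
    Next i

    Set SpanTextsByClass = result
End Function
```

With the document already loaded as in the answer's code, the titles could then be listed with a loop such as `For Each t In SpanTextsByClass(htmlBDY, "contentdiv"): Debug.Print t: Next t`, where `t` is a Variant.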