Excel VBA source code for extracting data from a URL

Time: 2015-02-21 07:24:08

Tags: excel-vba vba excel

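I want to extract the title of each news item displayed at "http://pib.nic.in/newsite/erelease.aspx?relid=58313" using Excel VBA. I wrote code that uses getElementsByClassName("contentdiv"), but the debugger shows an "object doesn't support" error... I want to extract the news item for each relid in the URL...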

1 Answer:

Answer 0: (score: 0)

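Scrapes like this are usually handled more efficiently with an XMLHTTP pull. This requires adding a few libraries in the VBE under Tools ► References. The following code needs Microsoft XML, v6.0, Microsoft HTML Object Library, and Microsoft Internet Controls. The last one may not be needed, but you may want it if you extend the code beyond what is provided here.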

' URL template; ×ID× is replaced with each relid value.
Public Const csURL As String = "http://pib.nic.in/newsite/erelease.aspx?relid=×ID×"

Sub scrape_PIBNIC()
    Dim htmlBDY As HTMLDocument, xmlHTTP As MSXML2.ServerXMLHTTP60
    Dim i As Long, u As String, iDIV As Long

    ' Any runtime error jumps straight to the clean-up code below.
    On Error GoTo CleanUp

    Set xmlHTTP = New MSXML2.ServerXMLHTTP60
    Set htmlBDY = New HTMLDocument

    ' Widen this range to process more than one relid.
    For i = 58313 To 58313
        htmlBDY.body.innerHTML = vbNullString
        With xmlHTTP
            u = Replace(csURL, "×ID×", CStr(i))
            'Debug.Print u
            .Open "GET", u, False
            .setRequestHeader "Content-Type", "application/x-www-form-urlencoded; charset=UTF-8"
            .send
            If .Status <> 200 Then GoTo CleanUp

            ' Load the response into the HTML document so it can be queried.
            htmlBDY.body.innerHTML = .responseText

            ' Write the first <span> of each "contentdiv" element to the next free row in column A.
            For iDIV = 0 To (htmlBDY.getElementsByClassName("contentdiv").Length - 1)
                If CBool(htmlBDY.getElementsByClassName("contentdiv")(iDIV).getElementsByTagName("span").Length) Then
                    Sheets("Sheet1").Cells(Rows.Count, 1).End(xlUp).Offset(1, 0) = _
                      htmlBDY.getElementsByClassName("contentdiv")(iDIV).getElementsByTagName("span")(0).innerText
                End If
            Next iDIV

        End With
    Next i

CleanUp:
    Set htmlBDY = Nothing
    Set xmlHTTP = Nothing
End Sub

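This should be enough to get you started. The site you are targeting requires charset=UTF-8 to be added to the request; I had no success without it. I strongly suspect this may be the source of your object doesn't support error.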
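If you want to loop over many relid values, it may help to pull the request out into its own function so that a failed fetch is reported rather than silently skipped. The following is only a rough sketch under a few assumptions: the names BuildReleaseUrl, FetchReleaseHtml, and URL_TEMPLATE, the {ID} placeholder, and the error number are all made up for illustration and are not part of the code above. It still needs the Microsoft XML, v6.0 reference.

```vba
Option Explicit

' Hypothetical helpers for illustration; not part of the original answer.
' Requires a reference to Microsoft XML, v6.0 (Tools > References).

Private Const URL_TEMPLATE As String = "http://pib.nic.in/newsite/erelease.aspx?relid={ID}"

' Build the page URL for one relid by filling in the template placeholder.
Public Function BuildReleaseUrl(ByVal relid As Long) As String
    BuildReleaseUrl = Replace(URL_TEMPLATE, "{ID}", CStr(relid))
End Function

' Fetch the raw HTML for one relid. A non-200 status raises an error
' (the error number is an arbitrary choice) instead of exiting quietly.
Public Function FetchReleaseHtml(ByVal relid As Long) As String
    Dim req As MSXML2.ServerXMLHTTP60
    Set req = New MSXML2.ServerXMLHTTP60

    req.Open "GET", BuildReleaseUrl(relid), False
    ' Same header as in the code above, which the site reportedly needs.
    req.setRequestHeader "Content-Type", "application/x-www-form-urlencoded; charset=UTF-8"
    req.send

    If req.Status <> 200 Then
        Err.Raise vbObjectError + 513, "FetchReleaseHtml", _
                  "HTTP " & req.Status & " " & req.statusText & " for relid " & relid
    End If

    FetchReleaseHtml = req.responseText
End Function
```

Calling FetchReleaseHtml(i) inside the For loop and assigning the result to htmlBDY.body.innerHTML would slot into the existing parsing code. Note that the On Error GoTo CleanUp line in the original routine would still swallow the raised error unless you remove it or add a message in the clean-up section.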