从网址不变的网页源中抓取数据

时间:2018-10-03 01:09:09

标签: url web-scraping access-vba

我需要执行以下操作

我有2个问题

  1. 我不知道如何选择“ 专科医院”和“ 所有门诊护理机构**注2
  2. 当我手动选择这两种类型,然后单击某些医院时,URL并不会变成特定于选择的内容。 选择两种类型后,它将变为http://healthapps.state.nj.us/facilities/acFacilityList.aspx,然后在单击医院时保持这种状态。 因此,我无法编写将抓取这些页面的代码,因为我不知道如何为每个医院指定URL。

很抱歉,这必须是一个非常基本的问题,但是我无法在Google上用它访问Access VBA的有用信息

这是从页面提取数据的代码,我还没有执行循环,所以这只是页面后面的源数据的基本提取

Public Function btnGetWebData_Click() 
    Dim strURL
    Dim HTML_Content As HTMLDocument
    Dim dados As Object

    'Create HTMLFile Object
    Set HTML_Content = New HTMLDocument

    'Get the WebPage Content to HTMLFile Object
    With CreateObject("msxml2.xmlhttp")
        .Open "GET", "http://healthapps.state.nj.us/facilities/acFacilityList.aspx", False
        'http://healthapps.state.nj.us/facilities/acFacilityList.aspx
        .Send
        HTML_Content.Body.innerHTML = .responseText
        Debug.Print .responseText
        Debug.Print HTML_Content.Body.innerHTML
    End With
End Function

1 个答案:

答案 0 :(得分:1)

它导航到每个结果页面,然后在两者之间返回首页,以便通过点击来利用postBack链接。

Option Explicit
Public Sub VisitPages()
    Dim IE As New InternetExplorer
    With IE
        .Visible = True
        .navigate "http://healthapps.state.nj.us/facilities/acSetSearch.aspx?by=county"

        While .Busy Or .readyState < 4: DoEvents: Wend

        With .document
            .querySelector("#middleContent_cbType_5").Click
            .querySelector("#middleContent_cbType_12").Click
            .querySelector("#middleContent_btnGetList").Click
        End With

        While .Busy Or .readyState < 4: DoEvents: Wend

        Dim list As Object, i  As Long
        Set list = .document.querySelectorAll("#main_table [href*=doPostBack]")
        For i = 0 To list.Length - 1
            list.item(i).Click

            While .Busy Or .readyState < 4: DoEvents: Wend

            Application.Wait Now + TimeSerial(0, 0, 3) '<== Delete me later. This is just to demo page changes
            'do stuff with new page
            .Navigate2 .document.URL             '<== back to homepage
            While .Busy Or .readyState < 4: DoEvents: Wend
            Set list = .document.querySelectorAll("#main_table [href*=doPostBack]") 'reset list (often required in these scenarios)
        Next
        Stop                                     '<== Delete me later
        '.Quit '<== Remember to quit application
    End With
End Sub

与执行postBacks相同

Option Explicit
Public Sub VisitPages()
    Dim IE As New InternetExplorer
    With IE
        .Visible = True
        .navigate "http://healthapps.state.nj.us/facilities/acSetSearch.aspx?by=county"

        While .Busy Or .readyState < 4: DoEvents: Wend

        With .document
            .querySelector("#middleContent_cbType_5").Click
            .querySelector("#middleContent_cbType_12").Click
            .querySelector("#middleContent_btnGetList").Click
        End With

        While .Busy Or .readyState < 4: DoEvents: Wend

        Dim list As Object, i  As Long, col As Collection
        Set col = New Collection
        Set list = .document.querySelectorAll("#main_table [href*=doPostBack]")
        For i = 0 To list.Length - 1
           col.Add CStr(list.item(i))
        Next
        For i = 1 To col.Count
            .document.parentWindow.execScript col.item(i)
             While .Busy Or .readyState < 4: DoEvents: Wend
            'Do stuff with page
            .Navigate2 .document.URL
            While .Busy Or .readyState < 4: DoEvents: Wend
        Next
        Stop                                     '<== Delete me later
        '.Quit '<== Remember to quit application
    End With
End Sub