Web scraping with a button click

Time: 2020-06-01 14:32:51

Tags: html excel vba web-scraping

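Team, I am trying to click a "Load More" button. I can click it once and run the macro, but only once. I need help with the following points:

  1. I am trying to automate the code so that it clicks the button repeatedly until the page has loaded all the data needed for web scraping.

  2. Also, before the data is written to Excel, I need the code to check whether a "Load More" button still exists on the web page. If no "Load More" button is found, it should continue with the next part of the code. (FYI, the Load More button is at the bottom of my web page.)

Thanks. Please reply if my question is unclear.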

Below is the HTML code before clicking the "Load More" button:

<button type="button" class="btn primary btn-primary modal-button-print add-notes" data-bind="click: getNotes, visible: isLoadMoreButtonEnable() &amp;&amp; !$root.providerShouldAcceptDecline()">
  <i class="fa fa-refresh" aria-hidden="true"></i>Load More
</button>    

Below is the HTML code after clicking the "Load More" button several times, until the full data is loaded:

<button class="btn primary btn-primary modal-button-print add-notes" style="display: none;" type="button" data-bind="click: getWoNotes, visible: isLoadMoreNotesButtonEnable() &amp;&amp; !$root.providerShouldAcceptDecline()">
                <i class="fa fa-refresh" aria-hidden="true"></i>Load More Work Order Notes
            </button>

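The difference I see in the HTML code above is that style="display: none;" is added once I have clicked the button enough times for all the data to be loaded into the web page.

I have a sample website that looks similar to my web page. I am using this link here only to show how the page loads on my site.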
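A minimal sketch of how that difference could be tested from VBA, assuming the page runs in Internet Explorer as in the code further down. IsHidden is a hypothetical helper name, not something from the original post. It reads currentStyle, Internet Explorer's computed-style object, so it reports "none" whether the value comes from the inline style attribute or from a stylesheet rule on the element itself. It does not look at parent elements.

```vba
'Sketch only: IsHidden is a hypothetical helper name.
Function IsHidden(ByVal element As Object) As Boolean
  If element Is Nothing Then
    'A missing element is treated the same as a hidden one
    IsHidden = True
  Else
    'currentStyle.display is "none" once style="display: none;" is applied
    IsHidden = (LCase$(element.currentStyle.display) = "none")
  End If
End Function
```

It could be called as IsHidden(IE.document.querySelector("button.add-notes")), where the selector is an assumption based on the class names in the HTML shown above.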

Sub abc()


Set IE = New InternetExplorer
Link = my url
.
.
.
.
For L = 2 To Lr1
    IE.navigate Link 
    Set Html = New MSHTML.HTMLDocument
    Set Ws = Scraping
    Do
    DoEvents: Loop Until IE.readyState = READYSTATE_COMPLETE
    Application.Wait (Now + TimeValue("00:00:05"))
    IE.document.querySelector("button[type=button]").Click
    Do
    DoEvents: Loop Until IE.readyState = READYSTATE_COMPLETE
    Application.Wait (Now + TimeValue("00:00:05"))
    IE.document.querySelector("button[type=button]").Click
    Do
    DoEvents: Loop Until IE.readyState = READYSTATE_COMPLETE
    Application.Wait (Now + TimeValue("00:00:05"))
    IE.document.querySelector("button[type=button]").Click
    Do
    DoEvents: Loop Until IE.readyState = READYSTATE_COMPLETE
    Application.Wait (Now + TimeValue("00:00:05"))

    Html.body.innerHTML = IE.document.querySelectorAll(".list").Item(1).outerHTML
    Set Tariku = Html.querySelectorAll(".columns")
    Set data = Html.querySelectorAll(".datalist")
        With Ws

        ' Do all the stuff  

        End With
        IE.document.querySelector("#Logout").Click
        IE.Quit
       Exit Sub

  Next L

End Sub

1 answer:

Answer 0: (score: 1)

You can try this. If it doesn't work, could you post the URL?

Sub Abc()

  Dim browser As Object
  Dim url As String
  Dim nodeButton As Object
  Dim noButtonFound As Boolean

  url = "Your URL here"

  'Initialize Internet Explorer, set visibility,
  'call URL and wait until page is fully loaded
  Set browser = CreateObject("InternetExplorer.Application")
  browser.Visible = False
  browser.navigate url
  Do Until browser.ReadyState = 4: DoEvents: Loop

  'Click the button for as long as it is found and visible
  Do
    'Try to get the first button on the page
    Set nodeButton = browser.document.getElementsByTagName("button")(0)

    If nodeButton Is Nothing Then
      'No button found, leave loop
      noButtonFound = True
    ElseIf LCase$(nodeButton.Style.display) = "none" Then
      'Button is hidden (style="display: none;"), leave loop
      noButtonFound = True
    Else
      'Button is visible (a button without a style attribute
      'counts as visible too), so click it
      nodeButton.Click

      'Wait for more data to load
      Application.Wait (Now + TimeSerial(0, 0, 5))
    End If
  Loop Until noButtonFound

  'All dynamic data has been loaded
  'Do whatever you want here
  'But I think you don't need a new HTML document
End Sub
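
The loop in this answer takes the first button element on the page and has no upper bound on the number of clicks. Below is a variant sketch that targets the Load More button by class and stops after a fixed number of attempts. ClickLoadMoreUntilDone and maxClicks are hypothetical names, the selector "button.add-notes" is only assumed from the HTML in the question, and querySelector requires the page to render in IE8 or later document mode.

```vba
'Sketch only: ClickLoadMoreUntilDone and maxClicks are hypothetical names,
'and the selector "button.add-notes" is assumed from the posted HTML.
Function ClickLoadMoreUntilDone(ByVal browser As Object, _
                                Optional ByVal maxClicks As Long = 50) As Long
  Dim btn As Object
  Dim clicks As Long

  Do While clicks < maxClicks
    'Look the button up again on every pass, the page may re-render it
    Set btn = browser.document.querySelector("button.add-notes")

    'Stop when the button is missing or hidden via inline display: none
    If btn Is Nothing Then Exit Do
    If LCase$(btn.Style.display) = "none" Then Exit Do

    btn.Click
    clicks = clicks + 1

    'Fixed pause, as in the answer, to let the next batch load
    Application.Wait Now + TimeSerial(0, 0, 5)
  Loop

  'Return how many times the button was clicked
  ClickLoadMoreUntilDone = clicks
End Function
```

After the initial page load it could be called as clicked = ClickLoadMoreUntilDone(browser). A return value equal to maxClicks suggests the cap was reached before the button disappeared, so the data may be incomplete.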