Using Excel VBA to get data from a web page that runs scripts to display table data

Time: 2019-03-01 19:15:04

Tags: html excel vba internet-explorer web-scraping

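Day two of researching this and I just don't get it. The web page is public: https://register.fca.org.uk/ShPo_FirmDetailsPage?id=001b000000MfF1EAAV. Doing it manually, I press PgDn twice to reach the [+] button and click it, then press PgDn once to reach the "results per page" dropdown and change it to 500. Then I copy the results and paste them into Excel.

Below is code I found on this site under "Selecting a dropdown list when inserting data from Web (VBA)", answered by QHarr, which I tried to adapt without success. I have put "HELP" wherever I think a change is needed, but I am only guessing.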


So I have included your changes, and this is where I am now.

Public Sub MakeSelectiongGetData()
Dim IE As New InternetExplorer
Const URL = "https://register.fca.org.uk/ShPo_FirmDetailsPage?id=001b000000Mfe5TAAR#ShPo_FirmDetailsPage"
'Const optionText As String = "RDVT11"
Application.ScreenUpdating = False
With IE
    .Visible = True
    .navigate URL

    While .Busy Or .readyState < 4: DoEvents: Wend

    Dim a As Object
    Set a = .document.getElementById("HELP")

    Dim currentOption As Object
    For Each currentOption In a.getElementsByTagName("HELP")
        If InStr(currentOption.innerText, optionText) > 0 Then
            currentOption.Selected = "HELP"
            Exit For
        End If
    Next currentOption
    .document.getElementById("HELP").Click
    While .Busy Or .readyState < 4: DoEvents: Wend

    Dim nTable As HTMLTable

    Do: On Error Resume Next: Set nTable = .document.getElementById("HELP"): On Error GoTo 0: DoEvents: Loop While nTable Is Nothing

    Dim nRow As Object, nCell As Object, r As Long, c As Long

    With ActiveSheet
        Dim nBody As Object
        Set nBody = nTable.getElementsByTagName("tbody")(0).getElementsByTagName("tr")
        .Cells(1, 1) = nBody(0).innerText
        For r = 2 To nBody.Length - 1
            Set nRow = nBody(r)
            For Each nCell In nRow.Cells
                c = c + 1: .Cells(r + 1, c) = nCell.innerText
            Next nCell
            c = 0
        Next r
    End With
    .Quit
End With
Application.ScreenUpdating = True
End Sub

1 answer:

Answer 0 (score: 0)

You can use CSS attribute = value selectors to target the + for Individuals and to select the option for 500.

Option Explicit
'VBE > Tools > References:
' Microsoft Internet Controls
Public Sub MakeSelections()
    Dim IE As New InternetExplorer
    With IE
        .Visible = True
        .Navigate2 "https://register.fca.org.uk/ShPo_FirmDetailsPage?id=001b000000MfF1EAAV"

        While .Busy Or .readyState < 4: DoEvents: Wend

        .document.querySelector("[href*=FirmIndiv]").Click '<== click the + for Individuals
        'select the 500 option in the results-per-page dropdown
        .document.querySelector("#IndividualSearchResults_length [value='500']").Selected = True

        'setting .Selected does not fire the page's handler, so raise a change event on the select
        Dim event_onchange As Object
        Set event_onchange = .document.createEvent("HTMLEvents")
        event_onchange.initEvent "change", True, False

        .document.querySelector("[name=IndividualSearchResults_length]").dispatchEvent event_onchange

        Application.Wait Now + TimeSerial(0, 0, 5) 'fixed 5 second pause while the table reloads
        Dim clipboard As Object, ws As Worksheet

        'late-bound MSForms DataObject, used to put the table HTML on the clipboard
        Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
        Set ws = ThisWorkbook.Worksheets("Sheet1")
        clipboard.SetText .document.querySelector("#IndividualSearchResults").outerHTML
        clipboard.PutInClipboard
        ws.Cells(1, 1).PasteSpecial
        .Quit
    End With
End Sub

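The selector [href*=FirmIndiv] is an attribute = value selector with the contains (*) modifier. It looks for href attributes whose value contains the substring FirmIndiv. The querySelector method of HTMLDocument (i.e. .document) returns the first match found.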

You can see the match here: [screenshot not reproduced]
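The first-match behaviour can be tried without loading the live page. The sketch below is an illustration only: the HTML fragment and the function name FirstContainsMatch are made up for the demo (they are not taken from the FCA page), and it assumes a reference to Microsoft HTML Object Library.

```vba
Option Explicit
'VBE > Tools > References:
' Microsoft HTML Object Library

'Returns the innerText of the first element whose href contains FirmIndiv.
'The markup below is a made-up fragment, not the real page source.
Public Function FirstContainsMatch() As String
    Dim html As MSHTML.HTMLDocument
    Set html = New MSHTML.HTMLDocument

    'one link that does not match, followed by two that do
    html.body.innerHTML = "<a href='/Other'>other</a>" & _
                          "<a href='/ShPo_FirmIndiv?a=1'>first</a>" & _
                          "<a href='/ShPo_FirmIndiv?a=2'>second</a>"

    Dim firstMatch As Object
    Set firstMatch = html.querySelector("[href*=FirmIndiv]")

    If firstMatch Is Nothing Then
        FirstContainsMatch = vbNullString
    Else
        FirstContainsMatch = firstMatch.innerText
    End If
End Function
```

Running Debug.Print FirstContainsMatch in the Immediate window should show the text of the first of the two matching links. If every match were needed, querySelectorAll with the same selector would be the method to reach for.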

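The selector for the option tag element (the parent select tag for the results count contains child option tag elements) is:

#IndividualSearchResults_length [value='500']

It uses an id (#) selector to target the parent div of the select element by its id value, IndividualSearchResults_length, and then a descendant combinator (" ") followed by an attribute = value selector to specify the option element with value = 500.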

You can see that here: [screenshot not reproduced]
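The effect of the descendant combinator can be checked in the same offline way. This is a rough sketch under stated assumptions: the markup is a made-up stand-in for the results-per-page dropdown, the function name FindOptionText is hypothetical, and a reference to Microsoft HTML Object Library is assumed. A second option with value 500 is placed outside the div, and ahead of it in the markup, to show that the id part of the selector restricts the search.

```vba
Option Explicit
'VBE > Tools > References:
' Microsoft HTML Object Library

'Returns the innerText of the option matched inside the div with the given id.
'The markup is a made-up stand-in, not the real page source.
Public Function FindOptionText() As String
    Dim html As MSHTML.HTMLDocument
    Set html = New MSHTML.HTMLDocument

    'the first select sits outside the target div and should be ignored
    html.body.innerHTML = "<select><option value='500'>outside the div</option></select>" & _
                          "<div id='IndividualSearchResults_length'>" & _
                          "<select name='IndividualSearchResults_length'>" & _
                          "<option value='10'>10</option>" & _
                          "<option value='500'>500</option>" & _
                          "</select></div>"

    Dim opt As Object
    Set opt = html.querySelector("#IndividualSearchResults_length [value='500']")

    If Not opt Is Nothing Then
        opt.Selected = True   'same property the answer sets on the live page
        FindOptionText = opt.innerText
    End If
End Function
```

Without the leading #IndividualSearchResults_length part, the bare [value='500'] selector would pick up the earlier option that lies outside the div.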


Selenium Basic version:
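What follows is a minimal sketch of how the same steps might look with SeleniumBasic. It is not the answerer's listing. It assumes SeleniumBasic is installed with a ChromeDriver that matches the local Chrome, that the Selenium Type Library reference is set, and that the workbook has a sheet named Sheet1. The procedure name MakeSelectionsSelenium is hypothetical, and the selectors are the ones from the Internet Explorer code above.

```vba
Option Explicit
'VBE > Tools > References:
' Selenium Type Library (installed with SeleniumBasic)
Public Sub MakeSelectionsSelenium()
    Dim d As WebDriver
    Set d = New ChromeDriver

    With d
        .Start "Chrome"
        .Get "https://register.fca.org.uk/ShPo_FirmDetailsPage?id=001b000000MfF1EAAV"

        'click the + for Individuals
        .FindElementByCss("[href*=FirmIndiv]").Click

        'choose 500 results per page; the driver interacts with the select
        'as a user would, so no manual change event is raised here
        .FindElementByCss("[name=IndividualSearchResults_length]").AsSelect.SelectByText "500"

        'fixed pause in milliseconds while the table reloads (same 5 s as the IE version)
        .Wait 5000

        'write the results table to the sheet
        .FindElementById("IndividualSearchResults").AsTable.ToExcel ThisWorkbook.Worksheets("Sheet1").Cells(1, 1)
        .Quit
    End With
End Sub
```

The fixed pause is kept only to mirror the Internet Explorer version. A wait on the row count of the table would likely be more robust, but it is left out here to keep the sketch short.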
