位置1

"https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=" & Ticker & "&type=10-Q&dateb=&owner=exclude&count=20"

到这样的位置

位置2

https://www.sec.gov/Archives/edgar/data/10795/000119312514042815/bdx-20131231.xml

有没有办法在不使用IE Automation的情况下从位置1转到位置2并找到更可靠，更安全，更快速的东西？

出于完整性的原因，这里是我们现在拥有的完整代码;通过运行你将看到IE的大量使用：

Option Explicit

Sub MadMule2()
    Dim IE As InternetExplorer
    Dim el
    Dim els
    Dim colDocLinks As New Collection
    Dim Ticker As String
    Dim lnk
    Dim intCounter as Integer    

    Set IE = New InternetExplorer

    IE.Visible = False

    Ticker = Worksheets("Sheet1").Range("A1").Value

    LoadPage IE, "https://www.sec.gov/cgi-bin/browse-edgar?" & _
                  "action=getcompany&CIK=" & Ticker & "&type=10-Q" & _
                  "&dateb=&owner=exclude&count=20"

    Set els = IE.document.getElementsByTagName("a")
    For Each el In els
        If Trim(el.innerText) = "Documents" Then
            colDocLinks.Add el.href
        End If
    Next el

    intCounter = 1

     For Each lnk In colDocLinks
        LoadPage IE, CStr(lnk)
        For Each el In IE.document.getElementsByTagName("a")
            If el.href Like "*[0-9].xml" Then
                ActiveWorkbook.XmlMaps.Add(el, "xbrl").Name = "xbrl Map"
            End If
        Next el
    Next lnk
End Sub

Sub LoadPage(IE As InternetExplorer, URL As String)
    IE.navigate URL
    Do While IE.Busy Or IE.readyState <> READYSTATE_COMPLETE
        DoEvents
    Loop
End Sub

附加

问：Is there a way to go from location 1 to location 2 without using IE Automation and finding something more reliable, secure and faster?

你能扩展一下吗？

通过mehow

答：以下是我们收到的用户2140261 here收到的代码块的评论：

您应该研究MSXML，它比IE自动化更快，更安全，更可靠。

由于代码打开了Internet Explorer，因此解析源页面以查找href并获取所需的Web位置;我们想知道是否有办法在不使用IE的情况下进入位置2。用户2140261可以用MSXML完成吗？

Answer 1

以下是使用XHR

的示例

Option Explicit
Public Sub GetLinks()
    Dim ticker As String, html As New HTMLDocument, links As Object, i As Long
    ticker = [A1]                                'example is 81251
    Set html = GetHTMLDocument("https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=" & ticker)
    If html Is Nothing Then Exit Sub
    Set links = html.querySelectorAll("#documentsbutton")
    If links Is Nothing Then Exit Sub
    For i = 0 To links.Length - 1
        Debug.Print GetAbsoluteURL(links(i).getAttribute("href"))
    Next i
End Sub

Public Function GetAbsoluteURL(ByVal relativeURL As String) As String
    If InStr(relativeURL, "about:/") > 0 Then
        GetAbsoluteURL = Replace$(relativeURL, "about:/", "https://www.sec.gov/")
    Else
        GetAbsoluteURL = relativeURL
    End If
End Function

Public Function GetHTMLDocument(ByVal URL As String) As HTMLDocument
    Dim sResponse As String
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", URL, False
        .send
        If .Status <> 200 Then
            Exit Function
        Else
            sResponse = StrConv(.responseBody, vbUnicode)
        End If
    End With
    Set GetHTMLDocument = New HTMLDocument
    GetHTMLDocument.body.innerHTML = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
End Function

_{样本代码为81251}

示例输出：

CSS选择器和.querySelectorAll

您可以使用"#documentsbutton"中CSS selector定位文档按钮。现在，虽然#表示id，并且id通常是唯一的，但对于该特定网站，似乎使用相同的id来标识表中的所有元素。

CSS选择器查询从页面返回以下（示例）：

在VBA中应用CSS选择器：

匹配多个元素时，document的{{3}}方法用于返回nodeList由CSS选择器匹配的项目：

html.querySelectorAll("#documentsbutton")

我们可以沿nodeList遍历.Length，并按索引（沿nodeList的位置；从0开始）访问单个文档button元素：

For i = 0 To links.Length - 1

需要的超链接信息，您可以简单地从每个匹配的元素中拉href属性：

links(i).getAttribute("href")

这会返回一个相对路径，因此我们使用一个微小的辅助函数GetAbsoluteURL来获取绝对路径。

关闭笔记：

虽然没有完全遵守.querySelectorAll，但这确实显示了重构代码的一些好处。然后，您可以重新使用GetHTMLDocument函数来处理GetAbsoluteURL返回的新URL

VBA获取URL位置而不使用IE Automation解析IE.Document

位置1

位置2

1 个答案: