在下面的代码中,我们使用IE Automation从这里获取
"https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=" & Ticker & "&type=10-Q&dateb=&owner=exclude&count=20"
到这样的位置
https://www.sec.gov/Archives/edgar/data/10795/000119312514042815/bdx-20131231.xml
有没有办法在不使用IE Automation的情况下从位置1转到位置2并找到更可靠,更安全,更快速的东西?
出于完整性的原因,这里是我们现在拥有的完整代码;通过运行你将看到IE的大量使用:
Option Explicit
Sub MadMule2()
Dim IE As InternetExplorer
Dim el
Dim els
Dim colDocLinks As New Collection
Dim Ticker As String
Dim lnk
Dim intCounter as Integer
Set IE = New InternetExplorer
IE.Visible = False
Ticker = Worksheets("Sheet1").Range("A1").Value
LoadPage IE, "https://www.sec.gov/cgi-bin/browse-edgar?" & _
"action=getcompany&CIK=" & Ticker & "&type=10-Q" & _
"&dateb=&owner=exclude&count=20"
Set els = IE.document.getElementsByTagName("a")
For Each el In els
If Trim(el.innerText) = "Documents" Then
colDocLinks.Add el.href
End If
Next el
intCounter = 1
For Each lnk In colDocLinks
LoadPage IE, CStr(lnk)
For Each el In IE.document.getElementsByTagName("a")
If el.href Like "*[0-9].xml" Then
ActiveWorkbook.XmlMaps.Add(el, "xbrl").Name = "xbrl Map"
End If
Next el
Next lnk
End Sub
Sub LoadPage(IE As InternetExplorer, URL As String)
IE.navigate URL
Do While IE.Busy Or IE.readyState <> READYSTATE_COMPLETE
DoEvents
Loop
End Sub
附加
问:Is there a way to go from location 1 to location 2 without using IE Automation and finding something more reliable, secure and faster?
你能扩展一下吗?
通过mehow
答:以下是我们收到的用户2140261 here收到的代码块的评论:
您应该研究MSXML,它比IE自动化更快,更安全,更可靠。
由于代码打开了Internet Explorer,因此解析源页面以查找href并获取所需的Web位置;我们想知道是否有办法在不使用IE的情况下进入位置2。用户2140261可以用MSXML完成吗?
答案 0 :(得分:0)
以下是使用XHR
的示例Option Explicit
Public Sub GetLinks()
Dim ticker As String, html As New HTMLDocument, links As Object, i As Long
ticker = [A1] 'example is 81251
Set html = GetHTMLDocument("https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=" & ticker)
If html Is Nothing Then Exit Sub
Set links = html.querySelectorAll("#documentsbutton")
If links Is Nothing Then Exit Sub
For i = 0 To links.Length - 1
Debug.Print GetAbsoluteURL(links(i).getAttribute("href"))
Next i
End Sub
Public Function GetAbsoluteURL(ByVal relativeURL As String) As String
If InStr(relativeURL, "about:/") > 0 Then
GetAbsoluteURL = Replace$(relativeURL, "about:/", "https://www.sec.gov/")
Else
GetAbsoluteURL = relativeURL
End If
End Function
Public Function GetHTMLDocument(ByVal URL As String) As HTMLDocument
Dim sResponse As String
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", URL, False
.send
If .Status <> 200 Then
Exit Function
Else
sResponse = StrConv(.responseBody, vbUnicode)
End If
End With
Set GetHTMLDocument = New HTMLDocument
GetHTMLDocument.body.innerHTML = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
End Function
样本代码为81251
示例输出:
CSS选择器和.querySelectorAll
您可以使用"#documentsbutton"
中CSS selector定位文档按钮。
现在,虽然#
表示id,并且id通常是唯一的,但对于该特定网站,似乎使用相同的id来标识表中的所有元素。
CSS选择器查询从页面返回以下(示例):
在VBA中应用CSS选择器:
匹配多个元素时,document
的{{3}}方法用于返回nodeList
由CSS选择器匹配的项目:
html.querySelectorAll("#documentsbutton")
我们可以沿nodeList
遍历.Length
,并按索引(沿nodeList的位置;从0开始)访问单个文档button
元素:
For i = 0 To links.Length - 1
需要的超链接信息,您可以简单地从每个匹配的元素中拉href
属性:
links(i).getAttribute("href")
这会返回一个相对路径,因此我们使用一个微小的辅助函数GetAbsoluteURL
来获取绝对路径。
关闭笔记:
虽然没有完全遵守.querySelectorAll
,但这确实显示了重构代码的一些好处。然后,您可以重新使用GetHTMLDocument
函数来处理GetAbsoluteURL
返回的新URL