VBA: crawling an href out of a web page's source code

Time: 2014-02-14 18:37:50

Tags: excel vba excel-vba web

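I have updated my question, because I now understand more clearly the technical problem I want to solve.

A. If we take the results URL from a search on the data provider's website, we get this: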

    https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000010795&type=10-K&dateb=&owner=exclude&count=20

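B. If we enter the URL from step A in the browser and view the page source (I use Google Chrome), at line 100 we see this fascinating line, which is also a clickable link: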

    href="/Archives/edgar/data/10795/000119312513456802/0001193125-13-456802-index.htm"

That line sits inside this fragment of the source code:

    <tr>
        <td nowrap="nowrap">10-K</td>
        <td nowrap="nowrap"><a href="/Archives/edgar/data/10795/000119312513456802/0001193125-13-456802-index.htm" id="documentsbutton">&nbsp;Documents</a>&nbsp; <a href="/cgi-bin/viewer?action=view&amp;cik=10795&amp;accession_number=0001193125-13-456802&amp;xbrl_type=v" id="interactiveDataBtn">&nbsp;Interactive Data</a></td>
        <td class="small" >Annual report [Section 13 and 15(d), not S-K Item 405]<br />Acc-no: 0001193125-13-456802&nbsp;(34 Act)&nbsp; Size: 15 MB</td>
        <td>2013-11-27</td>
        <td nowrap="nowrap"><a href="/cgi-bin/browse-edgar?action=getcompany&amp;filenum=001-04802&amp;owner=exclude&amp;count=20">001-04802</a><br>131247478</td>
    </tr>

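C. If we click the link on line 100 (the one found in step B), we go to the next page, and that link has now become part of the URL! So what we get is a new page at this URL:

    https://www.sec.gov/Archives/edgar/data/10795/000119312513456802/0001193125-13-456802-index.htm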

D. Using the same approach, at line 182 of that page's source we come across this line of code:

    href="/Archives/edgar/data/10795/000119312513456802/bdx-20130930.xml"

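If we click that link, we arrive at the URL stored in strXMLSite in the macro below. Once you have read the macro and run it, the logical conclusion is clear: if we could integrate a suitable procedure into the macro, the String strXMLSite could be filled with the required URL at run time. This is the heart of the question.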
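Steps B through D rely on one observation: each href in the source is root-relative, and the next page's URL is simply the site root followed by that href. A minimal sketch of that join is shown here. `ToAbsoluteUrl` and `DemoToAbsoluteUrl` are hypothetical names that do not appear in the original macro, and hrefs without a leading slash are assumed to be relative to the site root, which is a simplification.

```vba
Option Explicit

' Hypothetical helper, not part of the original macro.
' Joins a site root such as "https://www.sec.gov" with an href
' taken from the page source.
Function ToAbsoluteUrl(ByVal strHost As String, ByVal strHref As String) As String
    If LCase$(Left$(strHref, 4)) = "http" Then
        ' Already an absolute URL: return it unchanged
        ToAbsoluteUrl = strHref
    ElseIf Left$(strHref, 1) = "/" Then
        ' Root-relative href, as in steps B and D
        ToAbsoluteUrl = strHost & strHref
    Else
        ' Assumption: treat any other href as relative to the site root
        ToAbsoluteUrl = strHost & "/" & strHref
    End If
End Function

Sub DemoToAbsoluteUrl()
    ' Builds the URL of the XML file from the href found in step D
    Debug.Print ToAbsoluteUrl("https://www.sec.gov", _
        "/Archives/edgar/data/10795/000119312513456802/bdx-20130930.xml")
End Sub
```

The string this produces is the kind of value that could be assigned to strXMLSite at run time instead of being hard-coded.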


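For this macro we have already enabled the required reference to Microsoft XML Core Services (MSXML): Excel -> VBE -> Tools -> References -> Microsoft XML, v6.0.

How can we add statements to the procedure so that VBA crawls, through the source code, from the URL in step A to the URL that currently sits in the strXMLSite string? Do we need to enable another library under Tools -> References? Could you show me a code block that does this? What is the right approach?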

For completeness, here is the macro, courtesy of @user2140261:

Sub GetNode()
Dim strXMLSite As String
Dim objXMLHTTP As MSXML2.XMLHTTP
Dim objXMLDoc As MSXML2.DOMDocument
Dim objXMLNodexbrl As MSXML2.IXMLDOMNode
Dim objXMLNodeDIIRSP As MSXML2.IXMLDOMNode

Set objXMLHTTP = New MSXML2.XMLHTTP
Set objXMLDoc = New MSXML2.DOMDocument

strXMLSite = "http://www.sec.gov/Archives/edgar/data/10795/000119312513456802/bdx-20130930.xml"

objXMLHTTP.Open "POST", strXMLSite, False
objXMLHTTP.send
objXMLDoc.LoadXML (objXMLHTTP.responseText)

Set objXMLNodexbrl = objXMLDoc.SelectSingleNode("xbrl")

Set objXMLNodeDIIRSP = objXMLNodexbrl.SelectSingleNode("us-gaap:DebtInstrumentInterestRateStatedPercentage")

Worksheets("Sheet1").Range("A1").Value = objXMLNodeDIIRSP.Text
End Sub

Thank you for looking at my question.

1 answer:

Answer 0 (score: 5)

Add a reference to "Microsoft Internet Controls". This will get you the individual XML links.

Sub Tester()

    Dim IE As New InternetExplorer
    Dim els, el, colDocLinks As New Collection
    Dim lnk

    IE.Visible = True
    Loadpage IE, "https://www.sec.gov/cgi-bin/browse-edgar?" & _
                  "action=getcompany&CIK=0000010795&type=10-K" & _
                  "&dateb=&owner=exclude&count=20"

    'collect all the "Document" links on the page
    Set els = IE.Document.getelementsbytagname("a")
    For Each el In els
        If Trim(el.innerText) = "Documents" Then
            'Debug.Print el.innerText, el.href
            colDocLinks.Add el.href
        End If
    Next el

    'loop through the "document" links and check each page for xml links
    For Each lnk In colDocLinks
        Loadpage IE, CStr(lnk)
        For Each el In IE.Document.getelementsbytagname("a")
            If el.href Like "*.xml" Then
                Debug.Print el.innerText, el.href
                'work with the document from this link
            End If
        Next el
    Next lnk

End Sub

Sub Loadpage(IE As Object, URL As String)
    IE.navigate URL
    Do While IE.Busy Or IE.ReadyState <> READYSTATE_COMPLETE
        DoEvents
    Loop
End Sub
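
As an alternative to driving a browser, the same two-step crawl can be sketched directly over the page source, which is closer to how the question describes it. This is a minimal sketch, not the answer's method: `FetchSource`, `ExtractHrefs`, and `DemoExtractHrefs` are hypothetical names, the objects are late-bound so no extra reference is needed, and it assumes the server answers a plain GET request from MSXML and that the links of interest appear as double-quoted, root-relative `href="..."` attributes.

```vba
Option Explicit

' Hypothetical helper: returns the page source as text.
' Assumption: the server answers a plain GET issued by MSXML.
Function FetchSource(ByVal strUrl As String) As String
    Dim objHttp As Object
    Set objHttp = CreateObject("MSXML2.XMLHTTP")
    objHttp.Open "GET", strUrl, False
    objHttp.send
    FetchSource = objHttp.responseText
End Function

' Hypothetical helper: collects every double-quoted href value in
' strHtml that matches strLike (a VBA Like pattern).
Function ExtractHrefs(ByVal strHtml As String, ByVal strLike As String) As Collection
    Dim objRegex As Object, objMatch As Object
    Dim colHrefs As New Collection

    Set objRegex = CreateObject("VBScript.RegExp")
    objRegex.Global = True
    objRegex.IgnoreCase = True
    objRegex.Pattern = "href=""([^""]+)"""

    For Each objMatch In objRegex.Execute(strHtml)
        If objMatch.SubMatches(0) Like strLike Then
            colHrefs.Add objMatch.SubMatches(0)
        End If
    Next objMatch

    Set ExtractHrefs = colHrefs
End Function

Sub DemoExtractHrefs()
    Const HOST As String = "https://www.sec.gov"
    Dim strSource As String
    Dim colIndexPages As Collection, colXml As Collection
    Dim varIndex As Variant, varXml As Variant

    ' Step A: the search results page
    strSource = FetchSource(HOST & "/cgi-bin/browse-edgar?action=getcompany" & _
        "&CIK=0000010795&type=10-K&dateb=&owner=exclude&count=20")

    ' Steps B and C: every "...-index.htm" link on that page
    Set colIndexPages = ExtractHrefs(strSource, "*-index.htm")

    ' Step D: every ".xml" link on each index page
    For Each varIndex In colIndexPages
        Set colXml = ExtractHrefs(FetchSource(HOST & varIndex), "*.xml")
        For Each varXml In colXml
            Debug.Print HOST & varXml
        Next varXml
    Next varIndex
End Sub
```

A filing's index page usually links to more than one .xml file, so a tighter Like pattern may be needed before one of the printed URLs is assigned to strXMLSite.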