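Extracting an XML attribute using VBA

Date: 2013-05-08 14:49:04

Tags: xml excel vba excel-vba

I'm not a developer and my XML knowledge is very limited, but I've learned a lot from researching online over the past 3-4 days. So I apologize in advance for the basic level of this question. I'm trying to complete a one-time task.

I have some Excel VBA knowledge, and I'm currently trying to use VBA to extract the SIC code attribute from a given company's page on the SEC filings website. For example, this is the page for Wal Mart: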

http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000104169&owner=exclude&count=40&hidefilings=0

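In the blue bar at the top you can see 'SIC: 5331'. It is this 5331 that I'm trying to return into a VBA variable so I can populate a spreadsheet. When I right-click in IE and click View Source, the relevant part of the page reads, in XML, as follows: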

<div id="contentDiv">
  <!-- START FILER DIV -->
  <div style="margin: 15px 0 10px 0; padding: 3px; overflow: hidden; background-color: #BCD6F8;">
    <div class="mailer">Mailing Address
      <span class="mailerAddress">702 SOUTHWEST 8TH STREET</span>
      <span class="mailerAddress"> BENTONVILLE AR 72716         </span>
    </div>
    <div class="mailer">Business Address
      <span class="mailerAddress">702 SOUTHWEST 8TH ST</span>
      <span class="mailerAddress">BENTONVILLE AR 72716         </span>
      <span class="mailerAddress">5012734000</span>
    </div>
    <div class="companyInfo">
      <span class="companyName">WAL MART STORES INC <acronym title="Central Index Key">CIK</acronym>#: <a href="/cgi-bin/browse-edgar?action=getcompany&amp;CIK=0000104169&amp;owner=exclude&amp;count=40">0000104169 (see all company filings)</a></span>
      <p class="identInfo"><acronym title="Standard Industrial Code">SIC</acronym>: <a href="/cgi-bin/browse-edgar?action=getcompany&amp;SIC=5331&amp;owner=exclude&amp;count=40">5331</a> - RETAIL-VARIETY STORES<br />State location: <a href="/cgi-bin/browse-edgar?action=getcompany&amp;State=AR&amp;owner=exclude&amp;count=40">AR</a> | State of Inc.: <strong>DE</strong> | Fiscal Year End: 0131<br />(Assistant Director Office: 2)<br />Get <a href="/cgi-bin/own-disp?action=getissuer&amp;CIK=0000104169"><b>insider transactions</b></a> for this <b> issuer</b>.
        <br />Get <a href="/cgi-bin/own-disp?action=getowner&amp;CIK=0000104169"><b>insider transactions</b></a> for this <b>reporting owner</b>.
      </p>
    </div>
  </div>
</div>

While trying to learn how to extract the SIC using VBA, I found the following post on your site:

Query and parse xml attribute value into XLS using VBA

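I tried applying barrowc's answer by copying and pasting it into an Excel module and inserting the path to the Wal Mart filing. When I step through it, I get the Debug.Print "*****" output, but nothing at all for n.Text.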

Sub test4()
    Dim d As MSXML2.DOMDocument60
    Dim i As IXMLDOMNodeList
    Dim n As IXMLDOMNode

    Set d = New MSXML2.DOMDocument60
    d.async = False
    d.Load ("http://www.sec.gov/cgi-bin/browse-edgar?company=&match=&CIK=886475&filenum=&State=&Country=&SIC=&owner=exclude&Find=Find+Companies&action=getcompany")

    Debug.Print "*****"
    Set i = d.SelectNodes("//div[@id='contentDiv']")
    For Each n In i
        Debug.Print n.Text
    Next n
    Debug.Print "*****"

    Set d = Nothing
End Sub

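I have tried all sorts of strings in d.SelectNodes(), but I don't know enough about the subject to see where I'm going wrong. So comments on my syntax, or pointers to resources, would be very helpful.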
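One way to narrow this down, offered as a hedged sketch rather than a confirmed diagnosis: Load returns False and fills in parseError when a document cannot be fetched or parsed as XML, and SelectNodes on an empty document returns no nodes, which would match the symptom above. Printing the parse error shows whether the page was ever loaded. DiagnoseLoad is a hypothetical name, and the URL is the Wal Mart one from earlier in the question.

```vba
Sub DiagnoseLoad()
    ' Late-bound, so no reference to "Microsoft XML, v6.0" is required.
    Dim d As Object
    Set d = CreateObject("MSXML2.DOMDocument.6.0")
    d.async = False

    ' Load returns False when the document could not be fetched or parsed.
    If d.Load("http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000104169&owner=exclude&count=40&hidefilings=0") Then
        Debug.Print "Loaded; matching nodes: " & _
            d.SelectNodes("//div[@id='contentDiv']").Length
    Else
        ' parseError describes what went wrong and where.
        With d.parseError
            Debug.Print "Load failed (" & .errorCode & ") at line " & .Line & ": " & .reason
        End With
    End If

    Set d = Nothing
End Sub
```

If the Else branch runs, the XPath string is not the problem; the document itself never made it into the DOM.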

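2 Answers:

Answer 0: (score: 1)

If you are only interested in the SIC, it isn't worth trying to parse the entire DOM structure. Instead, identify a unique set of characters, search for it, and then extract the SIC from there.

The following function does just that. You only need to pass it the full HTML source of the page, and it will return the SIC: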

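Function ExtractSIC(SourceHtml As String) As String
    ' Returns the text between "&amp;SIC=" and the next "&", or "" if either is missing.
    Const PrefixChars As String = "&amp;SIC="
    Const SuffixChars As String = "&"
    Dim StartPos As Long, EndPos As Long
    StartPos = InStr(SourceHtml, PrefixChars)
    If StartPos = 0 Then Exit Function

    ' Move past the prefix to the first character of the SIC.
    StartPos = StartPos + Len(PrefixChars)
    EndPos = InStr(StartPos, SourceHtml, SuffixChars) - 1
    ' With no closing "&", EndPos is -1 and Mid would raise error 5 on a negative length.
    If EndPos < StartPos Then Exit Function
    ExtractSIC = Mid(SourceHtml, StartPos, EndPos - StartPos + 1)
End Function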
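As a rough sketch of where the HTML source could come from without automating a browser, the page can be requested with MSXML's XMLHTTP object and the response text handed to ExtractSIC. FetchPageSource and PrintSIC are hypothetical names, the URL is the Wal Mart one from the question, and this assumes the raw response contains the same "&amp;SIC=" markup shown in the question; if the server answers with something else, the result is simply an empty string.

```vba
' Hypothetical helper: returns the response body for Url, or "" on a non-200 status.
Function FetchPageSource(Url As String) As String
    Dim http As Object
    Set http = CreateObject("MSXML2.XMLHTTP.6.0")   ' late-bound, no reference needed
    http.Open "GET", Url, False                      ' False = synchronous request
    http.send
    If http.Status = 200 Then FetchPageSource = http.responseText
End Function

Sub PrintSIC()
    Dim html As String
    html = FetchPageSource("http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000104169&owner=exclude&count=40&hidefilings=0")
    ' ExtractSIC is the function above; it returns "" when the prefix is not found.
    Debug.Print ExtractSIC(html)
End Sub
```

A synchronous request keeps the example short; for a long list of companies it will block Excel while each page downloads.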

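Answer 1: (score: 0)

Thanks again, mwolfe. I've posted my code below, but what you provided is more elegant. I know the SIC is only 4 digits, so I was lazy and made an assumption in my code that could cause an error in the future. You can see how I did it in the commented-out section.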

Sub GetSICs()
    Application.ScreenUpdating = False

    Dim AWBN As String
    Dim ASN As String
    Dim CIK As String
    Dim NUM_FILES_TO_GET As Long
    Dim COUNTER As Long
    Dim SICTagPos As Long       ' Long, because InStr positions can exceed the Integer range
    Dim SIC As String
    Dim IEbrowser As Object     ' declared so the code compiles under Option Explicit
    Dim frm As Object

    Set IEbrowser = CreateObject("InternetExplorer.application")
    IEbrowser.Visible = False
    AWBN = ActiveWorkbook.Name
    ASN = ActiveSheet.Name
    Workbooks(AWBN).Sheets(ASN).Range("A1").Select
    ' Header for the output column, 11 columns to the right of A1.
    ActiveCell.Offset(0, 11) = "SIC"
    NUM_FILES_TO_GET = Application.WorksheetFunction.CountA(Range("A:A"))
    For COUNTER = 1 To 3 'NUM_FILES_TO_GET
        Application.StatusBar = "Counter = " & COUNTER
        'SICTagPos = 0
        ' The CIK for each row is read from 2 columns to the right of column A.
        CIK = ActiveCell.Offset(COUNTER, 2)
        IEbrowser.Navigate URL:="http://www.sec.gov/edgar/searchedgar/companysearch.html"
        Do
            DoEvents
        Loop Until IEbrowser.readyState = 4
        ' Fill in the search form and submit it.
        Set frm = IEbrowser.Document.forms(0)
        frm("CIK").Value = CIK
        frm.submit
        While IEbrowser.Busy Or IEbrowser.readyState <> 4: DoEvents: Wend
        SIC = ExtractSIC(IEbrowser.Document.body.innerhtml)
        'SICTagPos = InStr(1, IEbrowser.Document.body.innerhtml, "SIC=")
        'SIC = Right(Left(IEbrowser.Document.body.innerhtml, SICTagPos + 7), 4)
        ' Store as text so the code is not converted to a number.
        ActiveCell.Offset(COUNTER, 11).NumberFormat = "@"
        ActiveCell.Offset(COUNTER, 11) = SIC

    Next

    ' Close the hidden browser instance so it does not linger in the background.
    IEbrowser.Quit
    Set IEbrowser = Nothing

    Application.StatusBar = False
    Application.ScreenUpdating = True

End Sub


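Function ExtractSIC(SourceHtml As String) As String
    ' Returns the text between "&amp;SIC=" and the next "&", or "" if either is missing.
    Const PrefixChars As String = "&amp;SIC="
    Const SuffixChars As String = "&"
    Dim StartPos As Long, EndPos As Long
    StartPos = InStr(SourceHtml, PrefixChars)
    If StartPos = 0 Then Exit Function

    ' Move past the prefix to the first character of the SIC.
    StartPos = StartPos + Len(PrefixChars)
    EndPos = InStr(StartPos, SourceHtml, SuffixChars) - 1
    ' With no closing "&", EndPos is -1 and Mid would raise error 5 on a negative length.
    If EndPos < StartPos Then Exit Function
    ExtractSIC = Mid(SourceHtml, StartPos, EndPos - StartPos + 1)
End Function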
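The fixed-width assumption in the commented-out lines can also be avoided without depending on a trailing "&". The sketch below simply collects the run of digits that follows a marker, however many there are. DigitsAfter is a hypothetical helper, not part of either answer, and it assumes the first occurrence of the marker in the page is the one that carries the SIC, which holds for the snippet in the question but has not been checked against the full page.

```vba
' Hypothetical helper: returns the run of digits immediately after Marker in Source,
' or "" if Marker is absent or is not followed by a digit.
Function DigitsAfter(Source As String, Marker As String) As String
    Dim pos As Long
    pos = InStr(1, Source, Marker, vbTextCompare)
    If pos = 0 Then Exit Function            ' marker not found

    pos = pos + Len(Marker)                  ' first character after the marker
    Do While pos <= Len(Source)
        ' Stop at the first character that is not a digit.
        If Not (Mid$(Source, pos, 1) Like "#") Then Exit Do
        DigitsAfter = DigitsAfter & Mid$(Source, pos, 1)
        pos = pos + 1
    Loop
End Function

Sub DemoDigitsAfter()
    ' Fragment taken from the page source shown in the question.
    Debug.Print DigitsAfter("getcompany&amp;SIC=5331&amp;owner=exclude", "SIC=")
End Sub
```

In GetSICs, the call would look like SIC = DigitsAfter(IEbrowser.Document.body.innerhtml, "SIC="), which works whether the code is 3, 4, or more digits long.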