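I am not a developer and my XML knowledge is very limited, but I have learned a lot while researching online over the past 3-4 days. So apologies in advance for how basic this question is. I am just trying to get this one-off task done.
I have some VBA/Excel knowledge, and I am trying to use VBA to extract the SIC code attribute from a given company's page on the SEC filings website. For example, here is Wal-Mart's page.
In the blue bar at the top you can see 'SIC: 5331'. That 5331 is what I am trying to return into a VBA variable so I can populate a spreadsheet. When I right-click in IE and choose View Source, the relevant part of the page reads as follows (in XML format, as far as I can tell):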
<div id="contentDiv">
<!-- START FILER DIV -->
<div style="margin: 15px 0 10px 0; padding: 3px; overflow: hidden; background-color: #BCD6F8;">
<div class="mailer">Mailing Address
<span class="mailerAddress">702 SOUTHWEST 8TH STREET</span>
<span class="mailerAddress"> BENTONVILLE AR 72716 </span>
</div>
<div class="mailer">Business Address
<span class="mailerAddress">702 SOUTHWEST 8TH ST</span>
<span class="mailerAddress">BENTONVILLE AR 72716 </span>
<span class="mailerAddress">5012734000</span>
</div>
<div class="companyInfo">
<span class="companyName">WAL MART STORES INC <acronym title="Central Index Key">CIK</acronym>#: <a href="/cgi-bin/browse-edgar?action=getcompany&CIK=0000104169&owner=exclude&count=40">0000104169 (see all company filings)</a></span>
<p class="identInfo"><acronym title="Standard Industrial Code">SIC</acronym>: <a href="/cgi-bin/browse-edgar?action=getcompany&SIC=5331&owner=exclude&count=40">5331</a> - RETAIL-VARIETY STORES<br />State location: <a href="/cgi-bin/browse-edgar?action=getcompany&State=AR&owner=exclude&count=40">AR</a> | State of Inc.: <strong>DE</strong> | Fiscal Year End: 0131<br />(Assistant Director Office: 2)<br />Get <a href="/cgi-bin/own-disp?action=getissuer&CIK=0000104169"><b>insider transactions</b></a> for this <b> issuer</b>.
<br />Get <a href="/cgi-bin/own-disp?action=getowner&CIK=0000104169"><b>insider transactions</b></a> for this <b>reporting owner</b>.
</p>
</div>
</div>
</div>
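If you do want to walk the document tree rather than search the text, the SIC in the source above is the text of the first link inside <p class="identInfo">. Because the page is HTML rather than XML, the sketch below uses the MSHTML parser through late binding (CreateObject("htmlfile")) instead of MSXML. The function name SICFromHtml is hypothetical, and it assumes the page keeps the layout shown above.

```vba
' Hypothetical helper: returns the SIC from the first link inside
' <p class="identInfo">, assuming the page layout shown in the question.
' Uses the MSHTML parser (late bound via "htmlfile") because the page is HTML.
Function SICFromHtml(SourceHtml As String) As String
    Dim doc As Object, para As Object, links As Object

    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = SourceHtml          ' parse the text as HTML, not XML

    ' Look at every <p> and stop at the one whose class is "identInfo"
    For Each para In doc.getElementsByTagName("p")
        If para.className = "identInfo" Then
            Set links = para.getElementsByTagName("a")
            ' First link in that paragraph, e.g. <a href="...&SIC=5331...">5331</a>
            If links.Length > 0 Then SICFromHtml = Trim$(links.Item(0).innerText)
            Exit Function
        End If
    Next para
    ' No identInfo paragraph found: the function returns ""
End Function
```

Pass it the full page source (for example IEbrowser.Document.body.innerHTML) and it returns the code as text, so leading zeros survive.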
While trying to work out how to extract the SIC with VBA, I found the following post on your site:
Query and parse xml attribute value into XLS using VBA
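I tried applying barrowc's answer by copying/pasting it into an Excel module and inserting the path to the Wal-Mart filing, but when I step through it I get the Debug.Print "*****" lines and nothing at all for n.Text.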
Sub test4()
    Dim d As MSXML2.DOMDocument60
    Dim i As IXMLDOMNodeList
    Dim n As IXMLDOMNode
    Set d = New MSXML2.DOMDocument60
    d.async = False
    d.Load ("http://www.sec.gov/cgi-bin/browse-edgar?company=&match=&CIK=886475&filenum=&State=&Country=&SIC=&owner=exclude&Find=Find+Companies&action=getcompany")
    Debug.Print "*****"
    Set i = d.SelectNodes("//div[@id='contentDiv']")
    For Each n In i
        Debug.Print n.Text
    Next n
    Debug.Print "*****"
    Set d = Nothing
End Sub
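I have tried all sorts of strings in d.SelectNodes(), but I don't understand the subject well enough to see where I'm going wrong. So comments on my syntax, or pointers to resources, would be very helpful.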
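A likely reason n.Text prints nothing: DOMDocument60 only accepts well-formed XML, and this page is HTML. The href values in the source above contain bare "&" characters (for example action=getcompany&CIK=...), which are not legal in XML, so Load most likely returns False and leaves the document empty; SelectNodes then returns an empty list and the loop body never runs. The minimal sketch below shows how to check this with loadXML on a string and parseError.reason. The helper name XmlLoadError is hypothetical.

```vba
' Hypothetical helper: returns "" if the text parses as XML, otherwise the
' parser's explanation. Late bound, so no reference to Microsoft XML is needed.
Function XmlLoadError(Source As String) As String
    Dim d As Object
    Set d = CreateObject("MSXML2.DOMDocument.6.0")
    d.async = False
    If d.LoadXML(Source) Then
        XmlLoadError = ""                      ' well-formed XML
    Else
        ' e.g. a bare "&" inside an href attribute is not legal XML
        XmlLoadError = d.parseError.reason
    End If
End Function
```

In the original code the same check is: If Not d.Load(url) Then Debug.Print d.parseError.reason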
Answer 0 (score: 1)
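If you are only interested in the SIC, it isn't worth trying to parse the whole DOM structure. Instead, identify a unique set of characters, search for it, and extract the SIC from there.
The following function does exactly that. Just pass it the page's full HTML source and it returns the SIC: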
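Function ExtractSIC(SourceHtml As String) As String
    ' The SIC appears in a link such as ...&SIC=5331&owner=exclude...
    Const PrefixChars As String = "&SIC="
    Const SuffixChars As String = "&"
    Dim StartPos As Long, EndPos As Long

    StartPos = InStr(SourceHtml, PrefixChars)
    If StartPos = 0 Then Exit Function                 ' no SIC on the page: return ""
    StartPos = StartPos + Len(PrefixChars)             ' first character of the code

    EndPos = InStr(StartPos, SourceHtml, SuffixChars)
    If EndPos = 0 Then EndPos = Len(SourceHtml) + 1    ' no trailing "&": read to the end
    ExtractSIC = Mid(SourceHtml, StartPos, EndPos - StartPos)
End Function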
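A variation on the same idea, not part of the answer above: instead of relying on a following "&", read the run of digits after "SIC=". This is a sketch assuming the code is always made up of digits; it also copes with a page that encodes "&" as "&amp;" in its markup, and it does not assume a fixed length of 4. The name ExtractDigitsAfter is hypothetical.

```vba
' Hypothetical variation: returns the consecutive digits that follow Marker,
' or "" if Marker is not found. Does not depend on what comes after the code.
Function ExtractDigitsAfter(SourceHtml As String, Marker As String) As String
    Dim Pos As Long, Ch As String, Result As String

    Pos = InStr(SourceHtml, Marker)
    If Pos = 0 Then Exit Function               ' marker not found: return ""
    Pos = Pos + Len(Marker)                     ' first character after the marker

    Do While Pos <= Len(SourceHtml)
        Ch = Mid$(SourceHtml, Pos, 1)
        If Not Ch Like "#" Then Exit Do         ' stop at the first non-digit
        Result = Result & Ch
        Pos = Pos + 1
    Loop
    ExtractDigitsAfter = Result
End Function
```

Usage: SIC = ExtractDigitsAfter(IEbrowser.Document.body.innerHTML, "SIC=")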
Answer 1 (score: 0)
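Thanks again, mwolfe. I've posted my code below, but what you provided is more elegant. I knew the SIC is only 4 digits, so I was lazy and built that assumption into my code, which could cause errors in the future. You can see how I did it in the commented-out section.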
Sub GetSICs()
    Dim IEbrowser As Object, frm As Object   ' declared so the module compiles under Option Explicit
    Dim AWBN As String
    Dim ASN As String
    Dim CIK As String
    Dim NUM_FILES_TO_GET As Long
    Dim COUNTER As Long
    Dim SICTagPos As Long                    ' Long: InStr on a long page can exceed the Integer range
    Dim SIC As String

    Application.ScreenUpdating = False
    Set IEbrowser = CreateObject("InternetExplorer.application")
    IEbrowser.Visible = False

    AWBN = ActiveWorkbook.Name
    ASN = ActiveSheet.Name
    Workbooks(AWBN).Sheets(ASN).Range("A1").Select
    ActiveCell.Offset(0, 11) = "SIC"         ' header in column L
    NUM_FILES_TO_GET = Application.WorksheetFunction.CountA(Range("A:A"))

    For COUNTER = 1 To 3 'NUM_FILES_TO_GET (capped at 3 rows while testing)
        Application.StatusBar = "Counter = " & COUNTER
        'SICTagPos = 0
        CIK = ActiveCell.Offset(COUNTER, 2)  ' CIK is read from column C

        ' Open the EDGAR company search form and submit the CIK
        IEbrowser.Navigate URL:="http://www.sec.gov/edgar/searchedgar/companysearch.html"
        Do
            DoEvents
        Loop Until IEbrowser.readyState = 4
        Set frm = IEbrowser.Document.forms(0)
        frm("CIK").Value = CIK
        frm.submit
        While IEbrowser.Busy Or IEbrowser.readyState <> 4: DoEvents: Wend

        SIC = ExtractSIC(IEbrowser.Document.body.innerhtml)
        'SICTagPos = InStr(1, IEbrowser.Document.body.innerhtml, "SIC=")
        'SIC = Right(Left(IEbrowser.Document.body.innerhtml, SICTagPos + 7), 4)

        ActiveCell.Offset(COUNTER, 11).NumberFormat = "@"   ' store as text to keep leading zeros
        ActiveCell.Offset(COUNTER, 11) = SIC
    Next

    IEbrowser.Quit                           ' close the hidden IE instance
    Set IEbrowser = Nothing
    Application.StatusBar = False
    Application.ScreenUpdating = True
End Sub
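Function ExtractSIC(SourceHtml As String) As String
    ' The SIC appears in a link such as ...&SIC=5331&owner=exclude...
    Const PrefixChars As String = "&SIC="
    Const SuffixChars As String = "&"
    Dim StartPos As Long, EndPos As Long

    StartPos = InStr(SourceHtml, PrefixChars)
    If StartPos = 0 Then Exit Function                 ' no SIC on the page: return ""
    StartPos = StartPos + Len(PrefixChars)             ' first character of the code

    EndPos = InStr(StartPos, SourceHtml, SuffixChars)
    If EndPos = 0 Then EndPos = Len(SourceHtml) + 1    ' no trailing "&": read to the end
    ExtractSIC = Mid(SourceHtml, StartPos, EndPos - StartPos)
End Function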
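If driving a hidden Internet Explorer window becomes a problem, one alternative is to request the company page directly over HTTP and hand the text to ExtractSIC. The sketch below builds the URL from the same query string as the CIK link in the page source shown in the question, switched to https. The function names are hypothetical, and the User-Agent header reflects an assumption that SEC expects automated clients to identify themselves; replace the placeholder with your own details.

```vba
' Hypothetical helpers: fetch the EDGAR company page with an HTTP request
' instead of filling in the search form in Internet Explorer.
Function BuildCompanyUrl(CIK As String) As String
    ' Same parameters as the CIK link in the page source
    BuildCompanyUrl = "https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany" & _
                      "&CIK=" & CIK & "&owner=exclude&count=40"
End Function

Function FetchCompanyPage(CIK As String) As String
    Dim http As Object
    Set http = CreateObject("MSXML2.ServerXMLHTTP.6.0")

    http.Open "GET", BuildCompanyUrl(CIK), False      ' synchronous GET
    ' Assumption: a descriptive User-Agent is expected; this value is a placeholder
    http.setRequestHeader "User-Agent", "YourCompany yourname@example.com"
    http.send

    If http.Status = 200 Then FetchCompanyPage = http.responseText   ' otherwise ""
End Function
```

Inside the loop in GetSICs, the Navigate/form/DoEvents block could then be replaced by SIC = ExtractSIC(FetchCompanyPage(CIK)); because the request is synchronous, there is no readyState polling.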