I'm trying to get insider ownership data from Morningstar at this URL: http://investors.morningstar.com/ownership/shareholders-overview.html?t=TWTR&region=usa&culture=en-US
This is the code I'm using:
Sub test()
Dim appIE As Object
Set appIE = CreateObject("InternetExplorer.Application")
With appIE
.Navigate "http://investors.morningstar.com/ownership/shareholders-overview.html?t=TWTR&region=usa&culture=en-US"
.Visible = True
End With
While appIE.Busy
DoEvents
Wend
Set allRowOfData = appIE.Document.getElementById("currentInsiderVal")
Debug.Print allRowOfData
Dim myValue As String: myValue = allRowOfData.Cells(0).innerHTML
appIE.Quit
Set appIE = Nothing
Range("A30").Value = myValue
End Sub
I get run-time error 13 (Type mismatch) on the line
Set allRowOfData = appIE.Document.getElementById("currentInsiderVal")
but I can't see any mismatch. What's going on?
Answer 0 (score: 1)
You can do this with XHR and RegEx instead of the cumbersome IE:
Sub Test()
Dim sContent
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "http://investors.morningstar.com/ownership/shareholders-overview.html?t=TWTR&region=usa&culture=en-US", False
.Send
sContent = .ResponseText
End With
With CreateObject("VBScript.RegExp")
.Pattern = ",""currInsiderVal"":(.*?),"
Range("A30").Value = .Execute(sContent).Item(0).SubMatches(0)
End With
End Sub
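Here is how the code works:
First, an MSXML2.XMLHTTP ActiveX instance is created. A GET request to the target URL is opened in synchronous mode, so execution halts until the response is received.
Then a VBScript.RegExp instance is created. By default its .IgnoreCase, .Global and .MultiLine properties are False. The pattern is ,"currInsiderVal":(.*?), (inside the VBA string literal each double quote is written twice), where (.*?) is a capturing group: . means any character except a newline, .* means zero or more such characters, and .*? means as few characters as possible (lazy matching). All other characters in the pattern are matched literally. The .Execute method returns a collection of matches, which holds only one match object because .Global is False. That match object has a collection of submatches, which holds only one submatch because the pattern contains a single capturing group.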
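A minimal sketch of the same regex step, handy for trying the pattern offline. The function name ExtractCurrInsiderVal and the shortened sample string are illustrative assumptions, not part of the answer above. Unlike the one-liner in Test(), it checks .Count before reading Item(0), so a response without the value yields an empty string instead of a run-time error.

```vba
' Hypothetical helper (not from the original answer): applies the same pattern
' as Test() to any string and returns the first captured group, or "" if none.
Function ExtractCurrInsiderVal(ByVal sContent As String) As String
    Dim re As Object, matches As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = ",""currInsiderVal"":(.*?),"   ' .Global stays False: at most one match
    Set matches = re.Execute(sContent)
    If matches.Count > 0 Then
        ' Item(0) is the only match; SubMatches(0) is the text captured by (.*?)
        ExtractCurrInsiderVal = matches.Item(0).SubMatches(0)
    Else
        ExtractCurrInsiderVal = ""
    End If
End Function

Sub DemoExtract()
    ' Shortened sample modelled on the fundsArr snippet from the page
    Dim sample As String
    sample = "{""currInstVal"":4370.46,""currInsiderVal"":143.51,""use10YearData"":""false""}"
    Debug.Print ExtractCurrInsiderVal(sample)   ' prints 143.51
End Sub
```

In Test(), the line Range("A30").Value = ... could then call ExtractCurrInsiderVal(sContent) and test for an empty result.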
Some useful MSDN articles on regular expressions:
Microsoft Beefs Up VBScript with Regular Expressions
Introduction to Regular Expressions
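Here is how I arrived at this code:
First, I used the browser to locate an element in the page DOM that contains the target value.
The corresponding node is: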
<td align="right" id="currrentInsiderVal">143.51</td>
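Then I made an XHR request and found this node in the response HTML, but it didn't contain the value (you can inspect the response in the browser's developer tools, on the Network tab, after reloading the page):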
<td align="right" id="currrentInsiderVal">
</td>
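Such behavior is typical of DHTML. After the page loads, scripts generate the dynamic HTML content, either by retrieving data from the web via XHR or simply by processing data that was already loaded with the page. So I just searched the response for the value 143.51 and found the snippet ,"currInsiderVal":143.51, inside a JS function: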
fundsArr = {"fundTotalHistVal":132.61,"mutualFunds":[[1,89,"#a71620"],[2,145,"#a71620"],[3,152,"#a71620"],[4,198,"#a71620"],[5,155,"#a71620"],[6,146,"#a71620"],[7,146,"#a71620"],[8,132,"#a71620"]],"insiderHisMaxVal":3.535,"institutions":[[1,273,"#283862"],[2,318,"#283862"],[3,351,"#283862"],[4,369,"#283862"],[5,311,"#283862"],[6,298,"#283862"],[7,274,"#283862"],[8,263,"#283862"]],"currFundData":[2,2202,"#a6001d"],"currInstData":[1,4370,"#283864"],"instHistMaxVal":369,"insiders":[[5,0.042,"#ff6c21"],[6,0.057,"#ff6c21"],[7,0.057,"#ff6c21"],[8,3.535,"#ff6c21"],[5,0],[6,0],[7,0],[8,0]],"currMax":4370,"histLineQuars":[[1,"Q2"],[2,"Q3"],[3,"Q4"],[4,"Q1<br>2015"],[5,"Q2"],[6,"Q3"],[7,"Q4"],[8,"Q1<br>2016"]],"fundHisMaxVal":198,"currInsiderData":[3,143,"#ff6900"],"currFundVal":2202.85,"quarters":[[1,"Q2"],[2,""],[3,""],[4,"Q1<br>2015"],[5,""],[6,""],[7,""],[8,"Q1<br>2016"]],"insiderTotalHistVal":3.54,"currInstVal":4370.46,"currInsiderVal":143.51,"use10YearData":"false","instTotalHistVal":263.74,"maxValue":369};
So the regex pattern built from it should find ,"currInsiderVal":<some text>, where <some text> is our target value.
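The lazy quantifier matters because the fundsArr line contains many more commas after the value. A small sketch comparing lazy and greedy captures on a shortened excerpt of that line; the helper FirstCapture is hypothetical and assumes the pattern matches at least once.

```vba
' Hypothetical helper: returns the first capture group of sPattern in sInput
' (assumes at least one match; otherwise Item(0) raises a run-time error).
Function FirstCapture(ByVal sInput As String, ByVal sPattern As String) As String
    With CreateObject("VBScript.RegExp")
        .Pattern = sPattern
        FirstCapture = .Execute(sInput).Item(0).SubMatches(0)
    End With
End Function

Sub CompareGreedyAndLazy()
    Dim sample As String
    ' Excerpt of the fundsArr JSON shown above
    sample = ",""currInsiderVal"":143.51,""use10YearData"":""false"",""instTotalHistVal"":263.74,""maxValue"":369};"
    ' Lazy (.*?): stops at the first comma after the value
    Debug.Print FirstCapture(sample, ",""currInsiderVal"":(.*?),")
    ' Greedy (.*): backtracks only as far as the last comma in the string
    Debug.Print FirstCapture(sample, ",""currInsiderVal"":(.*),")
End Sub
```

The first line prints 143.51; the greedy version captures 143.51,"use10YearData":"false","instTotalHistVal":263.74, which is why the answer uses (.*?).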
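Answer 1 (score: 0)
Looking at the site, there is a typo in the id of the element you're trying to retrieve: instead of currentInsiderVal
try currrentInsiderVal
and you should retrieve the data correctly.
It might be worth adding some error trapping so that problems like this are caught for any other fields you retrieve.
After your comment I took a closer look. Your problem seems to be that you're trying to grab a single cell by its id rather than navigating down the object tree. I've modified the code to retrieve the row of the table you're after and then set myValue to the correct cell in that row. It seems to work when I try it. Give it a go?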
Sub test()
Dim appIE As Object
Set appIE = CreateObject("internetexplorer.application")
With appIE
.Navigate "http://investors.morningstar.com/ownership/shareholders-overview.html?t=TWTR&region=usa&culture=en-US"
.Visible = True
End With
While appIE.Busy
DoEvents
Wend
Set allRowOfData = appIE.Document.getelementbyID("tableTest").getElementsByTagName("tbody")(0).getElementsByTagName("tr")(5)
myValue = allRowOfData.Cells(2).innerHTML
appIE.Quit
Set appIE = Nothing
Range("A30").Value = myValue
End Sub
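Both the question's code and this answer wait only on appIE.Busy, which can turn False before the page's scripts have filled in the cell. A hedged alternative, assuming IE automation is still available on the machine: poll until ReadyState reports complete (4) and the target cell actually contains text. WaitForElementText is a hypothetical helper, not taken from either answer, and the id currrentInsiderVal is simply the one observed on the page, so it may change.

```vba
' Hypothetical helper: polls the IE document until the element with the given id
' exists and has non-empty text, or until timeoutSeconds elapse ("" on timeout).
Function WaitForElementText(ByVal appIE As Object, ByVal elementId As String, _
                            ByVal timeoutSeconds As Double) As String
    Dim startTime As Double, el As Object
    startTime = Timer
    Do
        DoEvents
        If Not appIE.Busy And appIE.ReadyState = 4 Then   ' 4 = READYSTATE_COMPLETE
            Set el = appIE.Document.getElementById(elementId)
            If Not el Is Nothing Then
                ' Nested If: VBA's And does not short-circuit
                If Len(Trim$(el.innerText)) > 0 Then
                    WaitForElementText = Trim$(el.innerText)
                    Exit Function
                End If
            End If
        End If
    Loop While Timer - startTime < timeoutSeconds   ' note: Timer resets at midnight
End Function
```

Usage, replacing the While appIE.Busy loop: myValue = WaitForElementText(appIE, "currrentInsiderVal", 15)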