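I scrape web content with VBA and MSXML, so I know the basics. Now I want to get data from a page whose content is generated by JavaScript.
I can't share the exact link because the page is private, but I can describe it. There are div containers with headings and some images, and below them tables that are loaded dynamically (a loading spinner is shown) but not updated afterwards, so they load only once. If you open the source view in the browser you cannot find these tables, only the containers and the heading/image src attributes. But if you click on a table and choose "Inspect element", you see the usual structure of <th>, <tr>, <td> and so on.
Approaches I know of:
1) Save the page and then scrape it - probably not the best solution.
If I have a list of URLs, is there a quick way to save all the pages?
2) Use the Internet Explorer control through VBA, wait until the page has loaded and then get the elements as usual - but this seems slow to me (?): about 25 seconds per page, even though the page loads in 0.5 seconds.
Maybe I should turn off something that slows down loading? Could you check what the problem is?
Here is the code I found: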
    Sub FuturesScrap3(ByVal URL As String)
        Dim HTMLDoc As New HTMLDocument
        Dim AnchorLinks As Object
        Dim tdElements As Object
        Dim tdElement As Object
        Dim AnchorLink As Object
        Dim lRow As Long
        Dim oElement As Object
        Dim oIE As InternetExplorer

        Set oIE = New InternetExplorer
        oIE.navigate URL
        oIE.Visible = True

        Do Until (oIE.readyState = 4 And Not oIE.Busy)
            DoEvents
        Loop

        'Wait for Javascript to run
        Application.Wait (Now + TimeValue("0:01:00"))

        HTMLDoc.body.innerHTML = oIE.document.body.innerHTML
        With HTMLDoc.body
            Set AnchorLinks = .getElementsByTagName("a")
            Set tdElements = .getElementsByTagName("td")
            For Each AnchorLink In AnchorLinks
                Debug.Print AnchorLink.innerText
            Next AnchorLink
        End With

        lRow = 1
        For Each tdElement In tdElements
            Debug.Print tdElement.innerText
            Cells(lRow, 1).Value = tdElement.innerText
            lRow = lRow + 1
        Next

        'Clicking the Month tab
        For Each oElement In oIE.document.all
            If Trim(oElement.innerText) = "Month" Then
                oElement.Focus
                oElement.Click
            End If
        Next oElement

        Do Until (oIE.readyState = 4 And Not oIE.Busy)
            DoEvents
        Loop

        'Wait for Javascript to run
        Application.Wait (Now + TimeValue("0:01:00"))

        HTMLDoc.body.innerHTML = oIE.document.body.innerHTML
        With HTMLDoc.body
            Set AnchorLinks = .getElementsByTagName("a")
            Set tdElements = .getElementsByTagName("td")
            For Each AnchorLink In AnchorLinks
                Debug.Print AnchorLink.innerText
            Next AnchorLink
        End With

        lRow = 1
        For Each tdElement In tdElements
            Debug.Print tdElement.innerText
            Cells(lRow, 2).Value = tdElement.innerText
            lRow = lRow + 1
        Next tdElement
    End Sub
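3) Use a web driver such as Selenium - I couldn't find a proper example. It would be great if you could give me one from scratch, for example getting data by class name.
4) I am not sure about this one, but it may be the fastest - get the data directly from the JS variables/arrays that are used to build these tables. I have heard that you can reach a page's JavaScript from VBA, but I haven't found a proper example of getting the data.
All solutions should stay within VBA. I want to know which approach is fastest.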
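A closely related variant of approach 4 is worth sketching: pages like this often fetch their table data with a separate HTTP request, and if that request can be found in the browser's developer tools (Network tab), it can be repeated from VBA with MSXML, without a browser at all. The sketch below is only an illustration under that assumption. The helper name FetchText and the example.com URL are hypothetical, and a private site may also require the same login cookies or headers that the browser sends.

```vba
' Hypothetical helper (not from the original post): request a URL
' synchronously with MSXML and return the response body as text.
Function FetchText(ByVal url As String) As String
    Dim http As Object
    Set http = CreateObject("MSXML2.XMLHTTP")

    http.Open "GET", url, False          ' False = synchronous call
    http.send

    If http.Status = 200 Then
        FetchText = http.responseText
    Else
        Err.Raise vbObjectError + 513, "FetchText", _
                  "HTTP " & http.Status & " for " & url
    End If
End Function

Sub DemoFetch()
    ' Placeholder URL: replace it with the request the page itself
    ' makes to load its table data.
    Debug.Print Left$(FetchText("https://example.com/data.json"), 200)
End Sub
```

If the response turns out to be JSON or an HTML fragment, it still has to be parsed afterwards, but the browser start-up and rendering time is avoided.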
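Answer 0 (score: 0)
Thanks for the comments. @Marc, no, it is not possible to import the data with a web query / Power Query "From Web"; only the headings come through.
I edited the code - it had a 1-minute (!) delay (the author probably made a mistake when adding the delay that waits for the page's scripts to load).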
    Sub FuturesScrap3(ByVal URL As String)
        Dim HTMLDoc As New HTMLDocument
        Dim AnchorLinks As Object
        Dim tdElements As Object
        Dim tdElement As Object
        Dim AnchorLink As Object
        Dim lRow As Long
        Dim oElement As Object
        Dim oIE As InternetExplorer

        Set oIE = New InternetExplorer
        oIE.navigate URL
        oIE.Visible = True

        Do Until (oIE.readyState = 4 And Not oIE.Busy)
            DoEvents
        Loop

        'Wait for Javascript to run - 1 second is enough in my case
        Application.Wait (Now + TimeValue("0:00:01"))

        HTMLDoc.body.innerHTML = oIE.document.body.innerHTML
        With HTMLDoc.body
            Set AnchorLinks = .getElementsByTagName("a")
            Set tdElements = .getElementsByTagName("td")
            For Each AnchorLink In AnchorLinks
                Debug.Print AnchorLink.innerText
            Next AnchorLink
        End With

        lRow = 1
        For Each tdElement In tdElements
            Debug.Print tdElement.innerText
            Cells(lRow, 1).Value = tdElement.innerText
            lRow = lRow + 1
        Next

        'Clicking the Month tab
        For Each oElement In oIE.document.all
            If Trim(oElement.innerText) = "Month" Then
                oElement.Focus
                oElement.Click
            End If
        Next oElement

        Do Until (oIE.readyState = 4 And Not oIE.Busy)
            DoEvents
        Loop

        HTMLDoc.body.innerHTML = oIE.document.body.innerHTML
        With HTMLDoc.body
            Set AnchorLinks = .getElementsByTagName("a")
            Set tdElements = .getElementsByTagName("td")
            For Each AnchorLink In AnchorLinks
                Debug.Print AnchorLink.innerText
            Next AnchorLink
        End With

        lRow = 1
        For Each tdElement In tdElements
            Debug.Print tdElement.innerText
            Cells(lRow, 2).Value = tdElement.innerText
            lRow = lRow + 1
        Next tdElement
    End Sub
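The fixed one-second wait works on my page, but a fixed delay is either too long or too short on other pages. One possible refinement, shown here only as a sketch, is to poll the live document until the first <td> appears and to give up after a timeout. The helper name WaitForCells and the 10-second timeout in the usage comment are my own assumptions, not part of the code above.

```vba
' Hypothetical helper: poll the live IE document until at least one <td>
' exists, or stop once timeoutSeconds have passed.
' Returns True if cells were found, False on timeout.
Function WaitForCells(ByVal oIE As Object, _
                      ByVal timeoutSeconds As Double) As Boolean
    Dim deadline As Date
    deadline = Now + timeoutSeconds / 86400#    ' 86400 seconds per day

    Do
        DoEvents                                ' keep Excel and IE responsive
        If oIE.readyState = 4 And Not oIE.Busy Then
            ' Length becomes non-zero once the page script has
            ' inserted the table cells.
            If oIE.document.getElementsByTagName("td").Length > 0 Then
                WaitForCells = True
                Exit Function
            End If
        End If
    Loop While Now < deadline
End Function

' Possible use inside FuturesScrap3, in place of Application.Wait:
'     If Not WaitForCells(oIE, 10) Then
'         Debug.Print "Tables did not appear within 10 seconds"
'         Exit Sub
'     End If
```

This check only detects the first appearance of the cells. After clicking the Month tab the old cells already exist, so a different condition (for example, waiting for the cell text to change) would be needed there.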