我想从网页http://www.eex.com/en/market-data/power/derivatives-market/phelix-futures获取一些数据。
如果我使用旧的InternetExplorer对象(下面的代码),我可以浏览HTML文档。但我想使用XMLHTTP
对象(第二个代码)。
Sub IEZagon()
'we define the essential variables
Dim ie As Object
Dim TDelement, TDelements
Dim AnhorLink, AnhorLinks
'add the "Microsoft Internet Controls" reference in your VBA Project indirectly
Set ie = CreateObject("InternetExplorer.Application")
With ie
.Visible = True
.navigate ("[URL]http://www.eex.com/en/market-data/power/derivatives-market/phelix-futures[/URL]")
While ie.ReadyState <> 4
DoEvents
Wend
Set AnhorLinks = .document.getElementsbytagname("a")
Set TDelements = .document.getElementsbytagname("td")
For Each AnhorLink In AnhorLinks
Debug.Print AnhorLink.innertext
Next
For Each TDelement In TDelements
Debug.Print TDelement.innertext
Next
End With
Set ie = Nothing
End Sub
使用带有XMLHTTP对象的代码:
Sub FuturesScrap(ByVal URL As String)
Dim XMLHttpRequest As XMLHTTP
Dim HTMLDoc As New HTMLDocument
Set XMLHttpRequest = New MSXML2.XMLHTTP
XMLHttpRequest.Open "GET", URL, False
XMLHttpRequest.send
While XMLHttpRequest.readyState <> 4
DoEvents
Wend
Debug.Print XMLHttpRequest.responseText
HTMLDoc.body.innerHTML = XMLHttpRequest.responseText
With HTMLDoc.body
Set AnchorLinks = .getElementsByTagName("a")
Set TDelements = .getElementsByTagName("td")
For Each AnchorLink In AnchorLinks
Debug.Print AnhorLink.innerText
Next
For Each TDelement In TDelements
Debug.Print TDelement.innerText
Next
End With
End Sub
我只获得基本HTML:
<html>
<head>
<title>Resource Not found</title>
<link rel= 'stylesheet' type='text/css' href='/blueprint/css/errorpage.css'/>
</head>
<body>
<table class="header">
<tr>
<td class="CMTitle CMHFill"><span class="large">Resource Not found</span></td>
</tr>
</table>
<div class="body">
<p style="font-weight:bold;">The requested resource does Not exist.</p>
</div>
<table class="footer">
<tr>
<td class="CMHFill"> </td>
</tr>
</table>
</body>
</html>
我想浏览表格和相应的数据...... 最后,我想从年份到月份选择不同的时间间隔:
我真的很感激任何帮助!谢谢!
答案 0 :(得分:5)
我可以确认在运行代码时(使用或不使用url标记)获得与您相同的HTML。我找到了一个有用的帖子here。我已经使用在那里找到的方法修改了你的代码,它现在似乎已经下载了正确的信息。
Sub test()
Call FuturesScrap1("http://www.eex.com/en/market-data/power/derivatives-market/phelix-futures")
End Sub
我包含了调用sub,因为url标记似乎导致MSXML请求出错。
Sub FuturesScrap1(ByVal URL As String)
Dim HTMLDoc As New HTMLDocument
Dim oHttp As MSXML2.XMLHTTP
Dim sHTML As String
Dim AnchorLinks As Object
Dim TDelements As Object
Dim TDelement As Object
Dim AnchorLink As Object
On Error Resume Next
Set oHttp = New MSXML2.XMLHTTP
If Err.Number <> 0 Then
Set oHttp = CreateObject("MSXML.XMLHTTPRequest")
MsgBox "Error 0 has occured while creating a MSXML.XMLHTTPRequest object"
End If
On Error GoTo 0
If oHttp Is Nothing Then
MsgBox "For some reason I wasn't able to make a MSXML2.XMLHTTP object"
Exit Sub
End If
'Open the URL in browser object
oHttp.Open "GET", URL, False
oHttp.send
sHTML = oHttp.responseText
Debug.Print oHttp.responseText
HTMLDoc.body.innerHTML = oHttp.responseText
With HTMLDoc.body
Set AnchorLinks = .getElementsByTagName("a")
Set TDelements = .getElementsByTagName("td")
For Each AnchorLink In AnchorLinks
Debug.Print AnchorLink.innerText
Next
For Each TDelement In TDelements
Debug.Print TDelement.innerText
Next
End With
End Sub
编辑以下评论:
我无法使用MSXML2对象找到表元素,源代码似乎不包含它们。在firebug中存在td标记,因此我认为该表是由JavaScript代码生成的。我不知道MSXML2是否可以运行JavaScript所以我修改了sub以使用Internet Explorer,它不是快速代码,但它确实找到了td元素并允许单击选项卡。我发现td元素可能需要一些时间才能使用(可能是因为IE必须运行JavaScript)所以我在下载数据之前已经进行了几个步骤xl等待。
我已经添加了一些将td元素的内容下载到活动工作表中的代码,如果在包含有用数据的工作簿中运行它,请小心。
Sub FuturesScrap3(ByVal URL As String)
Dim HTMLDoc As New HTMLDocument
Dim AnchorLinks As Object
Dim tdElements As Object
Dim tdElement As Object
Dim AnchorLink As Object
Dim lRow As Long
Dim oElement As Object
Dim oIE As InternetExplorer
Set oIE = New InternetExplorer
oIE.navigate URL
oIE.Visible = True
Do Until (oIE.readyState = 4 And Not oIE.Busy)
DoEvents
Loop
'Wait for Javascript to run
Application.Wait (Now + TimeValue("0:01:00"))
HTMLDoc.body.innerHTML = oIE.document.body.innerHTML
With HTMLDoc.body
Set AnchorLinks = .getElementsByTagName("a")
Set tdElements = .getElementsByTagName("td") '
For Each AnchorLink In AnchorLinks
Debug.Print AnchorLink.innerText
Next AnchorLink
End With
lRow = 1
For Each tdElement In tdElements
Debug.Print tdElement.innerText
Cells(lRow, 1).Value = tdElement.innerText
lRow = lRow + 1
Next
'Clicking the Month tab
For Each oElement In oIE.document.all
If Trim(oElement.innerText) = "Month" Then
oElement.Focus
oElement.Click
End If
Next oElement
Do Until (oIE.readyState = 4 And Not oIE.Busy)
DoEvents
Loop
'Wait for Javascript to run
Application.Wait (Now + TimeValue("0:01:00"))
HTMLDoc.body.innerHTML = oIE.document.body.innerHTML
With HTMLDoc.body
Set AnchorLinks = .getElementsByTagName("a")
Set tdElements = .getElementsByTagName("td") '
For Each AnchorLink In AnchorLinks
Debug.Print AnchorLink.innerText
Next AnchorLink
End With
lRow = 1
For Each tdElement In tdElements
Debug.Print tdElement.innerText
Cells(lRow, 2).Value = tdElement.innerText
lRow = lRow + 1
Next tdElement
End sub