Situation:
I am downloading a file from the NHS Delayed Transfers of Care web page.
In the HTML I can see the following:
onclick="ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');"
After looking here and at these SO questions (among others):
my impression is that ga() is a JavaScript function that I should be able to call directly with .execScript.
Question:
Can I use .execScript to execute the JavaScript function and download the file? If not, how can I download the file?
What I have tried:
I have tried the following, all without success:
1)Call html.parentWindow.execScript("ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');", "Javascript")
Error -2147352319: Automation error
2)Call html.frames(0).execScript("ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');", "Javascript")
Error 438: Object doesn't support this property or method
3)Call currentWindow.execScript("ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');", "Javascript")
Error 91: Object variable or With block variable not set
4)Call CurrentWindow.execScript("ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');", "Javascript")
Error -2147352319: Could not complete the operation due to error 80020101.
I admit I know very little about these operations. Can anyone see where I am going wrong?
Code:
Option Explicit
Public Sub DownloadDTOC()
    Dim http As New XMLHTTP60
    Dim html As New HTMLDocument
    Dim CurrentWindow As HTMLWindowProxy
    With http
        .Open "GET", "https://www.england.nhs.uk/statistics/statistical-work-areas/delayed-transfers-of-care/delayed-transfers-of-care-data-2017-18/", False
        .send
        html.body.innerHTML = .responseText
    End With
    On Error GoTo Errhand
    'Call html.parentWindow.execScript("ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');", "Javascript") '-2147352319 Automation error
    'Call html.frames(0).execScript("ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');", "Javascript") '438 Object doesn't support this property or method
    'automation error
    'Call currentWindow.execScript("ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');", "Javascript") ' 91 Object variable or With block variable not set
    Set CurrentWindow = html.parentWindow
    Call CurrentWindow.execScript("ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');", "Javascript") '-2147352319 Could not complete the operation due to error 80020101.
    Exit Sub
Errhand:
    If Err.Number <> 0 Then Debug.Print Err.Number, Err.Description
End Sub
References added:
This is a simplified version of the HTML. Apologies, I am not used to formatting HTML.
<p>
<a href="https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls" class="xls-link" onclick="ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');">Total Delayed Days Local Authority 2017-18 November (XLS, 121KB)</a>
<br>
</p>
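Answer 0 (score: 0)
In the end I used a CSS selector to collect the href of every download link and passed the matching ones to URLMon for download. Because the latest file is published with a two-month lag, I filter for the file whose name contains the month two months before the current date.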
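The two-month filter can be tried on its own before wiring it into the download loop. The following is a minimal sketch, not part of the original answer: the function name IsTargetMonth and the demo routine are hypothetical, and it assumes an English-language locale, because Format with "mmmm" returns the localized month name.

```vba
Option Explicit

' Hypothetical helper: True when the URL contains the month two months
' before refDate, written the way the NHS file names write it
' (for example "November-2017").
Public Function IsTargetMonth(ByVal url As String, ByVal refDate As Date) As Boolean
    Dim monthTag As String
    ' "mmmm" yields the full, localized month name; English is assumed here.
    monthTag = Format$(DateAdd("m", -2, refDate), "mmmm-yyyy")
    IsTargetMonth = (InStr(1, url, monthTag, vbTextCompare) > 0)
End Function

Public Sub DemoIsTargetMonth()
    Const SAMPLE_URL As String = "https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls"
    ' January 2018 minus two months is November 2017, so this should match
    ' on an English-locale machine.
    Debug.Print IsTargetMonth(SAMPLE_URL, DateSerial(2018, 1, 15))
    ' March 2018 minus two months is January 2018, which is not in the file name.
    Debug.Print IsTargetMonth(SAMPLE_URL, DateSerial(2018, 3, 15))
End Sub
```

Passing the reference date as a parameter, rather than calling Date inside the function, makes the filter easy to check against known file names.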
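CSS selector:
The selector I chose is #main-content a[href*=xls]. It looks for elements with an a tag whose href attribute contains the string "xls" and which sit inside the element with the ID main-content.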
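To see what the selector does and does not match, it can be run against a small piece of test markup. This is only a sketch: the markup and the example.com URLs are made up for illustration, and it assumes a reference to the Microsoft HTML Object Library, as in the main code.

```vba
Option Explicit

' Requires a reference to Microsoft HTML Object Library.
Public Sub DemoSelector()
    Dim html As New HTMLDocument
    Dim links As Object, i As Long

    ' Hypothetical markup: one .xls link and one .pdf link inside
    ' #main-content, plus one .xls link outside it.
    html.body.innerHTML = _
        "<div id=""main-content"">" & _
        "<a href=""https://example.com/files/report.xls"">XLS</a>" & _
        "<a href=""https://example.com/files/report.pdf"">PDF</a>" & _
        "</div>" & _
        "<div id=""sidebar""><a href=""https://example.com/other.xls"">Other</a></div>"

    ' Descendant combinator plus a "contains" attribute selector.
    Set links = html.querySelectorAll("#main-content a[href*=xls]")

    ' Only the report.xls link is expected to be listed.
    For i = 0 To links.Length - 1
        Debug.Print links.item(i).getAttribute("href")
    Next i
End Sub
```

Because *= is a substring match, the same selector would also pick up links ending in .xlsx, or any href that merely contains "xls" somewhere in the path.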
Sample CSS query results:
VBA:
Option Explicit
' dwReserved is a DWORD in the Windows API, so it is declared As Long;
' only the pointer-sized arguments (pCaller, lpfnCB) need LongPtr.
Private Declare PtrSafe Function URLDownloadToFile Lib "urlmon" _
    Alias "URLDownloadToFileA" ( _
    ByVal pCaller As LongPtr, _
    ByVal szURL As String, _
    ByVal szFileName As String, _
    ByVal dwReserved As Long, _
    ByVal lpfnCB As LongPtr _
    ) As Long
' Declared for clearing a cached copy of a URL; not called in the code below.
Private Declare PtrSafe Function DeleteUrlCacheEntry Lib "Wininet.dll" _
    Alias "DeleteUrlCacheEntryA" ( _
    ByVal lpszUrlName As String _
    ) As Long
Public Const BINDF_GETNEWESTVERSION As Long = &H10
Public Sub DownloadFiles()
    Dim http As New XMLHTTP60, html As New HTMLDocument, downloads As Collection
    ' Fetch the page and load the response into an in-memory HTML document.
    With http
        .Open "GET", "https://www.england.nhs.uk/statistics/statistical-work-areas/delayed-transfers-of-care/statistical-work-areas-delayed-transfers-of-care-delayed-transfers-of-care-data-2018-19/", False
        .send
        html.body.innerHTML = .responseText
    End With
    Dim aNodeList As Object, i As Long
    Set downloads = New Collection
    ' Collect the href of every link inside #main-content that contains "xls".
    Set aNodeList = html.querySelectorAll("#main-content a[href*=xls]")
    For i = 0 To aNodeList.Length - 1
        downloads.Add aNodeList.item(i).getAttribute("href")
    Next i
    ' Download only the files named for the month two months before today.
    For i = 1 To downloads.Count
        If InStr(downloads(i), Format(DateAdd("m", -2, Date), "mmmm-yyyy")) > 0 Then
            Debug.Print downloads(i)
            downloadFile downloads(i)
        End If
    Next i
End Sub
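Public Sub downloadFile(ByVal url As String)
    Dim ret As Long, arr() As String, outputPath As String
    ' Use the last path segment of the URL (split on "/") as the file name.
    arr = Split(url, Chr$(47))
    outputPath = "C:\Users\HarrisQ\Desktop\" & arr(UBound(arr))
    ' The API documentation describes the fourth argument as reserved (0);
    ' passing BINDF_GETNEWESTVERSION is kept from the original code.
    ret = URLDownloadToFile(0, url, outputPath, BINDF_GETNEWESTVERSION, 0)
    ' URLDownloadToFile returns 0 (S_OK) on success.
    If ret <> 0 Then Debug.Print "Download failed (" & ret & "): " & url
End Sub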
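The DeleteUrlCacheEntry declaration is never called in the code above. One plausible use, shown here only as a sketch, is to clear any cached copy of the URL before downloading and then report success from the return value. The helper name TryDownload is hypothetical, and this version passes 0 for the reserved argument rather than BINDF_GETNEWESTVERSION.

```vba
Option Explicit

Private Declare PtrSafe Function URLDownloadToFile Lib "urlmon" _
    Alias "URLDownloadToFileA" ( _
    ByVal pCaller As LongPtr, _
    ByVal szURL As String, _
    ByVal szFileName As String, _
    ByVal dwReserved As Long, _
    ByVal lpfnCB As LongPtr) As Long

Private Declare PtrSafe Function DeleteUrlCacheEntry Lib "wininet.dll" _
    Alias "DeleteUrlCacheEntryA" ( _
    ByVal lpszUrlName As String) As Long

' Hypothetical helper: returns True when the download succeeds.
Public Function TryDownload(ByVal url As String, ByVal outputPath As String) As Boolean
    ' Remove any cached copy first so a fresh file is requested.
    ' The result is ignored because there may be no cache entry to delete.
    DeleteUrlCacheEntry url
    ' URLDownloadToFile returns 0 (S_OK) on success.
    TryDownload = (URLDownloadToFile(0, url, outputPath, 0, 0) = 0)
End Function
```

A caller could then write something like If Not TryDownload(url, outputPath) Then Debug.Print "Failed: " & url, which makes failed downloads visible instead of silently discarding the return code.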
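References:
Requires references to the Microsoft HTML Object Library and Microsoft XML (XMLHTTP60 comes from v6.0).
API calls:
Written for 64-bit.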
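If the module also has to compile in an older, pre-VBA7 host (Office 2007 or earlier), the declaration can be wrapped in conditional compilation. This is a sketch rather than part of the original answer; PtrSafe declarations with LongPtr already work in both 32-bit and 64-bit Office 2010 and later, so the #Else branch matters only for older versions.

```vba
Option Explicit

#If VBA7 Then
    ' Office 2010 and later, 32-bit or 64-bit.
    Private Declare PtrSafe Function URLDownloadToFile Lib "urlmon" _
        Alias "URLDownloadToFileA" ( _
        ByVal pCaller As LongPtr, _
        ByVal szURL As String, _
        ByVal szFileName As String, _
        ByVal dwReserved As Long, _
        ByVal lpfnCB As LongPtr) As Long
#Else
    ' Office 2007 and earlier: no PtrSafe keyword and no LongPtr type.
    Private Declare Function URLDownloadToFile Lib "urlmon" _
        Alias "URLDownloadToFileA" ( _
        ByVal pCaller As Long, _
        ByVal szURL As String, _
        ByVal szFileName As String, _
        ByVal dwReserved As Long, _
        ByVal lpfnCB As Long) As Long
#End If
```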