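Download a file; using .execScript in VBA to execute a JavaScript function

Time: 2018-01-17 11:11:56

Tags: javascript html vba excel-vba web-scraping

Situation:

I am downloading files from the web page NHS Delayed Transfers of Care.

In the HTML I can see the following: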

onclick="ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');"

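After looking here and at these SO questions (among others):

I got the impression that ga() is a JavaScript function that I should be able to call directly with .execScript.

Question:

Can I use .execScript to execute the JavaScript function and download the file? If not, how can I download the file?

What I have tried:

I tried the following, all of which failed: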

1)Call html.parentWindow.execScript("ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');", "Javascript")

  

' -2147352319 Automation error

2)Call html.frames(0).execScript("ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');", "Javascript")

  

Error 438: Object doesn't support this property or method

3)Call currentWindow.execScript("ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');", "Javascript")

  

Error 91: Object variable or With block variable not set

4)Call CurrentWindow.execScript("ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');", "Javascript")

  

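-2147352319 Could not complete the operation due to error 80020101.

I admit I know very little about these operations. Can anyone see where I am going wrong?

Code: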

Option Explicit

Public Sub DownloadDTOC()

    Dim http As New XMLHTTP60
    Dim html As New HTMLDocument
    Dim CurrentWindow As HTMLWindowProxy

    With http
        .Open "GET", "https://www.england.nhs.uk/statistics/statistical-work-areas/delayed-transfers-of-care/delayed-transfers-of-care-data-2017-18/", False
        .send
        html.body.innerHTML = .responseText
    End With

    On Error GoTo Errhand

    'Call html.parentWindow.execScript("ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');", "Javascript") '-2147352319   Automation error

    'Call html.frames(0).execScript("ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');", "Javascript") '438 Object doesn't support this property or method
'automation error

    'Call currentWindow.execScript("ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');", "Javascript") ' 91 Object variable or With block variable not set

    Set CurrentWindow = html.parentWindow
    Call CurrentWindow.execScript("ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');", "Javascript") '-2147352319  Could not complete the operation due to error 80020101.

    Exit Sub

Errhand:
    If Err.Number <> 0 Then Debug.Print Err.Number, Err.Description
End Sub

References added:

[Image: References in project]

This is a simplified version of the HTML. Apologies, I am not used to formatting HTML.

<p>
  <a href="https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls" class="xls-link" onclick="ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');">Total Delayed Days Local Authority 2017-18 November (XLS, 121KB)</a>
  <br>
</p>

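1 answer:

Answer 0 (score: 0)

In the end I used a CSS selector to collect the href of every download link and passed each one to URLMon for downloading. Because the latest published file lags the current date by two months, I only download files whose name contains the month two months before today.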
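
The two-month lag filter can be pulled out into small functions so it can be checked on its own. This is only a sketch: LaggedMonthToken and IsTargetFile are hypothetical helper names that do not appear in the answer, and Format returns month names in the system locale, so the token only matches these English file names on an English-locale machine.

```vba
Option Explicit

' Hypothetical helper (not part of the original answer): builds the
' "mmmm-yyyy" token for the month that is lagMonths before refDate.
' Assumption: month names come out in English (system locale).
Public Function LaggedMonthToken(ByVal refDate As Date, _
                                 Optional ByVal lagMonths As Long = 2) As String
    LaggedMonthToken = Format$(DateAdd("m", -lagMonths, refDate), "mmmm-yyyy")
End Function

' Hypothetical helper: True when the URL contains the lagged month token.
' Uses the same InStr test as the answer's download loop.
Public Function IsTargetFile(ByVal url As String, ByVal refDate As Date) As Boolean
    IsTargetFile = InStr(url, LaggedMonthToken(refDate)) > 0
End Function
```

For the question's date, 17 January 2018, the token refers to November 2017, which is the month in the file name LA-Type-B-November-2017-2ayZP.xls from the question.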


CSS selector:

The selector I chose is #main-content a[href*=xls]

This matches a elements whose href attribute contains the string "xls" and which sit inside the element with the ID main-content.
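
To see what the selector does and does not match, it can be tried against a small hand-written fragment. The sketch below is illustrative only: the markup and the example.com URLs are made up, SelectorSketch is a hypothetical name, and it assumes a reference to the Microsoft HTML Object Library and that querySelectorAll is available on the HTMLDocument, as it is in the answer's code.

```vba
Option Explicit

' Requires a reference to Microsoft HTML Object Library.
Public Sub SelectorSketch()
    Dim html As New HTMLDocument
    Dim links As Object, i As Long

    ' Made-up markup for illustration only
    html.body.innerHTML = _
        "<div id=""main-content"">" & _
        "<a href=""https://example.com/a.xls"">inside, href contains xls</a>" & _
        "<a href=""https://example.com/b.pdf"">inside, href lacks xls</a>" & _
        "</div>" & _
        "<a href=""https://example.com/c.xls"">outside #main-content</a>"

    ' Only anchors that are descendants of #main-content AND whose href
    ' contains the substring "xls" should be returned
    Set links = html.querySelectorAll("#main-content a[href*=xls]")

    For i = 0 To links.Length - 1
        Debug.Print links.item(i).getAttribute("href")
    Next i
End Sub
```

Because href*=xls is a substring test, links to .xlsx files, or any URL that merely contains "xls" somewhere, would also be selected.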


Sample CSS query results:

[Image: query results]


VBA:

Option Explicit
Private Declare PtrSafe Function URLDownloadToFile Lib "urlmon" _
Alias "URLDownloadToFileA" ( _
ByVal pCaller As LongPtr, _
ByVal szURL As String, _
ByVal szFileName As String, _
ByVal dwReserved As LongPtr, _
ByVal lpfnCB As LongPtr _
) As Long
Private Declare PtrSafe Function DeleteUrlCacheEntry Lib "Wininet.dll" _
Alias "DeleteUrlCacheEntryA" ( _
ByVal lpszUrlName As String _
) As Long

Public Const BINDF_GETNEWESTVERSION As Long = &H10

Public Sub DownloadFiles()
    Dim http As New XMLHTTP60, html As New HTMLDocument, downloads As Collection
    With http
        .Open "GET", "https://www.england.nhs.uk/statistics/statistical-work-areas/delayed-transfers-of-care/statistical-work-areas-delayed-transfers-of-care-delayed-transfers-of-care-data-2018-19/", False
        .send
        html.body.innerHTML = .responseText
    End With

    Dim aNodeList As Object, i As Long
    Set downloads = New Collection
    Set aNodeList = html.querySelectorAll("#main-content a[href*=xls]")
    For i = 0 To aNodeList.Length - 1
        downloads.Add aNodeList.item(i).getAttribute("href")
    Next i

    For i = 1 To downloads.Count
        If InStr(downloads(i), Format(DateAdd("m", -2, Date), "mmmm-yyyy")) > 0 Then
            Debug.Print downloads(i)
            downloadFile downloads(i)
        End If
    Next i
End Sub

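Public Sub downloadFile(ByVal url As String)
    Dim ret As Long, arr() As String, outputPath As String
    ' The file name is the last "/"-separated segment of the URL
    arr = Split(url, Chr$(47))
    ' Adjust this folder to suit; it must already exist
    outputPath = "C:\Users\HarrisQ\Desktop\" & arr(UBound(arr))
    ' URLDownloadToFile returns 0 (S_OK) on success
    ret = URLDownloadToFile(0, url, outputPath, BINDF_GETNEWESTVERSION, 0)
    If ret <> 0 Then Debug.Print "Download failed: " & url
End Sub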

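References:

Requires references to the HTML Object Library and Microsoft XML.


API calls:

Written for 64-bit Office (the declarations use PtrSafe and LongPtr).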
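
If the same module also has to load in older VBA6 hosts, the declaration is commonly wrapped in conditional compilation. The following is a sketch of that pattern, not part of the original answer. One deliberate difference is an assumption on my part: dwReserved is typed as Long here because the Windows API documents it as a DWORD, whereas the answer declares it as LongPtr.

```vba
Option Explicit

#If VBA7 Then
    ' Office 2010 and later, 32-bit or 64-bit
    Private Declare PtrSafe Function URLDownloadToFile Lib "urlmon" _
        Alias "URLDownloadToFileA" ( _
        ByVal pCaller As LongPtr, _
        ByVal szURL As String, _
        ByVal szFileName As String, _
        ByVal dwReserved As Long, _
        ByVal lpfnCB As LongPtr _
        ) As Long
#Else
    ' Older VBA6 hosts, which do not know PtrSafe or LongPtr
    Private Declare Function URLDownloadToFile Lib "urlmon" _
        Alias "URLDownloadToFileA" ( _
        ByVal pCaller As Long, _
        ByVal szURL As String, _
        ByVal szFileName As String, _
        ByVal dwReserved As Long, _
        ByVal lpfnCB As Long _
        ) As Long
#End If
```

The call site stays the same in both branches, since literal 0 arguments convert to either pointer width.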