How to download a file from a secure website using WinHTTPRequest.5.1

Time: 2019-01-16 18:56:54

Tags: html vba web-scraping winhttp

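I am trying to silently download files (PDFs) from a website using VBA. So far I can log in without problems: I enter the username and password on the initial screen, navigate to the reports page within the site, and successfully get my list of files from a table. I get the URLs of the files without trouble. This is where I hit a wall. I do download a file, but when I open it I get a security warning saying that I must log in to view it. I can reproduce this warning by pasting the URL into any browser while not logged in; the URL looks the same. So I am downloading the file, but without being authenticated.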

The code for the download in question:

Dim strCookie As String
Dim strResponse As String
Dim xobj As Object
Dim WinHttpReq As Object
Dim WinHttpReq2 As Object
Dim oStream As Object

' Set xobj = New WinHttp.WinHttpRequest
strDocLink = "https://atlasbridge.com" & strDocLink & "&RT=PREVMAIL"
Debug.Print strDocLink
' launch tab & goto url/doc
' try to download the link(this is the url of the file)
' strDocLink
Set WinHttpReq = CreateObject("WINHTTP.WinHTTPRequest.5.1")
strUrl = "https://atlasbridge.com/search/AgencyReports.aspx"
WinHttpReq.Open "GET", strUrl, False
WinHttpReq.Option(6) = False ' 6 = WinHttpRequestOption_EnableRedirects; the named constant is undefined when late binding via CreateObject
WinHttpReq.setRequestHeader "Referer", "https://atlasbridge.com/search/AgencyReports.aspx"
WinHttpReq.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"
WinHttpReq.setRequestHeader "Connection", "keep-alive"
WinHttpReq.setRequestHeader "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
WinHttpReq.setRequestHeader "Accept-Language", "en-US,en;q=0.5"
WinHttpReq.Send
If WinHttpReq.Status = 200 Then
    strResponse = WinHttpReq.responseText
    Debug.Print strResponse
    strCookie = WinHttpReq.getResponseHeader("Set-Cookie") ' this only gets the cookie; the cookie seems to include the session id
    resp = WinHttpReq.getAllResponseHeaders
    ' resp = WinHttpReq.responseBody
    ' strCookie = WinHttpReq.getResponseHeader("Cookie") ' doesn't find the requested header
    Debug.Print strCookie
    Debug.Print resp
End If
' then open second session & try to get document
Set WinHttpReq2 = CreateObject("WINHTTP.WinHTTPRequest.5.1")
WinHttpReq2.Open "GET", strDocLink, False
WinHttpReq2.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"
WinHttpReq2.setRequestHeader "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
WinHttpReq2.setRequestHeader "Accept-Language", "en-US,en;q=0.5"
WinHttpReq2.setRequestHeader "Referer", "https://atlasbridge.com/search/AgencyReports.aspx"
WinHttpReq2.setRequestHeader "Connection", "keep-alive"
WinHttpReq2.setRequestHeader "Host", "atlasbridge.com:443"
WinHttpReq2.setRequestHeader "Accept-Encoding", "gzip, deflate, br"
' WinHttpReq2.setRequestHeader "Transfer-Encoding", "chunked"
' (disabled: setting this header causes an error on .Send)
WinHttpReq2.setRequestHeader "Cache-Control", "private"
WinHttpReq2.setRequestHeader "Upgrade-Insecure-Requests", "1"
WinHttpReq2.setRequestHeader "Content-Type", "application/pdf"
WinHttpReq2.setRequestHeader "Cookie", strCookie
WinHttpReq2.Send
If WinHttpReq2.Status = 200 Then
    Set oStream = CreateObject("ADODB.Stream")
    oStream.Open
    oStream.Type = 1
    oStream.Write WinHttpReq2.responseBody
    oStream.SaveToFile "C:\Users\MyUserName\Desktop\DownloadedMail\atlasreportdownload.ashx.pdf", 1 ' 1 = no overwrite, 2 = overwrite
    oStream.Close
End If
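One thing that may be worth testing: a WinHttpRequest object generally keeps the cookies it receives and sends them back on later requests made through the same object, whereas WinHttpReq2 above is a fresh object with no cookies of its own. The following is a minimal sketch, not a confirmed fix, that performs the login and the download on a single object. The login URL, the form field names (txtUser, txtPassword) and the procedure name DownloadWithOneSession are hypothetical placeholders; an ASP.NET login form typically also expects hidden fields such as __VIEWSTATE, which are left out here.

```vba
' Sketch only: the login URL and form field names are hypothetical placeholders.
Sub DownloadWithOneSession(ByVal strDocLink As String, ByVal strSavePath As String)
    Const LOGIN_URL As String = "https://atlasbridge.com/default.aspx" ' assumed login page
    Dim http As Object
    Dim oStream As Object

    Set http = CreateObject("WinHttp.WinHttpRequest.5.1")

    ' 1) Log in. Cookies set by the response stay with this object.
    http.Open "POST", LOGIN_URL, False
    http.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
    http.Send "txtUser=MyUserName&txtPassword=MyPassword" ' hypothetical field names

    ' 2) Request the document through the same object so the cookies are reused.
    http.Open "GET", strDocLink, False
    http.Send

    If http.Status = 200 Then
        Set oStream = CreateObject("ADODB.Stream")
        oStream.Type = 1 ' 1 = adTypeBinary
        oStream.Open
        oStream.Write http.responseBody
        oStream.SaveToFile strSavePath, 2 ' 2 = overwrite
        oStream.Close
    End If
End Sub
```

If the login was actually done in a browser or through IE automation, those cookies are not visible to WinHTTP, which could explain why the second request arrives unauthenticated.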

I have tried several different approaches, but I don't believe I am getting the complete cookie and session ID.

The cookie I get from WinHttpReq.getResponseHeader("Set-Cookie") and getAllResponseHeaders looks like:

NSC_bumbtcsjehf.dpn_TTM_443_MCWT=ffffffffc3a00a0a00000a0000000000005e445a4a423660; Version=1; Max-Age=2400; path=/; secure; httponly

But when I use LiveHeaders in Firefox, I see:

Cookie: ASP.NET_SessionId=z2e4adilfjgiyynx2mntnh1k; NSC_bumbtcsjehf.dpn_TTM_443_MCWT=ffffffffc3a00a0a000000000005e445a4a423660; AuthToken=0be22946-a97a-442e-bd93-c80f0c96a525; AtlasLastMessage=1173; lc_sso7549731=1546651094987; __lc.visitor_id.7549731=S1546651090.26728e19e6

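However, when I Debug.Print the response, I can't seem to expose the full cookie with the AuthToken, the session ID, and so on. Can someone point me in the right direction so that I can test the changes I make? Thanks in advance.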

Update: the response headers of the first request:

 Cache-Control: private
 Date: Wed, 16 Jan 2019 22:04:54 GMT
 Content-Length: 164
 Content-Type: text/html; charset=utf-8
 Location: /default.aspx?err=Expired&dest=%2fhome.aspx
 Server: Microsoft-IIS/7.0
 Set-Cookie: ASP.NET_SessionId=mo0owzztbul5of0litxox5kx; path=/; secure; HttpOnly
 Set-Cookie: NSC_bumbtcsjehf.dpn_TTM_443_MCWT=ffffffffc3a00a1a45525d5f4f58455e445a4a423660;Version=1;Max-Age=2400;path=/;secure;httponly
 X-AspNet-Version: 4.0.30319
 X-UA-Compatible: IE=edge
 X-Powered-By: ASP.NET
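The headers above contain two Set-Cookie lines, while getResponseHeader("Set-Cookie") returned only one of them. A possible way to collect all of them is to parse the text from getAllResponseHeaders. The helper below is a sketch under that assumption; the name BuildCookieHeader is made up for illustration.

```vba
' Builds a Cookie request header value from every Set-Cookie line
' in the text returned by getAllResponseHeaders.
Function BuildCookieHeader(ByVal strAllHeaders As String) As String
    Dim vLine As Variant
    Dim strLine As String
    Dim strPair As String
    Dim strResult As String
    Dim lngPos As Long

    For Each vLine In Split(strAllHeaders, vbCrLf)
        strLine = Trim$(vLine)
        ' "set-cookie:" is 11 characters long; header names are case-insensitive
        If LCase$(Left$(strLine, 11)) = "set-cookie:" Then
            strPair = Trim$(Mid$(strLine, 12))
            ' Keep only name=value; drop attributes such as path, secure, HttpOnly
            lngPos = InStr(strPair, ";")
            If lngPos > 0 Then strPair = Left$(strPair, lngPos - 1)
            If Len(strResult) > 0 Then strResult = strResult & "; "
            strResult = strResult & Trim$(strPair)
        End If
    Next vLine

    BuildCookieHeader = strResult
End Function
```

It could be used as WinHttpReq2.setRequestHeader "Cookie", BuildCookieHeader(WinHttpReq.getAllResponseHeaders). Note that neither Set-Cookie line above carries an AuthToken, and the Location header points back to /default.aspx?err=Expired, so these two cookies alone may still not be enough to authenticate the download.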

I am now downloading the response body.
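To test whether a change worked without opening the saved file each time, one option is to check that the downloaded body starts with the PDF signature "%PDF" before saving it. This is a small sketch; the function name LooksLikePdf is hypothetical.

```vba
' Returns True when the response body starts with the bytes "%PDF".
Function LooksLikePdf(ByVal vBody As Variant) As Boolean
    Dim b() As Byte

    ' responseBody is expected to be a Variant holding a Byte array
    If VarType(vBody) <> (vbArray + vbByte) Then Exit Function
    b = vBody
    If UBound(b) - LBound(b) < 3 Then Exit Function

    ' "%PDF" = 37, 80, 68, 70 in ASCII
    LooksLikePdf = (b(LBound(b)) = 37 And b(LBound(b) + 1) = 80 And _
                    b(LBound(b) + 2) = 68 And b(LBound(b) + 3) = 70)
End Function
```

For example, If Not LooksLikePdf(WinHttpReq2.responseBody) Then Debug.Print WinHttpReq2.getResponseHeader("Content-Type") would show what the server really returned, which is likely an HTML login page when the request is not authenticated.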

0 answers:

No answers