Downloading a file over HTTPS using .NET (part 2)

Date: 2010-02-09 09:07:35

Tags: .net file https download

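I regularly have to do the following manually in a web browser:

  1. Visit an https website.
  2. Log in through a web form.
  3. Click a link to download a large file (135 MB).

I would like to automate this process using .NET.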

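I posted a question about this here a few days ago. Thanks to a piece of code from Rubens Farias, I can now perform steps 1 and 2 above. After step 2 I can read the HTML of the page that contains the URL of the file to download (using afterLoginPage = reader.ReadToEnd()). This page is only shown when the login has been accepted, so it confirms that step 2 succeeded.

My question now is how to perform step 3. I have tried a few things, but to no avail: access to the file is still denied, even though the login succeeded beforehand.

To clarify, I am posting the code below, without the actual login details and website of course. At the end, the variable afterLoginPage contains the HTML of the post-login page, which includes the link to the file I want to download. This link also starts with https.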

    Dim httpsSite As String = "https://www.test.test/user/login"
    ' enter correct address
    Dim formPage As String = ""
    Dim afterLoginPage As String = ""
    
    ' Get postback data and cookies
    Dim cookies As New CookieContainer()
    Dim getRequest As HttpWebRequest = DirectCast(WebRequest.Create(httpsSite), HttpWebRequest)
    getRequest.CookieContainer = cookies
    getRequest.Method = "GET"
    
    Dim wp As WebProxy = New WebProxy("[our proxies IP address]", [our proxies port number])
    wp.Credentials = CredentialCache.DefaultCredentials
    getRequest.Proxy = wp
    
    Dim form As HttpWebResponse = DirectCast(getRequest.GetResponse(), HttpWebResponse)
    Using response As New StreamReader(form.GetResponseStream(), Encoding.UTF8)
        formPage = response.ReadToEnd()
    End Using
    
    Dim inputs As New Dictionary(Of String, String)()
    inputs.Add("form_build_id", "[some code I'd like to keep secret]")
    inputs.Add("form_id", "user_login")
    For Each input As Match In Regex.Matches(formPage, "<input.*?name=""(?<name>.*?)"".*?(?:value=""(?<value>.*?)"".*?)? />", RegexOptions.IgnoreCase Or RegexOptions.ECMAScript)
        If input.Groups("name").Value <> "form_build_id" And _
           input.Groups("name").Value <> "form_id" Then
            inputs.Add(input.Groups("name").Value, input.Groups("value").Value)
        End If
    Next
    
    inputs("name") = "[our login name]"
    inputs("pass") = "[our login password]"
    
    Dim buffer As Byte() = Encoding.UTF8.GetBytes( _
    [String].Join("&", _
    Array.ConvertAll(Of KeyValuePair(Of String, String), String)(inputs.ToArray(), _
    Function(item As KeyValuePair(Of String, String)) (item.Key & "=") + System.Web.HttpUtility.UrlEncode(item.Value))))
    
    Dim postRequest As HttpWebRequest = DirectCast(WebRequest.Create(httpsSite), HttpWebRequest)
    postRequest.CookieContainer = cookies
    postRequest.Method = "POST"
    postRequest.ContentType = "application/x-www-form-urlencoded"
    postRequest.Proxy = wp
    
    ' send username/password
    Using stream As Stream = postRequest.GetRequestStream()
        stream.Write(buffer, 0, buffer.Length)
    End Using
    
    ' get response from login page
    Using reader As New StreamReader(postRequest.GetResponse().GetResponseStream(), Encoding.UTF8)
        afterLoginPage = reader.ReadToEnd()
    End Using
    

4 answers:

Answer 0 (score: 3)

As I said in a comment on that question, you just need to use the DownloadFile method:

using(WebClient client = new WebClient())
    client.DownloadFile(
        "http://www.google.com/", "google_homepage.html");

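Just replace "http://www.google.com/" with the address of your file.

Sorry, that is not enough here: a plain WebClient does not send the cookies from your login, so you need to use HttpWebRequest with the same CookieContainer: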

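string fileAddress = "http://www.google.com/";
HttpWebRequest client = (HttpWebRequest)WebRequest.Create(fileAddress);
// Reuse the CookieContainer from the login requests so the session cookie is sent.
client.CookieContainer = cookies;
// Behind a proxy, as in the question, also set client.Proxy.
int read = 0;
byte[] buffer = new byte[1024];
using (FileStream download =
  new FileStream("google_homepage.html", FileMode.Create))
using (WebResponse response = client.GetResponse())
using (Stream stream = response.GetResponseStream())
{
    // Copy the response to disk in chunks so the 135 MB file is never held in memory.
    while ((read = stream.Read(buffer, 0, buffer.Length)) != 0)
    {
        download.Write(buffer, 0, read);
    }
}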

Answer 1 (score: 2)

Are you passing the cookies along when you download the file?
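One way to check this is to ask the CookieContainer which cookies it would send to the download URL. The sketch below is only an illustration: the helper name PrintCookiesFor, the cookie name SESSID and both file URLs are made-up placeholders, and the container in Main stands in for the one filled by the login requests in the question.

```csharp
using System;
using System.Net;

public static class CookieCheck
{
    // Prints the cookies the container would attach to a request for the
    // given URL and returns how many there are.
    public static int PrintCookiesFor(CookieContainer cookies, string url)
    {
        CookieCollection matching = cookies.GetCookies(new Uri(url));
        foreach (Cookie cookie in matching)
        {
            Console.WriteLine("{0}={1} (domain {2}, path {3})",
                cookie.Name, cookie.Value, cookie.Domain, cookie.Path);
        }
        return matching.Count;
    }

    public static void Main()
    {
        // Stand-in for the container populated by the login GET/POST.
        CookieContainer cookies = new CookieContainer();
        cookies.Add(new Uri("https://www.test.test/user/login"),
                    new Cookie("SESSID", "abc123", "/"));

        // Hypothetical file URL on the same host as the login page.
        PrintCookiesFor(cookies, "https://www.test.test/files/big.zip");

        // Hypothetical file URL on a different host.
        PrintCookiesFor(cookies, "https://downloads.test.test/big.zip");
    }
}
```

If the call for the real download URL prints nothing, the request in step 3 goes out without the session cookie, which would explain an access-denied response despite a successful login.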

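Answer 2 (score: 1)

You need to keep the session/authentication cookie that the login form sends back to you. Basically, take the cookie from the response to the authentication form and send it back when you perform step 3.

Here is a simple way to extend WebClient that gives you simpler code than the above: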

http://couldbedone.blogspot.com/2007/08/webclient-handling-cookies.html

Then just:

  1. Create an instance of this CookieAwareWebClient
  2. Post the login form
  3. Download the file
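In case the link is unavailable, the idea can be sketched roughly as follows. This is not the code from the linked post, only a minimal illustration of the same approach: a WebClient subclass that attaches one shared CookieContainer to every request it creates. The Example class, the file URL and the output file name are hypothetical, and the form fields are the placeholders from the question.

```csharp
using System;
using System.Collections.Specialized;
using System.Net;

// A WebClient that remembers cookies between requests.
public class CookieAwareWebClient : WebClient
{
    private readonly CookieContainer cookieContainer = new CookieContainer();

    public CookieContainer CookieContainer
    {
        get { return cookieContainer; }
    }

    // WebClient calls this for each request it is about to send.
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        HttpWebRequest httpRequest = request as HttpWebRequest;
        if (httpRequest != null)
        {
            httpRequest.CookieContainer = cookieContainer;
        }
        return request;
    }
}

public static class Example
{
    public static void LoginAndDownload()
    {
        using (CookieAwareWebClient client = new CookieAwareWebClient())
        {
            // Step 2: post the login form (sent as a form-urlencoded POST).
            NameValueCollection form = new NameValueCollection();
            form["name"] = "[our login name]";
            form["pass"] = "[our login password]";
            form["form_id"] = "user_login";
            client.UploadValues("https://www.test.test/user/login", form);

            // Step 3: the session cookie from step 2 is sent automatically.
            // The file URL here is a placeholder for the link in the post-login page.
            client.DownloadFile("https://www.test.test/path/to/file", "download.bin");
        }
    }
}
```

For the site in the question, the hidden form fields (such as form_build_id) would still have to be added to the posted values, and client.Proxy would need to be set when going through a proxy.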

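Answer 3 (score: 1)

Instead of trying to send the web requests over HTTPS yourself, you could also automate Internet Explorer. "Web automation with Powershell" explains this using PowerShell, but you can do the same in C# by accessing Internet Explorer as a COM object.
If you only need a single file and do not have to worry about memory leaks, this approach works well.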