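I regularly have to perform the following steps manually in a web browser:
I would like to automate this process using .NET.
I posted this question here a few days ago. Thanks to a piece of code from Rubens Farias, I can now perform steps 1 and 2 above. After step 2, I can read the HTML of the page that contains the URL of the file to download (using afterLoginPage = reader.ReadToEnd()). This page is only shown once the login has been accepted, so it confirms that step 2 succeeded.
My problem now is how to perform step 3. I have tried several things, to no avail: access to the file is still denied, even though the login just succeeded.
To clarify, I post the code below, without the actual login details and website, of course. At the end, the variable afterLoginPage contains the HTML of the page shown after login, which includes the link to the file I want to download. This link also clearly starts with https.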
' Requires: Imports System.IO, System.Net, System.Text, System.Text.RegularExpressions
Dim httpsSite As String = "https://www.test.test/user/login" ' enter the correct address
Dim formPage As String = ""
Dim afterLoginPage As String = ""
' Get postback data and cookies
Dim cookies As New CookieContainer()
Dim getRequest As HttpWebRequest = DirectCast(WebRequest.Create(httpsSite), HttpWebRequest)
getRequest.CookieContainer = cookies
getRequest.Method = "GET"
Dim wp As WebProxy = New WebProxy("[our proxies IP address]", [our proxies port number])
wp.Credentials = CredentialCache.DefaultCredentials
getRequest.Proxy = wp
Dim form As HttpWebResponse = DirectCast(getRequest.GetResponse(), HttpWebResponse)
Using response As New StreamReader(form.GetResponseStream(), Encoding.UTF8)
formPage = response.ReadToEnd()
End Using
Dim inputs As New Dictionary(Of String, String)()
inputs.Add("form_build_id", "[some code I'd like to keep secret]")
inputs.Add("form_id", "user_login")
For Each input As Match In Regex.Matches(formPage, "<input.*?name=""(?<name>.*?)"".*?(?:value=""(?<value>.*?)"".*?)? />", RegexOptions.IgnoreCase Or RegexOptions.ECMAScript)
If input.Groups("name").Value <> "form_build_id" AndAlso _
input.Groups("name").Value <> "form_id" Then
inputs.Add(input.Groups("name").Value, input.Groups("value").Value)
End If
Next
inputs("name") = "[our login name]"
inputs("pass") = "[our login password]"
Dim buffer As Byte() = Encoding.UTF8.GetBytes( _
[String].Join("&", _
Array.ConvertAll(Of KeyValuePair(Of String, String), String)(inputs.ToArray(), _
Function(item As KeyValuePair(Of String, String)) (item.Key & "=") + System.Web.HttpUtility.UrlEncode(item.Value))))
Dim postRequest As HttpWebRequest = DirectCast(WebRequest.Create(httpsSite), HttpWebRequest)
postRequest.CookieContainer = cookies
postRequest.Method = "POST"
postRequest.ContentType = "application/x-www-form-urlencoded"
postRequest.Proxy = wp
' send username/password
Using stream As Stream = postRequest.GetRequestStream()
stream.Write(buffer, 0, buffer.Length)
End Using
' get response from login page
Using reader As New StreamReader(postRequest.GetResponse().GetResponseStream(), Encoding.UTF8)
afterLoginPage = reader.ReadToEnd()
End Using
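Once afterLoginPage holds the post-login HTML, the file's https link has to be pulled out of it before step 3 can request it. A minimal sketch in C#, assuming the link appears as an ordinary href attribute in the markup; the names LinkFinder and FindHttpsLinks are hypothetical, and a regex is only a rough substitute for a real HTML parser:

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class LinkFinder
{
    // Matches href="https://..." or href='https://...' (case-insensitive).
    // Assumption: the page is simple enough that a regex finds the link reliably.
    private static readonly Regex HttpsHref = new Regex(
        "href\\s*=\\s*[\"'](?<url>https://[^\"']+)[\"']",
        RegexOptions.IgnoreCase);

    // Returns every absolute https:// link in the HTML, in document order.
    public static List<string> FindHttpsLinks(string html)
    {
        var links = new List<string>();
        foreach (Match m in HttpsHref.Matches(html))
        {
            // Attribute values are HTML-encoded (e.g. &amp;), so decode them
            // before passing the URL to WebRequest.Create.
            links.Add(WebUtility.HtmlDecode(m.Groups["url"].Value));
        }
        return links;
    }
}
```

If the file link happens to be the first https link on the page (an assumption about the page layout), `LinkFinder.FindHttpsLinks(afterLoginPage)[0]` gives the address to request in step 3.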
Answer 0 (score: 3)
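[Struck through by the author:]
As I said in a comment on that question, you just need to use the DownloadFile method:
// Requires: using System.Net;
using (WebClient client = new WebClient())
    client.DownloadFile("http://www.google.com/", "google_homepage.html");
Just replace "http://www.google.com/" with the address of your file.
[End of struck-through text]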
Sorry, you need to use HttpWebRequest instead:
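// Requires: using System.IO; using System.Net;
// "cookies" must be the same CookieContainer instance that was used for the login requests.
string fileAddress = "http://www.google.com/";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(fileAddress);
request.CookieContainer = cookies;
// If you connect through a proxy (as in the question), set it here too, e.g. request.Proxy = wp;
int read = 0;
byte[] buffer = new byte[1024];
using (WebResponse response = request.GetResponse())
using (Stream stream = response.GetResponseStream())
using (FileStream download = new FileStream("google_homepage.html", FileMode.Create))
{
    // Copy the response body to the file one buffer at a time.
    while ((read = stream.Read(buffer, 0, buffer.Length)) != 0)
    {
        download.Write(buffer, 0, read);
    }
}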
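On newer .NET versions, HttpClient is the recommended API for this (WebRequest, HttpWebRequest and WebClient are marked obsolete as of .NET 6), and it can reuse the very same System.Net.CookieContainer that the login requests filled. A minimal sketch of that swapped-in approach; CookieDownloader, CreateHandler and DownloadAsync are hypothetical names:

```csharp
using System.IO;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class CookieDownloader
{
    // Builds a handler that sends the cookies stored in "cookies" (e.g. the
    // session cookie set during login). If a proxy is given, requests go
    // through it; otherwise the system default proxy settings apply.
    public static HttpClientHandler CreateHandler(CookieContainer cookies, IWebProxy proxy = null)
    {
        var handler = new HttpClientHandler { CookieContainer = cookies, UseCookies = true };
        if (proxy != null)
        {
            handler.Proxy = proxy;
            handler.UseProxy = true;
        }
        return handler;
    }

    // Streams the body of "url" into the file at "path".
    // EnsureSuccessStatusCode throws HttpRequestException on e.g. 403 Forbidden.
    public static async Task DownloadAsync(HttpClient client, string url, string path)
    {
        using (HttpResponseMessage response =
               await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead))
        {
            response.EnsureSuccessStatusCode();
            using (FileStream file = new FileStream(path, FileMode.Create))
            {
                await response.Content.CopyToAsync(file);
            }
        }
    }
}
```

Typical use: `var client = new HttpClient(CookieDownloader.CreateHandler(cookies, wp));` followed by `await CookieDownloader.DownloadAsync(client, fileUrl, "report.zip");`, where fileUrl is a placeholder for the link found on the post-login page.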
Answer 1 (score: 2)
Are you passing the cookies along when you download the file?
Answer 2 (score: 1)
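You need to keep the session/authentication cookie that the login form sends back to you. Basically, take the cookie from the response to the authentication form and send it back when you do step 3.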
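The cookie round trip can be checked offline with CookieContainer: the Set-Cookie header from the login response is stored in the container, and the same container then produces a matching Cookie header for any later URL on that host and path. A minimal sketch; the host, cookie name and value are made up, and SessionCookieDemo is a hypothetical helper:

```csharp
using System;
using System.Net;

public static class SessionCookieDemo
{
    // Simulates the login response: its Set-Cookie header is stored in a
    // container, scoped to the login URL's host and the cookie's path.
    public static CookieContainer StoreLoginCookie(Uri loginUrl, string setCookieHeader)
    {
        var cookies = new CookieContainer();
        cookies.SetCookies(loginUrl, setCookieHeader);
        return cookies;
    }

    // Returns the Cookie header this container would attach to a request for "url".
    public static string CookieHeaderFor(CookieContainer cookies, Uri url)
    {
        return cookies.GetCookieHeader(url);
    }
}
```

In practice this means handing the same CookieContainer instance to the login requests and to the request in step 3; a new container (or none at all) starts empty and sends no session cookie, which the server answers with "access denied".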
Here is a simple way to extend WebClient that gives you simpler code than the above:
http://couldbedone.blogspot.com/2007/08/webclient-handling-cookies.html
Just:
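A minimal sketch of such a cookie-aware WebClient, assuming the common approach of overriding WebClient.GetWebRequest; the class name CookieAwareWebClient is hypothetical, and this is not necessarily the code from the linked post:

```csharp
using System;
using System.Net;

// Hypothetical helper: a WebClient that attaches one CookieContainer to every
// request it creates, so cookies set by the login response are sent with later downloads.
public class CookieAwareWebClient : WebClient
{
    public CookieContainer CookieContainer { get; }

    public CookieAwareWebClient(CookieContainer cookies)
    {
        CookieContainer = cookies ?? throw new ArgumentNullException(nameof(cookies));
    }

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        // Only HTTP(S) requests carry cookies; other schemes (e.g. file://) are left unchanged.
        if (request is HttpWebRequest http)
        {
            http.CookieContainer = CookieContainer;
        }
        return request;
    }
}
```

With a single instance, logging in via `client.UploadValues(loginUrl, "POST", formFields)` and then calling `client.DownloadFile(fileUrl, "report.zip")` sends the login cookie with the download; loginUrl, formFields and fileUrl are placeholders.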
Answer 3 (score: 1)
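Instead of trying to send web requests over HTTPS yourself, you could also automate Internet Explorer.
Web automation with Powershell explains this using PowerShell, but you can do the same in C# by accessing Internet Explorer as a COM object.
If you only need a single file and don't have to worry about memory leaks, this approach works well.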