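I have a problem downloading a PDF file from this page, for example:
On that page you can see an "Application form" table whose last column contains a PDF link. I can already parse the PDF link with HtmlAgilityPack, but when I do the following for the PDF link: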
    WebBrowser1.Navigate(docUrl)
    While (WebBrowser1.ReadyState <> WebBrowserReadyState.Complete)
        System.Windows.Forms.Application.DoEvents()
    End While
    Dim client As New WebClient
    client.Headers.Add(HttpRequestHeader.Cookie, WebBrowser1.Document.Cookie)
    client.DownloadFile(New Uri(pdfLink), "appForm.pdf")
it just returns a 404. The PDF link itself does not change, so this may be a session-based page. As for the headers, WebBrowser1.Document.Cookie returns null even for a page that has just finished loading. Is there anything I can do?
By the way, this is the PDF link. You can try opening it directly, without clicking through the page, to see the problem.
Answer 0 (score: 0)
Here is a cookie-aware WebClient. Credit goes to Pavel Savara.
    using System;
    using System.Net;

    // A WebClient that shares one CookieContainer across all of its requests.
    public class WebClientEx : WebClient
    {
        private CookieContainer container = new CookieContainer();

        public WebClientEx() // Added to original code
        {
            this.container = new CookieContainer();
        }

        public WebClientEx(CookieContainer container)
        {
            this.container = container;
        }

        public CookieContainer CookieContainer
        {
            get { return container; }
            set { container = value; }
        }

        // Attach the shared container to every outgoing HTTP request.
        protected override WebRequest GetWebRequest(Uri address)
        {
            WebRequest r = base.GetWebRequest(address);
            var request = r as HttpWebRequest;
            if (request != null)
            {
                request.CookieContainer = container;
            }
            return r;
        }

        // Asynchronous path: collect cookies from the response.
        protected override WebResponse GetWebResponse(WebRequest request, IAsyncResult result)
        {
            WebResponse response = base.GetWebResponse(request, result);
            ReadCookies(response);
            return response;
        }

        // Synchronous path: collect cookies from the response.
        protected override WebResponse GetWebResponse(WebRequest request)
        {
            WebResponse response = base.GetWebResponse(request);
            ReadCookies(response);
            return response;
        }

        // Copy cookies set by the server into the shared container.
        private void ReadCookies(WebResponse r)
        {
            var response = r as HttpWebResponse;
            if (response != null)
            {
                CookieCollection cookies = response.Cookies;
                container.Add(cookies);
            }
        }
    }
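To show how a cookie-aware client like this might be applied to the question, here is a minimal sketch. It assumes the 404 really is caused by a missing session cookie, which the question only suspects. `SessionWebClient` and `PdfDownloader` are hypothetical names introduced for illustration: a trimmed-down variant of the class above that relies on `HttpWebRequest` storing response cookies in the attached container. `pageUrl` and `pdfUrl` are placeholders for the listing page and the PDF link.

```csharp
using System;
using System.Net;

// Hypothetical, trimmed-down cookie-aware client (same idea as WebClientEx).
public class SessionWebClient : WebClient
{
    public CookieContainer Cookies { get; } = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        // With a container attached, HttpWebRequest sends the stored cookies
        // and stores any cookies the server sets in its response.
        if (request is HttpWebRequest http)
        {
            http.CookieContainer = Cookies;
        }
        return request;
    }
}

public static class PdfDownloader
{
    // pageUrl and pdfUrl are placeholders supplied by the caller.
    public static void Download(string pageUrl, string pdfUrl, string targetPath)
    {
        using (var client = new SessionWebClient())
        {
            // 1. Load the listing page so the server can issue its session cookie.
            client.DownloadString(pageUrl);
            // 2. Fetch the PDF with the same client so the cookie is sent back.
            client.DownloadFile(new Uri(pdfUrl), targetPath);
        }
    }
}
```

Because both requests go through the same client, the WebBrowser control and its Document.Cookie value are not needed at all. If the server also checks other headers, such as Referer, those would have to be set on the client as well.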