To load HTML from a URL, I have been using the following method:
public HtmlDocument DownloadSource(string url)
{
    try
    {
        HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(DownloadString(url));
        return doc;
    }
    catch (Exception e)
    {
        if (Task.Error == null)
            Task.Error = e;
        Task.Status = TaskStatuses.Error;
        Done = true;
        return null;
    }
}
But today the code above suddenly stopped working. I found another approach that works fine:
HtmlWeb web = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load(url.ToString());
Now I just want to know the difference between the two approaches.
Answer 0 (score: 1)
It looks like the User-Agent header is now required by your site.
HtmlAgilityPack itself is working fine, but you should change your DownloadString(url) method. If you inspect the request with Fiddler, you will see that it returns 403 Forbidden.
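If Fiddler is not at hand, the status code can also be read from the exception that WebClient throws. The following is only a rough sketch: StatusProbe, GetStatus and Probe are hypothetical names that are not part of the code above, and it assumes the download still goes through WebClient.

```csharp
using System;
using System.Net;

static class StatusProbe
{
    // Returns the HTTP status carried by a failed request, or null when the
    // failure happened before any HTTP response arrived (DNS error, timeout, ...).
    public static HttpStatusCode? GetStatus(WebException ex)
    {
        return (ex.Response as HttpWebResponse)?.StatusCode;
    }

    // Tries to download the page and reports either "OK" or the failure reason.
    public static string Probe(string url)
    {
        try
        {
            using (var client = new WebClient())
            {
                client.DownloadString(url);
                return "OK";
            }
        }
        catch (WebException ex)
        {
            HttpStatusCode? status = GetStatus(ex);
            return status.HasValue
                ? $"{(int)status.Value} {status.Value}"
                : ex.Status.ToString();
        }
    }
}
```

Calling StatusProbe.Probe(url) from your existing catch path and logging the result should make a rejected request easy to tell apart from a parsing problem in HtmlAgilityPack.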
The solution is to add any User-Agent header to the request:
using HtmlAgilityPack;
using System;
using System.Net;

class Program
{
    static void Main()
    {
        var doc = DownloadSource("https://videohive.net/item/inspired-slideshow/21544630");
        Console.ReadKey();
    }

    public static HtmlDocument DownloadSource(string url)
    {
        try
        {
            HtmlDocument doc = new HtmlDocument();
            doc.LoadHtml(DownloadString(url));
            return doc;
        }
        catch (Exception e)
        {
            // exception handling here
        }
        return null;
    }

    static String DownloadString(String url)
    {
        // Dispose the client once the download has completed.
        using (WebClient client = new WebClient())
        {
            // The site rejects requests that carry no User-Agent header.
            client.Headers.Add("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:x.x.x) Gecko/20041107 Firefox/x.x");
            return client.DownloadString(url);
        }
    }
}
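On newer .NET versions, where WebClient is marked obsolete, the same fix can be sketched with HttpClient. This is an assumption-laden variant rather than part of the original code: HtmlDownloader, CreateClient and DownloadStringAsync are hypothetical names, and the User-Agent string is simply the one used above.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

static class HtmlDownloader
{
    // Same User-Agent string as in the WebClient version; any value should do.
    public const string UserAgent =
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:x.x.x) Gecko/20041107 Firefox/x.x";

    // Builds a client that sends the User-Agent header with every request.
    public static HttpClient CreateClient()
    {
        var client = new HttpClient();
        client.DefaultRequestHeaders.TryAddWithoutValidation("User-Agent", UserAgent);
        return client;
    }

    // A single shared instance, reused across all downloads.
    private static readonly HttpClient Client = CreateClient();

    // Downloads the page body; throws HttpRequestException on 4xx/5xx responses.
    public static async Task<string> DownloadStringAsync(string url)
    {
        using (HttpResponseMessage response = await Client.GetAsync(url))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

The result can be passed to doc.LoadHtml(...) exactly as before. As for the difference you asked about: HtmlWeb.Load most likely kept working because HtmlWeb sends a browser-like User-Agent by default (exposed through its UserAgent property), whereas WebClient sends none unless you add one.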