I have the following code, which uses HtmlAgilityPack to extract the HTML of many websites. It works well for all of them except asos.com. When I run it against that URL, it returns random characters (<\b\0\0\0\0\0\0UÍ“ï&¾CãÁ¢>\bãhìÁ3 - «Zi
ý}z'š/»ómf³Ü `]@iÉÑbr[œ¡Ä¬v7Ðœ¶7N[GáôSv;Ü°[†.A * 3Z¢G×ù6OƒäwPŒõH\ RU \vzìmèÎ; M>4q_K¨Ð):
HtmlAgilityPack.HtmlDocument doc = new HtmlDocument();
doc.OptionReadEncoding = false;
HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create("http://www.asos.com/ASOS/ASOS-Sweatshirt-With-Contrast-Ribs/Prod/pgeproduct.aspx?iid=2765751&cid=14368&sh=0&pge=0&pgesize=20&sort=-1&clr=Red");
request.Timeout = 10000;
request.ReadWriteTimeout = 32000;
request.UserAgent = "TEST";
request.Method = "GET";
request.Accept = "text/html";
request.AllowAutoRedirect = false;
request.CookieContainer = new CookieContainer();
StreamReader reader = new StreamReader(request.GetResponse().GetResponseStream(), Encoding.Default); //put your encoding
doc.Load(reader);
string html = doc.DocumentNode.OuterHtml;
I have run the URL through Fiddler, but nothing there seems to indicate a problem. Any ideas where I am going wrong?
See the image of the headers from Fiddler here: http://i.stack.imgur.com/2LRFY.png
Answer 0 (score: 1)
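This has nothing to do with Html Agility Pack; it happens because you have set AllowAutoRedirect to false. Remove that line and it will work. The site evidently performs a redirect, and if you want the final HTML text you need to follow it.
Note that Html Agility Pack has a utility class, HtmlWeb, that can download the page directly as an HtmlDocument: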
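As a rough sketch of the fix described above, the request from the question can be rebuilt with AllowAutoRedirect left at its default of true, so that the redirect is followed. The AutomaticDecompression line is an addition that is not part of this answer: it assumes the server may send a compressed body, which the leading \b\0\0\0 bytes in the question's output would be consistent with. The Program class and Main method are only scaffolding for illustration.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Same URL as in the question.
        string url = "http://www.asos.com/ASOS/ASOS-Sweatshirt-With-Contrast-Ribs/Prod/pgeproduct.aspx?iid=2765751&cid=14368&sh=0&pge=0&pgesize=20&sort=-1&clr=Red";

        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Timeout = 10000;
        request.ReadWriteTimeout = 32000;
        request.UserAgent = "TEST";
        request.Method = "GET";
        request.Accept = "text/html";
        request.CookieContainer = new CookieContainer();
        // AllowAutoRedirect is deliberately not set: its default (true)
        // lets the request follow the site's redirect.

        // Assumption, not from the answer: ask the framework to decompress
        // gzip/deflate bodies in case the server sends one.
        request.AutomaticDecompression =
            DecompressionMethods.GZip | DecompressionMethods.Deflate;

        HtmlDocument doc = new HtmlDocument();
        doc.OptionReadEncoding = false;

        // Dispose the response and its stream once the document is loaded.
        using (WebResponse response = request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        // Encoding.Default as in the question; substitute the page's encoding.
        using (StreamReader reader = new StreamReader(stream, Encoding.Default))
        {
            doc.Load(reader);
        }

        string html = doc.DocumentNode.OuterHtml;
        Console.WriteLine(html);
    }
}
```

If the output is still unreadable with redirects enabled, comparing the response headers in Fiddler (in particular Content-Encoding and Location) against what the code receives is a reasonable next check.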
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(@"http://www.asos.com/ASOS/ASOS-Sweatshirt-With-Contrast-Ribs/Prod/pgeproduct.aspx?iid=2765751&cid=14368&sh=0&pge=0&pgesize=20&sort=-1&clr=Red");