HtmlAgilityPack returns random characters

Date: 2013-04-09 19:31:07

Tags: httpwebrequest html-agility-pack streamreader

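I have the following code, which uses HtmlAgilityPack to fetch the HTML of a number of websites. It works well for all of them except asos.com. When run against the URL below, it returns random characters (<\ b \ 0 \ 0 \ 0 \ 0 \ 0 \0UÍ“ï&amp;¾CãÁ¢> \bãhìÁ3 - «Ziý}z'š/»ómf³Ü `]在@iÉÑbr[œ¡Ä¬v7Ðœ¶7N[GáôSv;Ü°[†.A * 3Z¢G×ù6OƒäwPŒõH\ RU \vzìmèÎ; M>4q_K¨Ð)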

    HtmlAgilityPack.HtmlDocument doc = new HtmlDocument();
    doc.OptionReadEncoding = false;
    HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create("http://www.asos.com/ASOS/ASOS-Sweatshirt-With-Contrast-Ribs/Prod/pgeproduct.aspx?iid=2765751&cid=14368&sh=0&pge=0&pgesize=20&sort=-1&clr=Red");
    request.Timeout = 10000;
    request.ReadWriteTimeout = 32000;
    request.UserAgent = "TEST";
    request.Method = "GET";
    request.Accept = "text/html";
    request.AllowAutoRedirect = false;
    request.CookieContainer = new CookieContainer();
    StreamReader reader = new StreamReader(request.GetResponse().GetResponseStream(), Encoding.Default); //put your encoding            
    doc.Load(reader);

    string html = doc.DocumentNode.OuterHtml;

I have run the URL through Fiddler, and nothing there suggests that there should be a problem. Any ideas where I am going wrong?

Please see the image of the headers from Fiddler here: http://i.stack.imgur.com/2LRFY.png

1 answer:

Answer 0: (score: 1)

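This has nothing to do with Html Agility Pack; it happens because you have set AllowAutoRedirect to false. Remove that line and it will work. The site apparently performs a redirect, and you need to follow it if you want the final HTML text.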
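
For reference, here is a minimal sketch of the question's request with redirects left enabled, so that AllowAutoRedirect keeps its default value of true. The Program class and the using blocks are only illustrative scaffolding. The AutomaticDecompression line is an assumption that goes beyond this answer: the first bytes of the garbled output in the question resemble a gzip header, so asking the framework to decompress the response seems a reasonable precaution, but it has not been verified against this site.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // URL taken from the question.
        string url = "http://www.asos.com/ASOS/ASOS-Sweatshirt-With-Contrast-Ribs/Prod/pgeproduct.aspx?iid=2765751&cid=14368&sh=0&pge=0&pgesize=20&sort=-1&clr=Red";

        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Timeout = 10000;
        request.ReadWriteTimeout = 32000;
        request.UserAgent = "TEST";
        request.Method = "GET";
        request.Accept = "text/html";
        request.CookieContainer = new CookieContainer();
        // AllowAutoRedirect is deliberately not set to false here;
        // its default (true) lets the request follow the site's redirect.

        // Assumption (not part of the answer): transparently decompress
        // gzip/deflate bodies in case the server sends a compressed response.
        request.AutomaticDecompression =
            DecompressionMethods.GZip | DecompressionMethods.Deflate;

        HtmlDocument doc = new HtmlDocument();
        doc.OptionReadEncoding = false;

        // Dispose the response, stream and reader once the document is loaded.
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(stream, Encoding.Default))
        {
            doc.Load(reader);
        }

        string html = doc.DocumentNode.OuterHtml;
        Console.WriteLine(html.Length);
    }
}
```

If the output is still unreadable after this change, comparing the Content-Encoding and Location response headers in Fiddler should show whether compression or the redirect is responsible.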

Note that Html Agility Pack has a utility HtmlWeb class that can download a file directly as an HtmlDocument:

    HtmlWeb web = new HtmlWeb();
    HtmlDocument doc = web.Load(@"http://www.asos.com/ASOS/ASOS-Sweatshirt-With-Contrast-Ribs/Prod/pgeproduct.aspx?iid=2765751&cid=14368&sh=0&pge=0&pgesize=20&sort=-1&clr=Red");