WebException when loading an RSS feed

Time: 2014-08-11 15:09:08

Tags: c# exception rss httpwebrequest

I am trying to load a page that I received from an RSS feed, and I get the following WebException:

Cannot handle redirect from HTTP/HTTPS protocols to other dissimilar ones.

Inner exception:

Invalid URI: The hostname could not be parsed.

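I wrote some code that tries to load the URL via HttpWebRequest. Following a suggestion I received, when the HttpWebRequest fails I set AllowAutoRedirect to false and loop through the redirects manually until I find out what ultimately fails. Here is the code I'm using; please excuse the gratuitous Console.Write/WriteLine calls: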

Uri url = new Uri(val);
bool result = true;

System.Net.HttpWebRequest req = (System.Net.HttpWebRequest)System.Net.HttpWebRequest.Create(url);
string source = String.Empty;
Uri responseURI;
try
{
    using (System.Net.WebResponse webResponse = req.GetResponse())
    {
        using (HttpWebResponse httpWebResponse = webResponse as HttpWebResponse)
        {
            responseURI = httpWebResponse.ResponseUri;
            StreamReader reader;
            if (httpWebResponse.ContentEncoding.ToLower().Contains("gzip"))
            {
                reader = new StreamReader(new GZipStream(httpWebResponse.GetResponseStream(), CompressionMode.Decompress));
            }
            else if (httpWebResponse.ContentEncoding.ToLower().Contains("deflate"))
            {
                reader = new StreamReader(new DeflateStream(httpWebResponse.GetResponseStream(), CompressionMode.Decompress));
            }
            else
            {
                reader = new StreamReader(httpWebResponse.GetResponseStream());
            }
            source = reader.ReadToEnd();
            reader.Close();
        }
    }

    req.Abort();
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(source);
    result = true;
}
catch (ArgumentException ae)
{
    Console.WriteLine(url + "\n--\n" + ae.Message);
    result = false;
}
catch (WebException we)
{
    Console.WriteLine(url + "\n--\n" + we.Message);
    result = false;
    string urlValue = url.ToString();
    try
    {
        bool cont = true;
        int count = 0;
        do
        {
            req = (System.Net.HttpWebRequest)System.Net.HttpWebRequest.Create(urlValue);
            req.Headers.Add("Accept-Language", "en-us,en;q=0.5");
            req.AllowAutoRedirect = false;
            using (System.Net.WebResponse webResponse = req.GetResponse())
            {
                using (HttpWebResponse httpWebResponse = webResponse as HttpWebResponse)
                {

                    responseURI = httpWebResponse.ResponseUri;
                    StreamReader reader;
                    if (httpWebResponse.ContentEncoding.ToLower().Contains("gzip"))
                    {
                        reader = new StreamReader(new GZipStream(httpWebResponse.GetResponseStream(), CompressionMode.Decompress));
                    }
                    else if (httpWebResponse.ContentEncoding.ToLower().Contains("deflate"))
                    {
                        reader = new StreamReader(new DeflateStream(httpWebResponse.GetResponseStream(), CompressionMode.Decompress));
                    }
                    else
                    {
                        reader = new StreamReader(httpWebResponse.GetResponseStream());
                    }
                    source = reader.ReadToEnd();

                    if (string.IsNullOrEmpty(source))
                    {
                        urlValue = httpWebResponse.Headers["Location"].ToString();
                        count++;
                        reader.Close();
                    }
                    else
                    {
                        cont = false;
                    }
                }
            }
        } while (cont);
    }
    catch (UriFormatException uriEx)
    {
        Console.WriteLine(urlValue + "\n--\n" + uriEx.Message + "\r\n");
        result = false;
    }
    catch (WebException innerWE)
    {
        Console.WriteLine(urlValue + "\n--\n" + innerWE.Message+"\r\n");
        result = false;
    }
}

if (result)
    Console.WriteLine("testing successful");
else
    Console.WriteLine("testing unsuccessful");

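Since this is just test code for now, I hard-coded val to http://rss.nytimes.com/c/34625/f/642557/s/3d072012/sc/38/l/0Lartsbeat0Bblogs0Bnytimes0N0C20A140C0A70C30A0Csarah0Ekane0Eplay0Eamong0Eofferings0Eat0Est0Eanns0Ewarehouse0C0Dpartner0Frss0Gemc0Frss/story01.htm

The final URL that produces the UriFormatException is: http:////www-nc.nytimes.com/2014/07/30/sarah-kane-play-among-offerings-at-st-anns-warehouse/?=_php=true&_type=blogs&_php=true&_type=blogs&_php=true&_type=blogs&_php=true&_type=blogs&_php=true&_type=blogs&_php=true&_type=blogs&_php=true&_type=blogs&partner=rss&emc=rss&_r=6&

I'm not sure whether I'm missing something or whether my loop is wrong. If I take val and paste it into a browser, the page loads fine. If I take the URL that causes the exception and paste it into a browser, I'm taken to the nytimes account login.

I have a lot of RSS feed URLs that cause this problem, and a large number that load without any issue. If any other information would help in solving this, please let me know. Any help would be greatly appreciated.

Do I need to enable some kind of cookie handling?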

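1 answer:

Answer 0: (score: 3)

You need to keep track of cookies across all of your requests. You can use an instance of the CookieContainer class to do that.

At the top of your method, I made the following changes: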

Uri url = new Uri(val);
bool result = true;

// keep all our cookies for the duration of our calls
var cookies = new CookieContainer();

System.Net.HttpWebRequest req = (System.Net.HttpWebRequest)System.Net.HttpWebRequest.Create(url);

// assign our CookieContainer to the new request
req.CookieContainer = cookies;

string source = String.Empty;
Uri responseURI;
try
{

In the exception handler, where you create a new HttpWebRequest, assign the same CookieContainer again:

do
{
    req = (System.Net.HttpWebRequest)System.Net.HttpWebRequest.Create(urlValue);

    // reuse our cookies!
    req.CookieContainer = cookies;

    req.Headers.Add("Accept-Language", "en-us,en;q=0.5");
    req.AllowAutoRedirect = false;
    using (System.Net.WebResponse webResponse = req.GetResponse())
    {

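This ensures that on each successive call, the cookies already collected are sent again with the next request. If you leave this out, no cookies are sent, so the site you are trying to reach assumes you are a new, unseen user and sends you down an authentication path.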
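To make the mechanism concrete, here is a small offline sketch of how a CookieContainer decides which cookies go with which request. The class name CookieDemo, the helper HeaderFor, the example.com URLs, and the cookie name and value are made up for illustration. In the real code the cookies would be stored from Set-Cookie response headers rather than added by hand.

```csharp
using System;
using System.Net;

public static class CookieDemo
{
    // Returns the Cookie header the container would attach to a request for this URL.
    public static string HeaderFor(CookieContainer cookies, string url)
    {
        return cookies.GetCookieHeader(new Uri(url));
    }

    public static void Main()
    {
        var cookies = new CookieContainer();

        // Simulate a cookie set by an earlier response from example.com (hypothetical values).
        cookies.Add(new Uri("http://example.com/"), new Cookie("session", "abc123"));

        // Same host: the stored cookie is offered again.
        Console.WriteLine("same host:  '" + HeaderFor(cookies, "http://example.com/story.html") + "'");

        // Different host: nothing is sent.
        Console.WriteLine("other host: '" + HeaderFor(cookies, "http://other.example.org/") + "'");
    }
}
```

A request that gets a fresh container, or none at all, starts from the empty state on every hop, which would be consistent with the repeated redirects seen in the question.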

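If you want to keep the cookies beyond this method, you can move the cookie instance variable to a public static property so that all of those cookies are available program-wide, like this: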

public static class Cookies
{
    static readonly CookieContainer _cookies = new CookieContainer();

    public static CookieContainer All
    {
        get
        {
            return _cookies;
        }
    }
}

And use it in a WebRequest:

var req = (System.Net.HttpWebRequest) WebRequest.Create(url);
req.CookieContainer = Cookies.All;
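Putting the pieces together, a manual redirect loop with a shared CookieContainer might look like the sketch below. This is one possible arrangement, not the code from the question. The names RedirectFollower, ResolveLocation, and Fetch, and the maxRedirects cap, are assumptions added for illustration. It also differs from the original in three ways: it detects redirects by status code instead of by an empty body, it resolves the Location header against the current URL, and it uses HttpWebRequest.AutomaticDecompression instead of wrapping the stream in GZipStream or DeflateStream by hand.

```csharp
using System;
using System.IO;
using System.Net;

public static class RedirectFollower
{
    // Resolves a Location header value against the URL that returned it.
    // Returns null if the value is missing, cannot be parsed, or is not http/https.
    public static Uri ResolveLocation(Uri current, string location)
    {
        if (string.IsNullOrEmpty(location)) return null;

        Uri next;
        if (!Uri.TryCreate(current, location, out next)) return null;
        if (next.Scheme != Uri.UriSchemeHttp && next.Scheme != Uri.UriSchemeHttps) return null;
        return next;
    }

    // Follows redirects by hand, sending the same cookies on every hop.
    public static string Fetch(Uri start, CookieContainer cookies, int maxRedirects)
    {
        Uri current = start;
        for (int hop = 0; hop <= maxRedirects; hop++)
        {
            var req = (HttpWebRequest)WebRequest.Create(current);
            req.CookieContainer = cookies;       // reuse cookies across hops
            req.AllowAutoRedirect = false;       // we handle 3xx responses ourselves
            req.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;

            using (var resp = (HttpWebResponse)req.GetResponse())
            {
                int status = (int)resp.StatusCode;
                if (status >= 300 && status < 400)
                {
                    Uri next = ResolveLocation(current, resp.Headers["Location"]);
                    if (next == null)
                        throw new InvalidOperationException(
                            "Unusable redirect target from " + current + ": " + resp.Headers["Location"]);
                    current = next;
                    continue;
                }

                using (var reader = new StreamReader(resp.GetResponseStream()))
                {
                    return reader.ReadToEnd();
                }
            }
        }
        throw new InvalidOperationException("Gave up after " + maxRedirects + " redirects.");
    }
}
```

A call such as RedirectFollower.Fetch(new Uri(val), Cookies.All, 10) would then return the page source or fail with a message naming the redirect target it could not use. The cap of 10 is arbitrary. It is there so that a site that keeps redirecting cannot loop forever.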