I am trying to load a page from a URL I received in an RSS feed, and I get the following WebException:
Cannot handle redirect from HTTP/HTTPS protocols to other dissimilar ones.
With the inner exception:
Invalid URI: The hostname could not be parsed.
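I wrote some code that attempts to load the URL through an HttpWebRequest. Following some suggestions I received, when the HttpWebRequest fails I set AllowAutoRedirect to false and manually loop through the redirects until I find out what ultimately fails. Here is the code I am using; please forgive the gratuitous Console.Write/Writeline calls: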
Uri url = new Uri(val);
bool result = true;
System.Net.HttpWebRequest req = (System.Net.HttpWebRequest)System.Net.HttpWebRequest.Create(url);
string source = String.Empty;
Uri responseURI;
try
{
    using (System.Net.WebResponse webResponse = req.GetResponse())
    {
        using (HttpWebResponse httpWebResponse = webResponse as HttpWebResponse)
        {
            responseURI = httpWebResponse.ResponseUri;
            StreamReader reader;
            if (httpWebResponse.ContentEncoding.ToLower().Contains("gzip"))
            {
                reader = new StreamReader(new GZipStream(httpWebResponse.GetResponseStream(), CompressionMode.Decompress));
            }
            else if (httpWebResponse.ContentEncoding.ToLower().Contains("deflate"))
            {
                reader = new StreamReader(new DeflateStream(httpWebResponse.GetResponseStream(), CompressionMode.Decompress));
            }
            else
            {
                reader = new StreamReader(httpWebResponse.GetResponseStream());
            }
            source = reader.ReadToEnd();
            reader.Close();
        }
    }
    req.Abort();
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(source);
    result = true;
}
catch (ArgumentException ae)
{
    Console.WriteLine(url + "\n--\n" + ae.Message);
    result = false;
}
catch (WebException we)
{
    Console.WriteLine(url + "\n--\n" + we.Message);
    result = false;
    string urlValue = url.ToString();
    try
    {
        bool cont = true;
        int count = 0;
        do
        {
            req = (System.Net.HttpWebRequest)System.Net.HttpWebRequest.Create(urlValue);
            req.Headers.Add("Accept-Language", "en-us,en;q=0.5");
            req.AllowAutoRedirect = false;
            using (System.Net.WebResponse webResponse = req.GetResponse())
            {
                using (HttpWebResponse httpWebResponse = webResponse as HttpWebResponse)
                {
                    responseURI = httpWebResponse.ResponseUri;
                    StreamReader reader;
                    if (httpWebResponse.ContentEncoding.ToLower().Contains("gzip"))
                    {
                        reader = new StreamReader(new GZipStream(httpWebResponse.GetResponseStream(), CompressionMode.Decompress));
                    }
                    else if (httpWebResponse.ContentEncoding.ToLower().Contains("deflate"))
                    {
                        reader = new StreamReader(new DeflateStream(httpWebResponse.GetResponseStream(), CompressionMode.Decompress));
                    }
                    else
                    {
                        reader = new StreamReader(httpWebResponse.GetResponseStream());
                    }
                    source = reader.ReadToEnd();
                    if (string.IsNullOrEmpty(source))
                    {
                        urlValue = httpWebResponse.Headers["Location"].ToString();
                        count++;
                        reader.Close();
                    }
                    else
                    {
                        cont = false;
                    }
                }
            }
        } while (cont);
    }
    catch (UriFormatException uriEx)
    {
        Console.WriteLine(urlValue + "\n--\n" + uriEx.Message + "\r\n");
        result = false;
    }
    catch (WebException innerWE)
    {
        Console.WriteLine(urlValue + "\n--\n" + innerWE.Message + "\r\n");
        result = false;
    }
}
if (result)
    Console.WriteLine("testing successful");
else
    Console.WriteLine("testing unsuccessful");
Since this is just test code for now, I have val hardcoded to http://rss.nytimes.com/c/34625/f/642557/s/3d072012/sc/38/l/0Lartsbeat0Bblogs0Bnytimes0N0C20A140C0A70C30A0Csarah0Ekane0Eplay0Eamong0Eofferings0Eat0Est0Eanns0Ewarehouse0C0Dpartner0Frss0Gemc0Frss/story01.htm
The final URL, the one that throws the UriFormatException, is: http:////www-nc.nytimes.com/2014/07/30/sarah-kane-play-among-offerings-at-st-anns-warehouse/?=_php=true&_type=blogs&_php=true&_type=blogs&_php=true&_type=blogs&_php=true&_type=blogs&_php=true&_type=blogs&_php=true&_type=blogs&_php=true&_type=blogs&partner=rss&emc=rss&_r=6&
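I am not sure whether I am missing something or whether my loop is wrong. If I take val and paste it into a browser, the page loads fine. If I take the URL that causes the exception and paste it into a browser, I am taken to the nytimes account login page.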
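I have a lot of RSS feed URLs that cause this problem. I also have a large number of RSS feed URLs that load with no problem at all. If any other information is needed to help with this, please let me know. Any help would be greatly appreciated.
Is there some kind of cookie handling I need to enable?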
Answer 0 (score: 3)
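You need to keep track of cookies across all of your requests. You can use an instance of the CookieContainer class to do that.
At the top of your method I made the following changes: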
Uri url = new Uri(val);
bool result = true;
// keep all our cookies for the duration of our calls
var cookies = new CookieContainer();
System.Net.HttpWebRequest req = (System.Net.HttpWebRequest)System.Net.HttpWebRequest.Create(url);
// assign our CookieContainer to the new request
req.CookieContainer = cookies;
string source = String.Empty;
Uri responseURI;
try
{
In the exception handler, where you create a new HttpWebRequest, assign the same CookieContainer again:
do
{
    req = (System.Net.HttpWebRequest)System.Net.HttpWebRequest.Create(urlValue);
    // reuse our cookies!
    req.CookieContainer = cookies;
    req.Headers.Add("Accept-Language", "en-us,en;q=0.5");
    req.AllowAutoRedirect = false;
    using (System.Net.WebResponse webResponse = req.GetResponse())
    {
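This ensures that on each successive call, the cookies already received are sent again with the next request. If you leave this out, no cookies are sent, so the site you are trying to reach assumes you are a new, unseen visitor and sends you down an authentication path.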
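To see the resending behaviour without touching the network, here is a minimal sketch. The host example.com and the cookie session=abc123 are made up for illustration and are not what nytimes actually sends. It stores a cookie as if it came from a Set-Cookie header on the first response, then asks the container what it would send to a different page on the same host:

```csharp
using System;
using System.Net;

static class CookieDemo
{
    // Returns the Cookie header the container would attach to a follow-up request.
    public static string CookieHeaderAfterRedirect()
    {
        var cookies = new CookieContainer();

        // Simulate "Set-Cookie: session=abc123; path=/" received from the first URL
        // (hypothetical host and cookie, for illustration only).
        cookies.SetCookies(new Uri("http://example.com/start"), "session=abc123; path=/");

        // Ask which cookies apply to the next URL in the redirect chain.
        return cookies.GetCookieHeader(new Uri("http://example.com/article"));
    }

    static void Main()
    {
        Console.WriteLine(CookieHeaderAfterRedirect());
    }
}
```

When a request has its CookieContainer property set, HttpWebRequest does this bookkeeping for you: cookies from the response go into the container, and matching ones are attached to the next request that uses the same container.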
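If you want to keep the cookies beyond this method, you can move the cookie container from a local variable into a public static property, so the same cookies are available program-wide, like so: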
public static class Cookies
{
    static readonly CookieContainer _cookies = new CookieContainer();
    public static CookieContainer All
    {
        get
        {
            return _cookies;
        }
    }
}
And use it with a WebRequest like this:
var req = (System.Net.HttpWebRequest) WebRequest.Create(url);
req.CookieContainer = Cookies.All;
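As a side note, the loop in the question decides whether to follow a redirect by checking for an empty body, and it increments count without ever using it as a limit. The following is a rough sketch, not a drop-in fix, of a bounded loop that checks the status code instead and shares one CookieContainer. The names RedirectFollower, IsRedirect, ResolveLocation, Follow and maxHops are my own inventions for illustration:

```csharp
using System;
using System.Net;

static class RedirectFollower
{
    // True for the status codes that normally carry a Location header to follow.
    public static bool IsRedirect(HttpStatusCode status)
    {
        int code = (int)status;
        return code == 301 || code == 302 || code == 303 || code == 307 || code == 308;
    }

    // Location may be relative, so resolve it against the URL that was just requested.
    // Throws UriFormatException if the combined value cannot be parsed.
    public static Uri ResolveLocation(Uri current, string location)
    {
        return new Uri(current, location);
    }

    // Follows redirects by hand, reusing the same cookies on every hop.
    public static Uri Follow(Uri start, CookieContainer cookies, int maxHops)
    {
        Uri current = start;
        for (int hop = 0; hop < maxHops; hop++)
        {
            var req = (HttpWebRequest)WebRequest.Create(current);
            req.CookieContainer = cookies;   // cookies from earlier hops are resent
            req.AllowAutoRedirect = false;   // we handle the redirect ourselves
            using (var resp = (HttpWebResponse)req.GetResponse())
            {
                if (!IsRedirect(resp.StatusCode))
                    return resp.ResponseUri; // reached the final page

                string location = resp.Headers["Location"];
                if (string.IsNullOrEmpty(location))
                    throw new WebException("Redirect without a Location header.");

                current = ResolveLocation(current, location);
            }
        }
        throw new WebException("Too many redirects.");
    }
}
```

You would call it as RedirectFollower.Follow(new Uri(val), Cookies.All, 10), where 10 is an arbitrary cap. I have not run this against the nytimes feed, so treat it as a starting point.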