我正在使用链接检查程序/损坏的链接查找程序并且我得到许多误报,经过双重检查后我发现许多错误代码都返回了webexceptions但它们实际上是可下载的,但在其他一些情况下,状态代码是404,我可以从浏览中访问该页面。
所以这里是代码,它非常丑陋,并且喜欢有更多的东西,id说实用。所有状态代码都在那么大,如果用于过滤我不想添加到断链的那些,因为它们是有效链接(我测试了所有)。我需要解决的是结构(如果可能的话)以及如何不做错404.
谢谢!
try
{
HttpWebRequest request = ( HttpWebRequest ) WebRequest.Create ( uri );
request.Method = "Head";
request.MaximumResponseHeadersLength = 32; // FOR IE SLOW SPEED
request.AllowAutoRedirect = true;
using ( HttpWebResponse response = ( HttpWebResponse ) request.GetResponse() )
{
request.Abort();
}
/* WebClient wc = new WebClient();
wc.DownloadString( uri ); */
_validlinks.Add ( strUri );
}
catch ( WebException wex )
{
if ( !wex.Message.Contains ( "The remote name could not be resolved:" ) &&
wex.Status != WebExceptionStatus.ServerProtocolViolation )
{
if ( wex.Status != WebExceptionStatus.Timeout )
{
HttpStatusCode code = ( ( HttpWebResponse ) wex.Response ).StatusCode;
if (
code != HttpStatusCode.OK &&
code != HttpStatusCode.BadRequest &&
code != HttpStatusCode.Accepted &&
code != HttpStatusCode.InternalServerError &&
code != HttpStatusCode.Forbidden &&
code != HttpStatusCode.Redirect &&
code != HttpStatusCode.Found
)
{
_brokenlinks.Add ( new Href ( new Uri ( strUri , UriKind.RelativeOrAbsolute ) , UrlType.External ) );
}
else _validlinks.Add ( strUri );
}
else _brokenlinks.Add ( new Href ( new Uri ( strUri , UriKind.RelativeOrAbsolute ) , UrlType.External ) );
}
else _validlinks.Add ( strUri );
}
答案 0 :(得分:1)
您应该添加用户代理标头,因为许多网站都需要它们。