Why am I getting a WebException on a URL?

Asked: 2012-09-11 15:24:43

Tags: c#

I have this code:

private List<string> webCrawler(string url, int levels)
        {
            HtmlAgilityPack.HtmlDocument doc;
            HtmlWeb hw = new HtmlWeb(); 
            List<string> webSites;
            List<string> csFiles = new List<string>();

            csFiles.Add("temp string to know that something is happening in level = " + levels.ToString());
            csFiles.Add("current site name in this level is : "+url);

            doc = hw.Load(url);
            webSites = getLinks(doc);


            if (levels == 0)
            {
                return csFiles;
            }
            else
            {
                int actual_sites = 0;
                for (int i = 0; i < webSites.Count() && i < 20; i++)
                {
                    string t = webSites[i];
                    if (t.StartsWith("http://") || t.StartsWith("https://"))
                    {
                        actual_sites++;
                        csFiles.AddRange(webCrawler(t, levels - 1));
                        Texts(richTextBox1, "Level Number " + levels + " " + t + Environment.NewLine, Color.Red);
                    }
                }

                return csFiles;
            }


        }

getLinks() is:

private List<string> getLinks(HtmlAgilityPack.HtmlDocument document)
        {

            List<string> mainLinks = new List<string>();
            var linkNodes = document.DocumentNode.SelectNodes("//a[@href]");
            if (linkNodes != null)
            {
                foreach (HtmlNode link in linkNodes)
                {
                    var href = link.Attributes["href"].Value;
                    mainLinks.Add(href);
                }
            }
            return mainLinks;

        }

The problem is that when I crawl google.com, for example, after a few sites it reaches this URL:

http://picasa.google.co.il/intl/iw/#utm_source=iw-all-more&amp;utm_campaign=iw-pic&amp;utm_medium=et

and then I immediately get an exception on this line:

doc = hw.Load(url);

The error is: The remote name could not be resolved: 'picasa.google.co.il'

The full exception is:

System.Net.WebException was unhandled
  Message=The remote name could not be resolved: 'picasa.google.co.il'
  Source=System
  StackTrace:
       at System.Net.HttpWebRequest.GetResponse()
       at HtmlAgilityPack.HtmlWeb.Get(Uri uri, String method, String path, HtmlDocument doc, IWebProxy proxy, ICredentials creds) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1446
       at HtmlAgilityPack.HtmlWeb.LoadUrl(Uri uri, String method, WebProxy proxy, NetworkCredential creds) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1563
       at HtmlAgilityPack.HtmlWeb.Load(String url, String method) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1152
       at HtmlAgilityPack.HtmlWeb.Load(String url) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1107
       at GatherLinks.Form1.webCrawler(String url, Int32 levels) in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Form1.cs:line 79
       at GatherLinks.Form1.webCrawler(String url, Int32 levels) in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Form1.cs:line 108
       at GatherLinks.Form1.webCrawler(String url, Int32 levels) in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Form1.cs:line 108
       at GatherLinks.Form1..ctor() in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Form1.cs:line 31
       at GatherLinks.Program.Main() in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Program.cs:line 18
       at System.AppDomain._nExecuteAssembly(Assembly assembly, String[] args)
       at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
       at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
       at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
       at System.Threading.ThreadHelper.ThreadStart()
  InnerException: 

How can I fix or work around this?

Thanks.

1 Answer:

Answer 0: (score: 3)

The exception is telling you that it cannot resolve picasa.google.co.il to an IP address. You probably just need to verify that the name is correct.

Open a command window and type:

ping picasa.google.co.il

You will find that your computer cannot reach this server because the name has no DNS entry.
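Since a crawler will keep running into links whose host names do not resolve, one option is to catch the exception and skip those links instead of letting it go unhandled. Below is a minimal sketch of that idea, not a definitive fix. The `SafeCrawl` class and its `LoadAll` method are hypothetical names made up for illustration (they are not part of HtmlAgilityPack), and the loader is passed in as a delegate so the sketch does not depend on any particular library. The only framework members it relies on are `WebException.Status` and `WebExceptionStatus.NameResolutionFailure`.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

public static class SafeCrawl
{
    // Hypothetical helper: calls `load` for each URL and hands the result to
    // `onLoaded`. URLs whose host name cannot be resolved are collected and
    // returned instead of ending the crawl with an unhandled WebException.
    public static List<string> LoadAll<T>(
        IEnumerable<string> urls,
        Func<string, T> load,
        Action<string, T> onLoaded)
    {
        var unresolved = new List<string>();
        foreach (string url in urls)
        {
            try
            {
                T page = load(url);
                onLoaded(url, page);
            }
            catch (WebException ex)
            {
                // Only swallow DNS failures; rethrow anything else
                // (timeouts, protocol errors, ...) unchanged.
                if (ex.Status != WebExceptionStatus.NameResolutionFailure)
                {
                    throw;
                }
                unresolved.Add(url);
            }
        }
        return unresolved;
    }
}
```

In the crawler from the question, `load` could be something like `u => hw.Load(u)` and `onLoaded` could call `getLinks` on the returned document. Whether to skip only DNS failures or every `WebException` is a design choice; this sketch skips only the case the question ran into.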