Question

I am trying to download an HTML file using WebClient.

This is my code:

public string GetWebData(string url)
{
    string html = string.Empty;

    using (WebClient client = new WebClient())
    {
        Uri innUri = null;
        Uri.TryCreate(url, UriKind.RelativeOrAbsolute, out innUri);

        try
        {
            client.Headers.Add("Accept-Language", "en-US");
            client.Headers.Add("Accept-Encoding", "gzip, deflate");
            client.Headers.Add("Accept", "text/html, application/xhtml+xml, */*");
            client.Headers.Add("User-Agent", "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)");

            using (StreamReader str = new StreamReader(client.OpenRead(innUri)))
            {
                html = str.ReadToEnd();
            }
        }
        catch (WebException)
        {
            // Rethrow without resetting the stack trace.
            throw;
        }

        return html;
    }
}

The URL is http://www.paginegialle.it/roma-rm/abbigliamento/alberto-aspesi-c.

But I can navigate to this URL in IE9, Firefox, and Chrome without any problem. I used Fiddler to investigate the issue.

I found that the URL is changed once WebClient sends the request.

Actual URL: http://www.paginegialle.it/roma-rm/abbigliamento/alberto-aspesi-c.

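Please look at the difference: the dot at the end of the URL has been removed. Without the dot, the URL does not work in the browsers (IE9, Firefox, Chrome) either. How can I make the request go to the actual URL?

Please help me.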

Answer 1

Many thanks to Eric Law. I used this link to solve my problem:

HttpWebRequest to URL with dot at the end

Thank you very much for curing my headache.
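
For readers who cannot follow the link: the answer itself does not reproduce the fix, so the following is only a sketch of the reflection-based workaround that is commonly cited for this problem. It assumes the older .NET Framework internals, namely a private UriParser.GetSyntax method, a private m_Flags field, and the flag value 0x1000000 for path canonicalization. None of these are public API, and the class name TrailingDotWorkaround is made up for this example. On runtimes where those members are missing, the method does nothing.

```csharp
using System;
using System.Reflection;

public static class TrailingDotWorkaround
{
    // Assumed value of the internal "canonicalize as file path" syntax flag
    // on .NET Framework. This is a private implementation detail.
    private const int CanonicalizeAsFilePath = 0x1000000;

    // Call once at startup, before any Uri for the affected URLs is created.
    public static void Apply()
    {
        // Both members are non-public; the names are assumptions about
        // .NET Framework internals and may not exist on other runtimes.
        MethodInfo getSyntax = typeof(UriParser).GetMethod(
            "GetSyntax",
            BindingFlags.Static | BindingFlags.NonPublic,
            null, new[] { typeof(string) }, null);
        FieldInfo flagsField = typeof(UriParser).GetField(
            "m_Flags", BindingFlags.Instance | BindingFlags.NonPublic);

        if (getSyntax == null || flagsField == null)
            return; // Internals not found: leave the parser untouched.

        foreach (string scheme in new[] { "http", "https" })
        {
            object parser = getSyntax.Invoke(null, new object[] { scheme });
            if (parser == null)
                continue;

            int flags = Convert.ToInt32(flagsField.GetValue(parser));
            if ((flags & CanonicalizeAsFilePath) != 0)
            {
                // Clear the flag so trailing dots in the path are kept.
                int cleared = flags & ~CanonicalizeAsFilePath;
                object boxed = flagsField.FieldType.IsEnum
                    ? Enum.ToObject(flagsField.FieldType, cleared)
                    : (object)cleared;
                flagsField.SetValue(parser, boxed);
            }
        }
    }
}
```

Calling TrailingDotWorkaround.Apply() once before GetWebData runs should leave the trailing dot in place when the Uri is built. Because this changes parser behavior for the whole process and depends on private members, treat it as a last resort. Newer framework versions reportedly keep the trailing dot without any workaround.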

Answer 2

The URL is still correct after this line:

     Uri.TryCreate(url, UriKind.RelativeOrAbsolute, out innUri);

I guess it works for other sites?

Answer 3

I think you have found a cool bug in the .NET Uri object.

MessageBox.Show(new Uri("http://example.com/bug/here.").ToString());

Shows:

http://example.com/bug/here

Note that the trailing period is missing.

WebClient returns a 404 error, but this URL works with the WebBrowser.Navigate method
