Question

I'm trying to extract content from blog posts with the following code:

using System;
using System.IO;
using System.Net;

static void GetBlogData (string blogPostUrl)
{
    string blogPostContent = null;

    // 'using' disposes the client once the download attempt is over.
    using (WebClient client = new WebClient ())
    {
        //client.Headers.Add (HttpRequestHeader.Referer, "http://www.stackoverflow.com");

        try
        {
            blogPostContent = client.DownloadString (blogPostUrl);
        }
        catch (Exception ex)
        {
            Term.PrintLn ("Unable to download\n{0}", ex.Message);
        }
    }

    if (blogPostContent != null)
    {
        // Open the file only when there is content to write;
        // 'using' flushes and closes the writer so the text reaches the disk.
        using (TextWriter writer = new StreamWriter ("/home/nanda/projects/mono/common/article"))
        {
            writer.WriteLine (blogPostContent);
        }
    }
    else
    {
        Term.PrintLn ("No content found");
    }
}

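I know this approach is overly simple, but I'd like to understand why I can't extract content from some URLs, as if they had a block or something similar. How can I detect whether a website/blog is blocking me from downloading its content?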

Answer 1

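A website cannot block you from downloading its content without also blocking the site from being viewed in a browser.

If the download fails, it means that either:

a) your URL is wrong

b) the site requires some form of identification, and your request is missing something (possibly a cookie)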
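
To tell the two cases apart, one option is to catch `WebException` instead of the generic `Exception` and look at the HTTP status code the server returned. The following is only a minimal sketch: the class `DownloadDiagnostics`, the helpers `DescribeStatus` and `TryDownload`, the `cookie` parameter, and the User-Agent string are hypothetical names chosen for illustration, and sending a browser-like User-Agent is an assumption about what some sites expect, not something stated above.

```csharp
using System;
using System.Net;

static class DownloadDiagnostics
{
    // Hypothetical helper: maps an HTTP status code to the likely cause.
    public static string DescribeStatus (HttpStatusCode status)
    {
        switch (status)
        {
            case HttpStatusCode.NotFound:
                return "URL is probably wrong (404)";
            case HttpStatusCode.Unauthorized:
            case HttpStatusCode.Forbidden:
                return "Site wants identification (401/403)";
            default:
                return "HTTP error " + (int) status;
        }
    }

    // Hypothetical helper: returns the page text, or null after printing
    // the reason the download failed.
    public static string TryDownload (string url, string cookie)
    {
        using (WebClient client = new WebClient ())
        {
            // Assumption: some sites reject requests that lack a User-Agent.
            client.Headers.Add (HttpRequestHeader.UserAgent,
                                "Mozilla/5.0 (compatible; BlogFetcher/1.0)");

            // Send a cookie only if the caller supplied one.
            if (!string.IsNullOrEmpty (cookie))
                client.Headers.Add (HttpRequestHeader.Cookie, cookie);

            try
            {
                return client.DownloadString (url);
            }
            catch (WebException ex)
            {
                // Response is null when no HTTP reply arrived at all
                // (DNS failure, refused connection, timeout).
                HttpWebResponse response = ex.Response as HttpWebResponse;
                if (response != null)
                    Console.WriteLine (DescribeStatus (response.StatusCode));
                else
                    Console.WriteLine ("No HTTP response ({0})", ex.Status);
                return null;
            }
        }
    }
}
```

Calling `DownloadDiagnostics.TryDownload (blogPostUrl, null)` first, and retrying with a cookie copied from a browser session only if the status points to case b), keeps the two failure modes separate.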
