How to extract the title, images and description from a URL using the HTML Agility Pack

Posted: 2012-10-31 12:35:23

Tags: c# asp.net webforms html-agility-pack

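I want to extract the title, description and images from a URL using the HTML Agility utility. So far I have not been able to find an easy-to-understand example. Can someone help me with this?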

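I would appreciate it if someone could help me with an example, so that I can extract the title and description and let the user choose an image from a list of images (similar to what Facebook does when we share a link).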

Update

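I have placed labels for the title and description, a button, and a text box on an .aspx page. I run the code in the button's click event, but it returns null for all values. I may be doing something wrong.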

I am using the following sample URL: http://edition.cnn.com/2012/10/31/world/asia/india/index.html?hpt=hp_t2

protected void btnGetURLDetails_Click(object sender, EventArgs e)
{
    HtmlDocument doc = new HtmlDocument();
    var response = txtURL.Text;
    doc.LoadHtml(response);

    String title = (from x in doc.DocumentNode.Descendants()
                    where x.Name.ToLower() == "title"
                    select x.InnerText).FirstOrDefault();

    String desc = (from x in doc.DocumentNode.Descendants()
                   where x.Name.ToLower() == "description"
                   select x.InnerText).FirstOrDefault();

    List<String> imgs = (from x in doc.DocumentNode.Descendants()
                         where x.Name.ToLower() == "img"
                         select x.Attributes["src"].Value).ToList<String>();

    lblTitle.Text = title;
    lblDescription.Text = desc;
}

The code above gives me null for all of the variables.
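A likely reason for the nulls in this first attempt: `HtmlDocument.LoadHtml` parses the string it is given as HTML markup and does not download anything. Passing `txtURL.Text` therefore produces a document that contains only a text node (the URL itself), with no `<title>`, `<meta>` or `<img>` elements to find. The minimal sketch below illustrates this; the names `LoadHtmlDemo` and `GetTitle` are hypothetical, and it assumes the HtmlAgilityPack package is referenced.

```csharp
using HtmlAgilityPack;

// Hypothetical helper, for illustration only.
public static class LoadHtmlDemo
{
    // Parses the given string as HTML and returns the text of its <title> element,
    // or null when there is none. Nothing is fetched over the network.
    public static string GetTitle(string markup)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(markup);
        HtmlNode titleNode = doc.DocumentNode.SelectSingleNode("//title");
        return titleNode == null ? null : titleNode.InnerText;
    }
}
```

With this sketch, `GetTitle("http://edition.cnn.com/2012/10/31/world/asia/india/index.html?hpt=hp_t2")` returns null, while `GetTitle("<html><head><title>India news</title></head></html>")` returns `"India news"`. The page has to be downloaded first (for example with `HtmlWeb.Load(url)`) before its elements can be queried.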

If I modify the code like this:
HtmlDocument doc = new HtmlDocument();
var url = txtURL.Text;

var webGet = new HtmlWeb();
doc = webGet.Load(url);

In this case I only get the value of the title; the description is still null.
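The description is still null in the `HtmlWeb` version because HTML has no `<description>` element: a page's description is normally stored in the `content` attribute of `<meta name="description" content="...">`, so a query on element names never finds it. Pages meant to be shared (the Facebook case mentioned in the question) often also carry Open Graph tags such as `<meta property="og:description" content="...">` and `<meta property="og:image" content="...">`. The sketch below reads either form; `MetaReader` and `GetMetaContent` are hypothetical names, and it assumes HtmlAgilityPack's `HtmlNode.Descendants(string)` and `HtmlNode.GetAttributeValue(string, string)`.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, for illustration only.
public static class MetaReader
{
    // Returns the content attribute of the first <meta> whose keyAttribute
    // (e.g. "name" or "property") equals key, ignoring case; null if none matches.
    public static string GetMetaContent(HtmlDocument doc, string keyAttribute, string key)
    {
        HtmlNode meta = doc.DocumentNode.Descendants("meta").FirstOrDefault(m =>
            string.Equals(m.GetAttributeValue(keyAttribute, null), key,
                          StringComparison.OrdinalIgnoreCase));
        return meta == null ? null : meta.GetAttributeValue("content", null);
    }
}
```

For example, `GetMetaContent(doc, "name", "description")` reads the plain description, and `GetMetaContent(doc, "property", "og:description")` could serve as a fallback when that returns null.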

1 Answer

Answer (score: 2)

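// Required using directives (at the top of the code-behind file):
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using HtmlAgilityPack;

protected void btnGetURLDetails_Click(object sender, EventArgs e)
{
    // Download the page at the entered URL. LoadHtml only parses a string,
    // so it must be given the page's HTML, not the URL itself.
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(new Uri(txtURL.Text));
    request.Method = WebRequestMethods.Http.Get;

    string responseString;
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    using (StreamReader reader = new StreamReader(response.GetResponseStream()))
    {
        responseString = reader.ReadToEnd();
    }

    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(responseString);

    // Title: the text of the <title> element.
    string title = (from x in doc.DocumentNode.Descendants()
                    where x.Name.ToLower() == "title"
                    select x.InnerText).FirstOrDefault();

    // Description: there is no <description> element; it is the content
    // attribute of <meta name="description" content="...">.
    string desc = (from x in doc.DocumentNode.Descendants()
                   where x.Name.ToLower() == "meta"
                   && x.Attributes["name"] != null
                   && x.Attributes["name"].Value.ToLower() == "description"
                   && x.Attributes["content"] != null
                   select x.Attributes["content"].Value).FirstOrDefault();

    // Images: the src of every <img>, skipping tags that have no src attribute.
    List<string> imgs = (from x in doc.DocumentNode.Descendants()
                         where x.Name.ToLower() == "img"
                         && x.Attributes["src"] != null
                         select x.Attributes["src"].Value).ToList();

    lblTitle.Text = title;
    lblDescription.Text = desc;
}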
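To let the user pick an image, as in the Facebook-style preview the question describes, the raw `src` values collected in `imgs` usually need to be turned into absolute URLs first, because many pages use relative paths such as `/images/logo.png`. Below is a minimal sketch using only `System.Uri`; the names `ImageUrlResolver` and `ToAbsolute` are hypothetical, and dropping non-HTTP(S) sources such as `data:` URIs is an assumption about what should be offered to the user.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class ImageUrlResolver
{
    // Resolves raw <img src> values against the page URL and returns distinct
    // absolute http(s) URLs in their original order. Empty values, values that
    // cannot be resolved, and non-HTTP(S) schemes (an assumption) are skipped.
    public static List<string> ToAbsolute(string pageUrl, IEnumerable<string> srcValues)
    {
        var baseUri = new Uri(pageUrl);
        var result = new List<string>();
        foreach (string src in srcValues)
        {
            if (string.IsNullOrWhiteSpace(src)) continue;

            Uri absolute;
            if (!Uri.TryCreate(baseUri, src.Trim(), out absolute)) continue;
            if (absolute.Scheme != Uri.UriSchemeHttp && absolute.Scheme != Uri.UriSchemeHttps) continue;

            string url = absolute.AbsoluteUri;
            if (!result.Contains(url)) result.Add(url);
        }
        return result;
    }
}
```

Calling `ToAbsolute(txtURL.Text, imgs)` after the query in the answer gives a list that could, for instance, be bound to a `RadioButtonList` so the user can select one image.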