<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1" />
<TITLE>Microsoft Corporation</TITLE>
<META http-equiv="PICS-Label" content='(PICS-1.1 "http://www.rsac.org/ratingsv01.html" l gen true r (n 0 s 0 v 0 l 0))' />
<META NAME="KEYWORDS" CONTENT="products; headlines; downloads; news; Web site; what's new; solutions; services; software; contests; corporate news;" />
<META NAME="DESCRIPTION" CONTENT="The entry page to Microsoft's Web site. Find software, solutions, answers, support, and Microsoft news." />
<META NAME="MS.LOCALE" CONTENT="EN-US" />
<META NAME="CATEGORY" CONTENT="home page" />
I would like to know the XPath needed to get the value of the Content attribute of the Category meta tag using HTML Agility Pack.
Answer 0 (score: 14)
For a long time, HtmlAgilityPack did not have the ability to directly query an attribute value. You had to loop over the list of meta nodes. Here is one way:
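var doc = new HtmlDocument();
doc.LoadHtml(htmlString);
// SelectNodes returns null (not an empty list) when nothing matches.
var list = doc.DocumentNode.SelectNodes("//meta");
if (list != null)
{
    foreach (var node in list)
    {
        // Falls back to "" when the meta tag has no content attribute.
        string content = node.GetAttributeValue("content", "");
    }
}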
But it looks like there is an experimental XPath release that will let you do this.
doc.DocumentNode.SelectNodes("//meta/@content")
will return a list of HtmlAttribute objects.
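Answer 1 (score: 3)
Thanks for the quick reply, Rohit Agarwal (I see it was answered only a few hours after I asked, but I could not test it until today).
I originally implemented your suggestion as follows (it is in VB.NET):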
Dim result As String = webClient.DownloadString(url)
Dim doc As New HtmlDocument()
doc.LoadHtml(result)
Dim list = doc.DocumentNode.SelectNodes("//meta")
For Each node As HtmlNode In list
    Dim metaname As String = node.GetAttributeValue("name", String.Empty)
    If metaname <> String.Empty Then
        If metaname = "title" Then
            title = node.GetAttributeValue("content", String.Empty)
        ' more ElseIf branches
        End If
    End If
Next node
However, I found that //meta[@name='title'] gives me the same result:
Dim result As String = webClient.DownloadString(url)
Dim doc As New HtmlDocument()
doc.LoadHtml(result)
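For illustration, a predicate like this could be applied with SelectSingleNode roughly as follows. This is a minimal C# sketch rather than the VB.NET used above; the inline HTML string and the Program class are assumptions made up for the example. It targets the Category tag from the question. The attribute value is written in upper case because XPath string comparison is case-sensitive and the page uses NAME="CATEGORY".

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical input: a trimmed-down copy of the page head from the question.
        const string html =
            "<html><head><meta name=\"CATEGORY\" content=\"home page\" /></head></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when no node matches the XPath.
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//meta[@name='CATEGORY']");

        // GetAttributeValue returns the supplied default when the attribute is missing.
        string category = node?.GetAttributeValue("content", string.Empty) ?? string.Empty;

        Console.WriteLine(category);
    }
}
```

The null-conditional call keeps the lookup from throwing when the page has no such meta tag; an empty string is returned instead.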
Thanks for putting me on the right track =D
Answer 2 (score: 2)
If you only want the meta tags to show the title, description, and keywords, use
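// metaTags is assumed to be the result of doc.DocumentNode.SelectNodes("//meta"),
// which is null when the page has no meta tags.
if (metaTags != null)
{
    foreach (var tag in metaTags)
    {
        if (tag.Attributes["name"] != null && tag.Attributes["content"] != null)
        {
            // Panel has no InnerHtml property; HtmlGenericControl
            // (System.Web.UI.HtmlControls) does.
            var divPage = new HtmlGenericControl("div");
            divPage.InnerHtml = divPage.InnerHtml + "<br /> " +
                "<b> Page " + tag.Attributes["name"].Value + " </b>: " +
                tag.Attributes["content"].Value + "<br />";
        }
    }
}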
If you also want to get the og: tags from the link, add this code after it
if (tag.Attributes["property"] != null && tag.Attributes["content"] != null)
{
    if (tag.Attributes["property"].Value == "og:image")
    {
        img.ImageUrl = tag.Attributes["content"].Value;
    }
}
This was a great experience... I love this code.
Answer 3 (score: 1)
Without error checking:
doc.DocumentNode.SelectSingleNode("//meta[@name='description']").Attributes["content"].Value;
Of course, this throws a NullReferenceException if the node is null or the content attribute does not exist.
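One way to add the missing error checking is sketched below. It assumes a hypothetical helper, GetMetaContent (the name and the MetaReader class are not from the thread), that returns null instead of throwing. It compares the name attribute case-insensitively in C# rather than in the XPath, so that "CATEGORY" and "category" are treated alike.

```csharp
using System;
using HtmlAgilityPack;

static class MetaReader
{
    // Hypothetical helper: returns the content of the first <meta> whose name
    // matches (ignoring case), or null when there is no such tag.
    public static string GetMetaContent(HtmlDocument doc, string name)
    {
        // Only consider meta tags that carry both attributes.
        // SelectNodes returns null when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//meta[@name and @content]");
        if (nodes == null)
        {
            return null;
        }

        foreach (HtmlNode node in nodes)
        {
            string current = node.GetAttributeValue("name", string.Empty);
            if (string.Equals(current, name, StringComparison.OrdinalIgnoreCase))
            {
                return node.GetAttributeValue("content", string.Empty);
            }
        }

        return null;
    }
}
```

A call such as MetaReader.GetMetaContent(doc, "description") can then be checked against null before the value is used.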