Question

META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1" />
TITLE>Microsoft Corporation
META http-equiv="PICS-Label" content="(PICS-1.1 "http://www.rsac.org/ratingsv01.html" l gen true r (n 0 s 0 v 0 l 0))" />
META NAME="KEYWORDS" CONTENT="products; headlines; downloads; news; Web site; what's new; solutions; services; software; contests; corporate news;" />
META NAME="DESCRIPTION" CONTENT="The entry page to Microsoft's Web site. Find software, solutions, answers, support, and Microsoft news." />
META NAME="MS.LOCALE" CONTENT="EN-US" />
META NAME="CATEGORY" CONTENT="home page" />

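I would like to know the XPath needed to get the value of the Content attribute of the Category meta tag using HTML Agility Pack. (I removed the leading < from each line of the HTML so that it would post.)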

Answer 1

For a long time, HtmlAgilityPack didn't have the ability to query an attribute value directly. You had to loop over the list of meta nodes. Here is one way:

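var doc = new HtmlDocument();
doc.LoadHtml(htmlString);

// SelectNodes returns null (not an empty list) when nothing matches.
var list = doc.DocumentNode.SelectNodes("//meta");
if (list != null)
{
    foreach (var node in list)
    {
        // The second argument is the default returned when the attribute is missing.
        string content = node.GetAttributeValue("content", "");
    }
}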

But it looks like there is an experimental XPath release that will let you do this.

doc.DocumentNode.SelectNodes("//meta/@content")

will return a list of HtmlAttribute objects.

Answer 2

Thanks for the quick reply, Rohit Agarwal. (I saw that it was answered only a few hours after I asked, but I wasn't able to test it until today.)

I originally implemented your suggestion as follows (it's in VB.NET):

    Dim result As String = webClient.DownloadString(url)
    Dim doc As New HtmlDocument()
    doc.LoadHtml(result)

    Dim list = doc.DocumentNode.SelectNodes("//meta")

    For Each node As HtmlNode In list
        Dim metaname As String = node.GetAttributeValue("name", String.Empty)
        If metaname <> String.Empty Then
            If metaname = "title" Then
                title = node.GetAttributeValue("content", String.Empty)
            ' more ElseIf ... Then branches
            End If
        End If
    Next node


However, I found that //meta[@name='title'] gives me the same result:


    Dim result As String = webClient.DownloadString(url)
    Dim doc As New HtmlDocument()
    doc.LoadHtml(result)

Thanks for putting me on the right track =D

Answer 3

If you only want the meta tags to show the title, description, and keywords, use:

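// metaTags is assumed to be the result of doc.DocumentNode.SelectNodes("//meta").
if (metaTags != null)
{
    foreach (var tag in metaTags)
    {
        if (tag.Attributes["name"] != null && tag.Attributes["content"] != null)
        {
            // divPage is assumed to be a server-side <div runat="server"> on the page
            // (an HtmlGenericControl); a WebControls Panel has no InnerHtml property.
            divPage.InnerHtml += "<br /> " +
                "<b> Page " + tag.Attributes["name"].Value + " </b>: " +
                tag.Attributes["content"].Value + "<br />";
        }
    }
}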

If you want to get the og: tags from the link, add this code after it:

            if (tag.Attributes["property"] != null && tag.Attributes["content"] != null)
            {
                if (tag.Attributes["property"].Value == "og:image")
                {
                    // img is assumed to be an Image control on the page.
                    img.ImageUrl = tag.Attributes["content"].Value;
                }
            }

It was a great experience... I love this code.

Answer 4

Without error checking:

doc.DocumentNode.SelectSingleNode("//meta[@name='description']").Attributes["content"].Value;

Of course, this will cause problems if the node is null or if the content attribute doesn't exist.
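
A null-safe variant of that one-liner might look like the sketch below. It assumes the HtmlAgilityPack NuGet package is referenced; the helper name GetMetaContent and the sample HTML string are made up for illustration. As far as I know, HtmlAgilityPack normalizes tag and attribute names to lowercase, so @name matches NAME=, but XPath compares attribute values case-sensitively, so the question's NAME="CATEGORY" has to be matched as 'CATEGORY'.

```csharp
using System;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

static class MetaExample
{
    // Hypothetical helper: returns the content of the first <meta name="...">,
    // or null if either the tag or its content attribute is missing.
    // Note: 'name' is pasted into the XPath as-is, so it must not contain a single quote.
    static string GetMetaContent(HtmlDocument doc, string name)
    {
        // SelectSingleNode returns null when nothing matches.
        var node = doc.DocumentNode.SelectSingleNode("//meta[@name='" + name + "']");

        // Attributes["content"] is null when the attribute does not exist,
        // so the null-conditional operators cover both failure cases.
        return node?.Attributes["content"]?.Value;
    }

    static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<html><head><meta name=\"CATEGORY\" content=\"home page\" /></head></html>");

        // Fall back to a placeholder instead of throwing on a missing tag.
        Console.WriteLine(GetMetaContent(doc, "CATEGORY") ?? "(not found)");
        Console.WriteLine(GetMetaContent(doc, "missing") ?? "(not found)");
    }
}
```

Returning null rather than throwing leaves it to the caller to decide whether a missing tag is an error.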

Getting meta tag attributes with XPath using HTML Agility Pack
