Question

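I'm new to HTML Agility Pack (and to web-based programming in general). I'm trying to extract one specific line of HTML, but I don't know HTML Agility Pack's syntax well enough to see what I've written incorrectly (I got lost in the documentation). The URL here has been changed.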

        string html;
        using (WebClient client = new WebClient())
        {
            html = client.DownloadString("https://google.com/");
        }

        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        foreach (HtmlNode img in doc.DocumentNode.SelectNodes("//div[@class='ngg-gallery-thumbnail-box']//div[@class='ngg-gallery-thumbnail']//a"))
        {
            Debug.Log(img.GetAttributeValue("href", null));
        }

        return null;

This is what the HTML looks like:

<div id="ngg-image-3" class="ngg-gallery-thumbnail-box" >
    <div class="ngg-gallery-thumbnail">
            <a href="https://urlhere.png"
             // More code here
            </a>
    </div>
</div>

The problem occurs on the foreach line. I've tried my best to follow examples I found online, but I'm missing something. TIA.

Answer 1

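HtmlAgilityPack queries nodes with XPath syntax; HAP effectively treats the HTML document as an XML document. So the trick is to learn how XPath queries work, so that you can combine tags and attributes correctly to get the result you want.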

The HTML snippet you pasted is not well-formed (the anchor tag is missing its closing >). Assuming it is closed, then

//div[@class='ngg-gallery-thumbnail-box']//div[@class='ngg-gallery-thumbnail']//a[@href]

will return only those a tags that have an href attribute, as an HtmlNodeCollection.
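To check that this query really matches the structure from the question, here is a minimal sketch that runs it against a well-formed copy of the snippet. It deliberately uses .NET's built-in System.Xml (XmlDocument.SelectNodes) instead of HtmlAgilityPack so it compiles without a NuGet package. The XPath is identical, but this only works because the sample markup is valid XML, and unlike HAP, XmlDocument.SelectNodes returns an empty list rather than null when nothing matches. The names Snippet, Query and ExtractHrefs and the link text are illustrative, and the code assumes .NET 6+ top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Well-formed version of the question's snippet: the <a> tag is closed and the
// "// More code here" placeholder is replaced by hypothetical link text.
const string Snippet =
    "<div id=\"ngg-image-3\" class=\"ngg-gallery-thumbnail-box\">" +
    "  <div class=\"ngg-gallery-thumbnail\">" +
    "    <a href=\"https://urlhere.png\">thumbnail</a>" +
    "  </div>" +
    "</div>";

// Same XPath as in the answer: only <a> elements that carry an href attribute.
const string Query =
    "//div[@class='ngg-gallery-thumbnail-box']//div[@class='ngg-gallery-thumbnail']//a[@href]";

// Loads the markup as XML and collects the href value of every matching element.
List<string> ExtractHrefs(string xml, string xpath)
{
    var doc = new XmlDocument();
    doc.LoadXml(xml);
    var hrefs = new List<string>();
    // System.Xml returns an empty XmlNodeList (not null) when nothing matches.
    foreach (XmlNode node in doc.SelectNodes(xpath))
    {
        hrefs.Add(((XmlElement)node).GetAttribute("href"));
    }
    return hrefs;
}

foreach (var href in ExtractHrefs(Snippet, Query))
{
    Console.WriteLine(href);
}
```

Run as is, this prints the single link https://urlhere.png. If the same query finds nothing on the real page, the page's markup differs from the snippet (for example, extra class names on the div, since @class='...' is an exact string comparison).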

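If nothing matches your criteria, nothing is written. Be aware, though, that HAP's SelectNodes returns null rather than an empty collection when there are no matches, so a foreach over the result throws a NullReferenceException unless you check for null first.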

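For debugging purposes, try logging the node count or the OuterHtml of a less specific query to see what you are actually getting back.

var boxes = doc.DocumentNode.SelectNodes("//div[@class='ngg-gallery-thumbnail-box']//div[@class='ngg-gallery-thumbnail']");
Debug.Log(boxes == null ? "no matches" : boxes.Count + " match(es); first: " + boxes[0].OuterHtml);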
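One way to apply the "less specific query" idea systematically is to run the path one step at a time and log how many nodes each prefix matches. The first step whose count drops to 0 is the part of the XPath that does not fit the real page. The sketch below again uses System.Xml on a small, hypothetical well-formed page instead of HtmlAgilityPack, and assumes .NET 6+ top-level statements; Page, steps and CountMatches are illustrative names. With HAP you would call doc.DocumentNode.SelectNodes on the downloaded document instead, which is why CountMatches guards against a null result.

```csharp
using System;
using System.Xml;

// Hypothetical, well-formed stand-in for the downloaded page.
const string Page =
    "<html><body>" +
    "<div id=\"ngg-image-3\" class=\"ngg-gallery-thumbnail-box\">" +
    "<div class=\"ngg-gallery-thumbnail\">" +
    "<a href=\"https://urlhere.png\">thumbnail</a>" +
    "</div></div>" +
    "</body></html>";

// Each query adds one step to the previous one.
string[] steps =
{
    "//div[@class='ngg-gallery-thumbnail-box']",
    "//div[@class='ngg-gallery-thumbnail-box']//div[@class='ngg-gallery-thumbnail']",
    "//div[@class='ngg-gallery-thumbnail-box']//div[@class='ngg-gallery-thumbnail']//a",
    "//div[@class='ngg-gallery-thumbnail-box']//div[@class='ngg-gallery-thumbnail']//a[@href]",
};

// Treats a null result (HAP's "no match" value) the same as an empty one.
int CountMatches(XmlDocument document, string xpath) => document.SelectNodes(xpath)?.Count ?? 0;

var doc = new XmlDocument();
doc.LoadXml(Page);
foreach (var step in steps)
{
    // Prints the match count next to the query that produced it.
    Console.WriteLine($"{CountMatches(doc, step)}  {step}");
}
```

On this sample page every step reports 1 match; on the real page, the first line reporting 0 shows where to adjust the query.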

HTML Agility Pack node selection
