Question

I'm trying to parse an HTML document using some code I found on this very site, but I keep getting parse errors.

        HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();

        // There are various options, set as needed
        htmlDoc.OptionFixNestedTags = true;

        // filePath is a path to a file containing the html
        htmlDoc.Load(@"C:\Documents and Settings\Mine\My Documents\Random.html");

        // Use:  htmlDoc.LoadHtml(htmlString);  to load from a string

        // ParseErrors is an ArrayList containing any errors from the Load statement
        if (htmlDoc.ParseErrors != null && htmlDoc.ParseErrors.Count > 0)
        {
            // Handle any parse errors as required
            MessageBox.Show("Oh no");
        }
        else
        {

            if (htmlDoc.DocumentNode != null)
            {
                // The XPath selects <head>, so the variable is named to match
                HtmlAgilityPack.HtmlNode headNode = htmlDoc.DocumentNode.SelectSingleNode("//head");

                if (headNode != null)
                {
                    MessageBox.Show("Hello");
                }
            }
        }

Any help would be greatly appreciated :)

Answer 1

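In the wild, HTML is often not conformant, not well-formed, and will not validate. Only XHTML or very simple HTML is likely to leave ParseErrors empty. I've noticed that the HTML Agility Pack is very robust: even when ParseErrors are generated, it still builds a decent DOM tree from most HTML sources. Remove the else, so that the code currently inside the else block runs whether or not there were parse errors.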
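As a rough sketch of that change (not the only way to write it), the snippet below reports any parse errors and then queries the tree regardless. It assumes a console program rather than the MessageBox calls in the question, and the inline HTML string is made up purely for illustration.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        var htmlDoc = new HtmlDocument();
        htmlDoc.OptionFixNestedTags = true;

        // Hypothetical sloppy markup: the <p> and <b> tags are never closed.
        htmlDoc.LoadHtml("<html><head><title>Test</title></head><body><p>Hi <b>there</body></html>");

        // Report parse errors, but do not treat them as fatal.
        foreach (HtmlParseError error in htmlDoc.ParseErrors)
        {
            Console.WriteLine("Line {0}, column {1}: {2}",
                error.Line, error.LinePosition, error.Reason);
        }

        // No else: query the DOM whether or not errors were reported.
        HtmlNode headNode = htmlDoc.DocumentNode.SelectSingleNode("//head");
        Console.WriteLine(headNode != null ? "Found <head>" : "No <head> element");
    }
}
```

Iterating with foreach avoids depending on a Count property, so the loop should not care which collection type your version of the library uses for ParseErrors.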

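If it doesn't build a DOM tree at all, then you should investigate the ParseErrors that were generated. If it builds only a partial tree, try recursing over the nodes and printing them (or showing them in a message box) to see which parts of the DOM tree were built. You may not need the whole tree.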
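A minimal sketch of that kind of recursive dump, assuming console output; the Dump helper and the inline HTML are hypothetical names and data, not part of the library:

```csharp
using System;
using HtmlAgilityPack;

class TreeDump
{
    // Prints the name of every element node, indented by its depth in the tree.
    static void Dump(HtmlNode node, int depth)
    {
        if (node.NodeType == HtmlNodeType.Element)
        {
            Console.WriteLine(new string(' ', depth * 2) + node.Name);
        }

        // Recurse into children; text and comment nodes are visited but not printed.
        foreach (HtmlNode child in node.ChildNodes)
        {
            Dump(child, depth + 1);
        }
    }

    static void Main()
    {
        var htmlDoc = new HtmlDocument();

        // Hypothetical input; replace with htmlDoc.Load(path) for a real file.
        htmlDoc.LoadHtml("<html><head><title>Test</title></head><body><p>Hi <b>there</body></html>");

        // Start from the top-level children of the document node.
        foreach (HtmlNode child in htmlDoc.DocumentNode.ChildNodes)
        {
            Dump(child, 0);
        }
    }
}
```

If the element you need shows up in the output, you can query it directly and ignore whatever the parser complained about elsewhere.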
