Select all "a" nodes using HtmlAgilityPack

Time: 2014-05-31 03:38:14

Tags: c# windows-runtime html-agility-pack

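I am using HtmlAgilityPack in WinRT and trying to replace every <a href="..."> node with a node of my own.

I noticed that HtmlAgilityPack changed the way nodes are navigated in WinRT, so SelectNode does not work as it does in the published example.

I wrote the code below, but with no luck.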

// Want to find a-nodes in all html tags. SelectNode("//a[@href]") doesn't work.
foreach (HtmlNode node in doc.DocumentNode.FirstChild.Element("body").ChildNodes.Where(n => n.Name.Equals("a")))
{
    HtmlAttribute att = node.Attributes.FirstOrDefault(l => l.Name.Equals("href"));
    if (att != null)
    {
        node.Attributes.Add("onClick", String.Format("gotoLink('{0}');", att.Value));
        att.Value = "#";
    }
}

Do I have to write a recursive method to walk the document hierarchy?

2 Answers:

Answer 0: (score: 0)

You can use LINQ for this, as I did in the following code:

protected void ClickMeButton_Click(object sender, EventArgs e)
        {
            // Download the page, parse it with HtmlAgilityPack, and query it with LINQ.

            using (WebClient client = new WebClient())
            {
                string pixarHtml = client.DownloadString("http://localhost:51450/20001.html");

                HtmlDocument document = new HtmlDocument();
                document.LoadHtml(pixarHtml);

                // Locate the containers by id. GetAttributeValue returns the default ("")
                // instead of throwing when a <div> has no id attribute.
                HtmlNode pixarDiv = document.DocumentNode.Descendants("div")
                    .First(d => d.GetAttributeValue("id", "") == "wrapper");

                HtmlNode pixarTable1 = document.DocumentNode.Descendants("div")
                    .First(d => d.GetAttributeValue("id", "") == "data-review");

                // Each <dl class="clearfix"> inside the review block holds one name/value pair.
                IEnumerable<HtmlNode> pixarRows = pixarTable1.Descendants("dl")
                    .Where(d => d.GetAttributeValue("class", "") == "clearfix");
                // Build one header row from the <dt> texts and one data row from the <dd> texts.
                String sth = "<table><thead><tr>";
                String std = "<tbody><tr>";

                foreach (HtmlNode row in pixarRows)
                {
                    HtmlNode term = row.ChildNodes["dt"];
                    HtmlNode definition = row.ChildNodes["dd"];
                    if (term == null)
                    {
                        continue; // no <dt>, so there is no column header to emit
                    }

                    sth += "<th>" + term.InnerText.Trim() + "</th>";
                    // Emit an empty cell when the <dd> is missing so the columns stay aligned.
                    std += "<td>" + (definition != null ? definition.InnerText.Trim() : "") + "</td>";
                }
                std += "</tr></tbody>";
                sth += "</tr></thead>" + std + "</table>";
                // Write as UTF-8 so non-ASCII text from the page is not replaced with '?'.
                File.WriteAllText("D:\\Demo14.xls", sth, Encoding.UTF8);
            }
        } 

Answer 1: (score: 0)

In WinRT you need to use HtmlAgilityPack's LINQ API instead, for example:

// The following LINQ selector replaces the XPath query //a[@href]
foreach (HtmlNode node in doc.DocumentNode.Descendants("a").Where(o => "" != o.GetAttributeValue("href", "")))
{
    // Read the original target before overwriting it. The Where clause above
    // already guarantees that href is present and non-empty.
    string href = node.GetAttributeValue("href", "");
    node.Attributes.Add("onClick", String.Format("gotoLink('{0}');", href));
    node.Attributes["href"].Value = "#";
}
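
On the recursion question: Descendants("a") already walks the whole tree under the node it is called on, so a hand-written recursive method should not be needed. Below is a minimal, self-contained sketch of that idea. The class name LinkRewriter, the method RewriteLinks, and the sample HTML are hypothetical and only there for illustration; gotoLink is the JavaScript function from the question and is assumed to exist on the page.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

static class LinkRewriter
{
    // Hypothetical helper: rewrites every <a> that has a non-empty href so the
    // original target is passed to gotoLink(...) and href becomes "#".
    public static string RewriteLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Descendants("a") visits <a> elements at any depth, not only direct
        // children of <body>. ToList() takes a snapshot before we edit nodes.
        var links = doc.DocumentNode.Descendants("a")
            .Where(a => a.GetAttributeValue("href", "") != "")
            .ToList();

        foreach (HtmlNode link in links)
        {
            string href = link.GetAttributeValue("href", "");
            // SetAttributeValue adds the attribute or overwrites an existing one.
            link.SetAttributeValue("onClick", String.Format("gotoLink('{0}');", href));
            link.SetAttributeValue("href", "#");
        }

        return doc.DocumentNode.OuterHtml;
    }

    static void Main()
    {
        // Sample input (an assumption): one nested link and one anchor without href.
        string sample = "<html><body><div><p><a href=\"http://example.com\">x</a></p></div>"
                      + "<a name=\"top\">y</a></body></html>";
        Console.WriteLine(RewriteLinks(sample));
    }
}
```

Note that the href value is placed inside a single-quoted JavaScript string, so a URL containing a single quote would need escaping before being passed to String.Format.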