Parsing HTML with Html Agility Pack and XPath

Date: 2014-05-24 05:58:01

Tags: c# xpath html-agility-pack

I have the following HTML. It is an HTML file saved as plain text, which I read from my local hard drive:

 "<span style=""font-size:14px;""><span style=""""><strong>Description:</strong><br />
  Material:Cotton+Polyester<br />
  Color:White-Black<br />
  Occasion: Casual<br /><br />
<strong>Details&nbsp;in&nbsp;size:</strong></span></span><br />

<div border=""1"" class=""tab02"" style=""border: 1px dashed rgb(204- 204- 204); border-collapse: collapse; border-spacing: 0px; text-align: center; font-size: 14px; font-family: Arial- Helvetica- sans-serif;"" width=""100%"">
<div>
    <div style=""border: 1px dashed rgb(204- 204- 204); border-collapse: collapse; border-spacing: 0px;"">
        <span style=""border: 1px dashed rgb(204- 204- 204); border-collapse: collapse; border-spacing: 0px; padding: 5px 10px;"">
            US Size</span>
        <span style=""border: 1px dashed rgb(204- 204- 204); border-collapse: collapse; border-spacing: 0px; padding: 5px 10px;"">
            M</span>
        <span style=""border: 1px dashed rgb(204- 204- 204); border-collapse: collapse; border-spacing: 0px; padding: 5px 10px;"">
            L</span>
        <span style=""border: 1px dashed rgb(204- 204- 204); border-collapse: collapse; border-spacing: 0px; padding: 5px 10px;"">
            XL</span>
    </div>
    <div style=""border: 1px dashed rgb(204- 204- 204); border-collapse: collapse; border-spacing: 0px;"">
        <span style=""border: 1px dashed rgb(204- 204- 204); border-collapse: collapse; border-spacing: 0px; padding: 5px 10px;"">
            Asian&nbsp;Size</span>
        <span style=""border: 1px dashed rgb(204- 204- 204); border-collapse: collapse; border-spacing: 0px; padding: 5px 10px;"">
            L</span>
        <span style=""border: 1px dashed rgb(204- 204- 204); border-collapse: collapse; border-spacing: 0px; padding: 5px 10px;"">
            XL</span>
        <span style=""border: 1px dashed rgb(204- 204- 204); border-collapse: collapse; border-spacing: 0px; padding: 5px 10px;"">
            2XL</span>
    </div>

I need to get the inner divs using C# and XPath. This is what I have done so far using XPath:

string SizeDescriptions = File.ReadAllText(@"E:\Elance\Product Description     HTML\HTML_Product_Description.txt");
        HtmlDocument document = new HtmlDocument();
        string htmlString = SizeDescriptions;// "<html>blabla</html>";
        document.LoadHtml(htmlString);
        HtmlNodeCollection collection = document.DocumentNode.SelectNodes("//div").FindFirst("div").ChildNodes;
        foreach (HtmlNode link in collection)
        {
            HtmlNodeCollection Sizes = link.SelectNodes("/div/span");
            foreach(HtmlNode SizeDiv in Sizes)
            {
                TableRow tr1 = new TableRow();
                TableCell cell1 = new TableCell();
                tr1.


            }
            string target = link.Attributes["href"].Value;
        }

1 Answer:

Answer 0 (score: 0)

Use

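HtmlNodeCollection innerDivs = document.DocumentNode.SelectNodes("//div/div");
foreach (HtmlNode div in innerDivs)
{
    // Relative XPath: the direct span children of the current div.
    HtmlNodeCollection spans = div.SelectNodes("span");
    // SelectNodes returns null when nothing matches (e.g. a wrapper div).
    if (spans == null) continue;
    foreach (HtmlNode span in spans)
    {
        string text = span.InnerText;
    }
}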
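As a minimal sketch of the loop above packaged as a self-contained method: the class name InnerDivReader and the method ReadRows are hypothetical names introduced only for illustration, and the example assumes the HtmlAgilityPack NuGet package is referenced. It guards against SelectNodes returning null, which happens when an XPath matches nothing, and it queries spans with a relative path, because a path starting with "/" (as in "/div/span") is evaluated from the document root rather than from the current node.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class InnerDivReader
{
    // Returns one list of trimmed span texts per inner div that
    // directly contains span elements.
    public static List<List<string>> ReadRows(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        var rows = new List<List<string>>();

        // SelectNodes returns null, not an empty collection, on no match.
        HtmlNodeCollection innerDivs = document.DocumentNode.SelectNodes("//div/div");
        if (innerDivs == null)
        {
            return rows;
        }

        foreach (HtmlNode div in innerDivs)
        {
            // "span" is relative to the current div.
            HtmlNodeCollection spans = div.SelectNodes("span");
            if (spans == null)
            {
                continue; // a div with no direct span children
            }

            var cells = new List<string>();
            foreach (HtmlNode span in spans)
            {
                cells.Add(span.InnerText.Trim());
            }
            rows.Add(cells);
        }

        return rows;
    }
}
```

Calling InnerDivReader.ReadRows(htmlString) on the HTML from the question should give one list per size row, which can then be mapped to table rows and cells.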

Of course, if it does not matter which div each span belongs to, just use a single XPath and one foreach, e.g.

HtmlNodeCollection spans = document.DocumentNode.SelectNodes("//div/div/span");

foreach (HtmlNode span in spans)
{
    string text = span.InnerText;
}
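The single-XPath variant can be sketched the same way. SpanTextReader and ReadSpanTexts are hypothetical names, and the HtmlAgilityPack package is again assumed. Because InnerText leaves entities such as &nbsp; as written in the source, this sketch decodes them with HtmlEntity.DeEntitize and replaces the resulting non-breaking space with an ordinary space. Whether that normalization is wanted depends on how the text is used afterwards.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class SpanTextReader
{
    // Returns the decoded, trimmed text of every span that sits
    // inside a div nested in another div.
    public static List<string> ReadSpanTexts(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        var texts = new List<string>();

        HtmlNodeCollection spans = document.DocumentNode.SelectNodes("//div/div/span");
        if (spans == null)
        {
            return texts; // nothing matched the XPath
        }

        foreach (HtmlNode span in spans)
        {
            // Decode entities (for example &nbsp;) into characters.
            string decoded = HtmlEntity.DeEntitize(span.InnerText);

            // Treat the non-breaking space (U+00A0) as a regular space.
            texts.Add(decoded.Replace('\u00A0', ' ').Trim());
        }

        return texts;
    }
}
```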