Extracting specific HTML text with HTMLAgilityPack

Posted: 2011-09-19 16:54:32

Tags: html filter nodes

<table class="result" summary="Summary Description.">
<tbody>
<tr>
    <th scope="col" class="firstcol">Column 1</th>
    <th scope="col">Column 2</th>
    <th scope="col">Column 3</th>
    <th scope="col" class="lastcol">Column 4</th>
</tr>
<tr class="even">
    <td class="firstcol">Text 1</td>
    <td>Text 2</td>
    <td>4Text 3</td>
    <td class="lastcol">Text 4</td>
</tr>
</tbody></table>

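The part of the HTML I'm interested in looks like this. I want to get Text 1, Text 2, Text 3 and Text 4. How can I extract that data with HTMLAgilityPack? I searched Google and this site but didn't find anything that matches my scenario exactly.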

        if (htmlDoc.DocumentNode != null)
        {
            foreach (HtmlNode text in htmlDoc.DocumentNode.SelectNodes(???))
            {
                ???
            }
        }

1 answer:

Answer (score: 1)

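Try this. The XPath `//tr[@class='even']/td/text()` selects the text node inside each cell of the row whose class is `even`: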

        using System;
        using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

        var html = @"<table class=""result"" summary=""Summary Description.""> <tbody> <tr>     <th scope=""col"" class=""firstcol"">Column 1</th>     <th scope=""col"">Column 2</th>     <th scope=""col"">Column 3</th>     <th scope=""col"" class=""lastcol"">Column 4</th> </tr> <tr class=""even"">     <td class=""firstcol"">Text 1</td>     <td>Text 2</td>     <td>4Text 3</td>     <td class=""lastcol"">Text 4</td> </tr> </tbody></table>";
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        // Text nodes directly inside each <td> of the row with class="even".
        // SelectNodes returns null (not an empty collection) when nothing matches.
        var textNodes = doc.DocumentNode.SelectNodes(@"//tr[@class='even']/td/text()");
        if (textNodes != null)
        {
            foreach (var textNode in textNodes)
            {
                Console.WriteLine(textNode.InnerText);
            }
        }
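If a cell may contain nested markup (for example `<td><a href="...">Text 2</a></td>`), `td/text()` would miss that text, because `text()` only matches text nodes that are direct children of the `<td>`. A sketch under that assumption selects the `<td>` elements themselves and reads their `InnerText`, using HtmlAgilityPack's LINQ-style `Descendants`/`Elements` methods instead of XPath. The class `TableCells` and the method `GetEvenRowCells` are hypothetical names, and the sample markup is an abbreviated copy of the question's table.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

// Hypothetical helper class, for illustration only.
public static class TableCells
{
    // Returns the trimmed text of every <td> in rows whose class attribute
    // is exactly "even" (same condition as the XPath @class='even').
    public static List<string> GetEvenRowCells(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants("tr")
            .Where(tr => tr.GetAttributeValue("class", "") == "even")
            // Elements("td") yields only the direct <td> children of the row.
            .SelectMany(tr => tr.Elements("td"))
            // InnerText includes text inside nested tags; DeEntitize turns
            // entities such as &amp; back into plain characters.
            .Select(td => HtmlEntity.DeEntitize(td.InnerText).Trim())
            .ToList();
    }

    public static void Main()
    {
        // Abbreviated copy of the question's table.
        const string html = @"<table class=""result""><tbody>
<tr><th class=""firstcol"">Column 1</th><th>Column 2</th></tr>
<tr class=""even""><td class=""firstcol"">Text 1</td><td>Text 2</td>
<td>4Text 3</td><td class=""lastcol"">Text 4</td></tr>
</tbody></table>";

        foreach (var cell in GetEvenRowCells(html))
        {
            Console.WriteLine(cell);
        }
    }
}
```

For this markup it should print `Text 1`, `Text 2`, `4Text 3` and `Text 4`, one per line (the third cell reads `4Text 3` in the question's source). To collect every data row rather than only `class="even"`, one could drop the `Where` filter; header rows contribute nothing because they contain `<th>`, not `<td>`.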