How to get elements with a specific value in HtmlAgilityPack

Posted: 2014-03-17 19:19:16

Tags: asp.net-mvc-4 html-agility-pack

I have an ASP.NET MVC4 project and am trying to parse an HTML document with HtmlAgilityPack. I have the following HTML:

<td class="pl22">
  <p class='pb10 pt10 t_grey'>Experience:</p>
  <p class='bold'>any</p>
</td>
<td class='pb10 pl20'>
  <p class='t_grey pb10 pt10'>Education:</p>
  <p class='bold'>any</p>
</td>
<td class='pb10 pl20'>
  <p class='pb10 pt10 t_grey'>Schedule:</p>
  <p class='bold'>part-time</p>
  <p class='text_12'>2/2 (day/night)</p>
</td>

I need to get these values:

  1. "any" after "Experience:"
  2. "any" after "Education:"
  3. "part-time" and "2/2 (day/night)" after "Schedule:"

All I could come up with is:

    HtmlNode experience = hd.DocumentNode.SelectSingleNode("//td[@class='pl22']//p[@class='bold']");
    

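But this gives me a different element, one located near the top of the page. The labels Experience, Education and Schedule are static, whereas the values (any, part-time, day/night) are dynamic. Can anyone help?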

2 answers:

Answer 0 (score: 0)

If you want to stick with XPath, you can do it like this:
using System;
using System.Text;
using HtmlAgilityPack;

var html = "<td class='pl22'><p class='pb10 pt10 t_grey'>Experience:</p><p class='bold'>any</p></td><td class='pb10 pl20'><p class='t_grey pb10 pt10'>Education:</p><p class='bold'>any</p></td><td class='pb10 pl20'><p class='pb10 pt10 t_grey'>Schedule:</p><p class='bold'>part-time</p><p class='text_12'>2/2 (day/night)</p></td> ";

var doc = new HtmlDocument
{
     // Only relevant when loading from a stream; it has no effect on LoadHtml(string).
     OptionDefaultStreamEncoding = Encoding.UTF8
};

doc.LoadHtml(html);

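// Experience: first <p class='bold'> that is a direct child of <td class='pl22'>
var part1 = doc.DocumentNode.SelectSingleNode("//td[@class='pl22']/p[@class='bold']");
// Education and Schedule: both tds whose class attribute is exactly 'pb10 pl20'
var part2 = doc.DocumentNode.SelectNodes("//td[@class='pb10 pl20']/p[@class='bold']");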

foreach (var item in part2)
{
    Console.WriteLine(item.InnerText);
}

var part3 = doc.DocumentNode.SelectSingleNode("//td[@class='pb10 pl20']/p[@class='text_12']");

Console.WriteLine(part1.InnerText);            
Console.WriteLine(part3.InnerText);

Output:

any
part-time
any
2/2 (day/night)
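One caveat with `@class='pb10 pl20'`: it compares the whole attribute string, so it stops matching if the page reorders the classes or adds another one. A common XPath 1.0 idiom tests for a single class token instead. The sketch below is a hedged illustration of that idiom: to run without the HtmlAgilityPack NuGet package it swaps in System.Xml's `XPathDocument`, which only works here because this fragment happens to be well-formed XML once wrapped in a `<tr>`. The names `ClassMatchDemo`, `HasClass` and `BoldValues` are hypothetical. The XPath strings it builds use only XPath 1.0 functions, so they should be usable unchanged with HtmlAgilityPack's `SelectNodes`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

static class ClassMatchDemo
{
    // Hypothetical helper: builds an XPath 1.0 predicate that is true when the
    // space-separated class attribute contains the token `cls`, in any position.
    static string HasClass(string cls) =>
        $"contains(concat(' ', normalize-space(@class), ' '), ' {cls} ')";

    // Returns the text of every <p> with class token "bold" directly inside a <td> carrying tdClass.
    public static List<string> BoldValues(XPathNavigator root, string tdClass)
    {
        string xpath = $"//td[{HasClass(tdClass)}]/p[{HasClass("bold")}]";
        var values = new List<string>();
        foreach (XPathNavigator node in root.Select(xpath))
            values.Add(node.Value);
        return values;
    }

    static void Main()
    {
        // The question's fragment, wrapped in <tr> so System.Xml accepts it as one document.
        const string html =
            "<tr><td class='pl22'><p class='pb10 pt10 t_grey'>Experience:</p><p class='bold'>any</p></td>" +
            "<td class='pb10 pl20'><p class='t_grey pb10 pt10'>Education:</p><p class='bold'>any</p></td>" +
            "<td class='pb10 pl20'><p class='pb10 pt10 t_grey'>Schedule:</p><p class='bold'>part-time</p>" +
            "<p class='text_12'>2/2 (day/night)</p></td></tr>";

        XPathNavigator nav = new XPathDocument(new StringReader(html)).CreateNavigator();

        // Matches "pl20" whether the attribute is "pb10 pl20", "pl20 pb10" or "pl20 extra".
        foreach (string value in BoldValues(nav, "pl20"))
            Console.WriteLine(value);
    }
}
```

For this fragment it prints `any` and `part-time`, the same values as `part2` above. The trade-off is readability: the predicate is longer, but it no longer depends on the exact order of the classes in the markup.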

Answer 1 (score: 0)

Here is an alternative that relies on the table headers (Experience, Education, Schedule) rather than on the node classes:

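// Requires: using System.Collections.Generic; using System.Linq; using HtmlAgilityPack;
// Note: SelectNodes returns null when nothing matches, so an unknown header throws here.
private static List<string> GetValues(HtmlDocument doc, string header) {
    var xpath = string.Format("//p[contains(text(), '{0}')]/following-sibling::p", header);
    return doc.DocumentNode.SelectNodes(xpath).Select(x => x.InnerText).ToList();
}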

You can call the method like this:

var experiences = GetValues(doc, "Experience");
var educations = GetValues(doc, "Education");
var schedules = GetValues(doc, "Schedule");

experiences.ForEach(Console.WriteLine);
educations.ForEach(Console.WriteLine);
schedules.ForEach(Console.WriteLine);
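The header-based approach can be tightened in two ways: match the label's whole text (`Experience:`) instead of using `contains`, so a paragraph elsewhere on the page that merely mentions the word is not picked up; and return an empty list instead of throwing when a label is missing. With HtmlAgilityPack, `SelectNodes` returns null rather than an empty collection when nothing matches, which is why `GetValues` above throws for an unknown header. The sketch below again uses System.Xml's `XPathDocument` as a stand-in so it runs without HtmlAgilityPack (it relies on the fragment being well-formed XML). `HeaderLookupDemo` is a hypothetical name, and the code assumes each label ends with a colon and that the header contains no single quote, because it is pasted into an XPath string literal.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.XPath;

static class HeaderLookupDemo
{
    // Returns the trimmed text of every <p> that follows the <p> whose whole text is
    // "<header>:" within the same parent. Empty list (never null) if no label matches.
    // Assumption: header contains no single quote, since it is inserted into an XPath literal.
    public static List<string> GetValues(XPathNavigator root, string header)
    {
        string xpath = $"//p[normalize-space(.)='{header}:']/following-sibling::p";
        return root.Select(xpath)
                   .Cast<XPathNavigator>()
                   .Select(n => n.Value.Trim())
                   .ToList();
    }

    static void Main()
    {
        // The question's fragment, wrapped in <tr> so System.Xml accepts it as one document.
        const string html =
            "<tr><td class='pl22'><p class='pb10 pt10 t_grey'>Experience:</p><p class='bold'>any</p></td>" +
            "<td class='pb10 pl20'><p class='t_grey pb10 pt10'>Education:</p><p class='bold'>any</p></td>" +
            "<td class='pb10 pl20'><p class='pb10 pt10 t_grey'>Schedule:</p><p class='bold'>part-time</p>" +
            "<p class='text_12'>2/2 (day/night)</p></td></tr>";

        XPathNavigator nav = new XPathDocument(new StringReader(html)).CreateNavigator();

        // "Salary" is a hypothetical label that does not exist in the fragment.
        foreach (string header in new[] { "Experience", "Education", "Schedule", "Salary" })
            Console.WriteLine($"{header}: {string.Join(" | ", GetValues(nav, header))}");
    }
}
```

This should print `Experience: any`, `Education: any`, `Schedule: part-time | 2/2 (day/night)` and an empty `Salary:` line. With HtmlAgilityPack itself, the same null-safety can be had by writing `(doc.DocumentNode.SelectNodes(xpath) ?? Enumerable.Empty<HtmlNode>())` before the `.Select(...)`.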