I have an ASP.NET MVC4 project in which I am trying to parse an HTML document with HtmlAgilityPack. I have the following HTML:
<td class="pl22">
<p class='pb10 pt10 t_grey'>Experience:</p>
<p class='bold'>any</p>
</td>
<td class='pb10 pl20'>
<p class='t_grey pb10 pt10'>Education:</p>
<p class='bold'>any</p>
</td>
<td class='pb10 pl20'>
<p class='pb10 pt10 t_grey'>Schedule:</p>
<p class='bold'>part-time</p>
<p class='text_12'>2/2 (day/night)</p>
</td>
I need to get the values that follow each label.
All I could come up with is:
HtmlNode experience = hd.DocumentNode.SelectSingleNode("//td[@class='pl22']//p[@class='bold']");
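But it gives me a different element, one located at the top of the page. Experience, Education and Schedule are static values, while any, any, part-time and 2/2 (day/night) are dynamic values. Can anyone help me?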
Answer 0: (score: 0)
If you want to stick with XPath, you can do this:
var html = "<td class='pl22'><p class='pb10 pt10 t_grey'>Experience:</p><p class='bold'>any</p></td><td class='pb10 pl20'><p class='t_grey pb10 pt10'>Education:</p><p class='bold'>any</p></td><td class='pb10 pl20'><p class='pb10 pt10 t_grey'>Schedule:</p><p class='bold'>part-time</p><p class='text_12'>2/2 (day/night)</p></td> ";
var doc = new HtmlDocument
{
OptionDefaultStreamEncoding = Encoding.UTF8
};
doc.LoadHtml(html);
var part1 = doc.DocumentNode.SelectSingleNode("//td[@class='pl22']/p[@class='bold']");
var part2 = doc.DocumentNode.SelectNodes("//td[@class='pb10 pl20']/p[@class='bold']");
foreach (var item in part2)
{
Console.WriteLine(item.InnerText);
}
var part3 = doc.DocumentNode.SelectSingleNode("//td[@class='pb10 pl20']/p[@class='text_12']");
Console.WriteLine(part1.InnerText);
Console.WriteLine(part3.InnerText);
Output:
any
part-time
any
2/2 (day/night)
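One caveat with exact comparisons such as `@class='pb10 pl20'`: they only match when the attribute string is exactly that, so a cell written as `class='pl20 pb10'` would be missed (the question's own markup already mixes `t_grey pb10 pt10` and `pb10 pt10 t_grey`). Below is a minimal sketch of token-based matching, assuming the same HtmlAgilityPack setup. The `ClassToken` helper is a hypothetical name introduced for illustration, not part of the library, and the sample markup is made up to show the reordered case.

```csharp
using System;
using HtmlAgilityPack;

static class ClassTokenDemo
{
    // Hypothetical helper: builds an XPath 1.0 predicate that matches a single
    // class token regardless of token order or extra whitespace.
    public static string ClassToken(string name)
    {
        return "contains(concat(' ', normalize-space(@class), ' '), ' " + name + " ')";
    }

    static void Main()
    {
        var doc = new HtmlDocument();
        // Illustrative markup: same classes as in the question, but reordered.
        doc.LoadHtml("<table><tr><td class='pl20 pb10'><p class='bold'>part-time</p></td></tr></table>");

        // Exact string comparison: the class order differs, so this is expected to find nothing.
        var exact = doc.DocumentNode.SelectSingleNode("//td[@class='pb10 pl20']/p[@class='bold']");
        Console.WriteLine(exact == null ? "exact match: not found" : "exact match: " + exact.InnerText);

        // Token comparison: only asks whether 'pl20' and 'bold' are among the classes.
        var byToken = doc.DocumentNode.SelectSingleNode(
            "//td[" + ClassToken("pl20") + "]/p[" + ClassToken("bold") + "]");
        Console.WriteLine(byToken == null ? "token match: not found" : "token match: " + byToken.InnerText);
    }
}
```

The padding with spaces on both sides keeps `pl2` from matching `pl20`, which a plain `contains(@class, 'pl2')` would do.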
Answer 1: (score: 0)
Here is an alternative that keys on the table headers (Experience, Education and Schedule) rather than on the node classes:
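private static List<string> GetValues(HtmlDocument doc, string header) {
    // SelectNodes returns null (not an empty collection) when nothing matches.
    var nodes = doc.DocumentNode.SelectNodes(string.Format("//p[contains(text(), '{0}')]/following-sibling::p", header));
    return nodes == null ? new List<string>() : nodes.Select(x => x.InnerText).ToList();
}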
You can call the method like this:
var experiences = GetValues(doc, "Experience");
var educations = GetValues(doc, "Education");
var schedules = GetValues(doc, "Schedule");
experiences.ForEach(Console.WriteLine);
educations.ForEach(Console.WriteLine);
schedules.ForEach(Console.WriteLine);
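If all three headers are needed at once, the same header-based idea can be done in a single pass. This is only a sketch under assumptions: it treats every `p` carrying the `t_grey` class inside a `td` as a label (as in the question's markup) and collects the `p` siblings after it as that label's values. `FieldReader` and `ReadFields` are hypothetical names, not library API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

static class FieldReader
{
    // Hypothetical helper: maps each label ("Experience", "Education", ...)
    // to the text of the <p> elements that follow it in the same cell.
    public static Dictionary<string, List<string>> ReadFields(HtmlDocument doc)
    {
        var fields = new Dictionary<string, List<string>>();

        // Assumption: label paragraphs carry the 't_grey' class, in any position.
        var labels = doc.DocumentNode.SelectNodes(
            "//td/p[contains(concat(' ', normalize-space(@class), ' '), ' t_grey ')]");
        if (labels == null) return fields; // SelectNodes returns null when nothing matches

        foreach (var label in labels)
        {
            // "Schedule:" becomes "Schedule"
            var key = label.InnerText.Trim().TrimEnd(':');
            var values = label.SelectNodes("following-sibling::p");
            fields[key] = values == null
                ? new List<string>()
                : values.Select(v => HtmlEntity.DeEntitize(v.InnerText).Trim()).ToList();
        }
        return fields;
    }

    static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<table><tr><td class='pb10 pl20'>" +
                     "<p class='pb10 pt10 t_grey'>Schedule:</p>" +
                     "<p class='bold'>part-time</p>" +
                     "<p class='text_12'>2/2 (day/night)</p>" +
                     "</td></tr></table>");

        foreach (var field in ReadFields(doc))
            Console.WriteLine(field.Key + " = " + string.Join(", ", field.Value));
    }
}
```

Because the labels come from the page itself, a new row (say, a fourth header) shows up in the dictionary without any code change, whereas `GetValues` has to be called once per known header.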