I'm making an application in C# with HTMLAgilityPack.
I have the following HTML structure:
<td colspan="3">
<a href="tournament_detail.asp?EID=3">The North West Junior Champions League 2016</a>
<br>
St Bedes Sports Fields, Manchester. M21 0TT</td>
I would like to pull out the address, excluding the <a> and the <br />.
I have tried the following:
//div[@class='infobox']/table/tr/td[1][not a]
Here is the site I am trying to pull data from
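I'm using HTMLAgilityPack, so I don't believe I can use the string() function (or at least I get an exception when I try). Please don't mark this as a duplicate, because I'm asking for clarification on whether I can use it.
How can I pull out the address?
Answer 0 (score: 2)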
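Adding the predicate [not(a)] makes the XPath return only <td> elements that have no child <a>, which is not the desired result. Instead, appending /text()[normalize-space()] returns the non-empty text nodes that are direct children of the selected <td>: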
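using System;
using HtmlAgilityPack;

var raw = @"<td colspan='3'>
<a href='tournament_detail.asp?EID=3'>The North West Junior Champions League 2016</a>
<br>
St Bedes Sports Fields, Manchester. M21 0TT</td>";
var doc = new HtmlDocument();
doc.LoadHtml(raw);
// Select the first direct-child text node of <td> that is not whitespace-only.
var td = doc.DocumentNode.SelectSingleNode("//td/text()[normalize-space()]");
// Prints: St Bedes Sports Fields, Manchester. M21 0TT
Console.WriteLine(td.InnerText.Trim());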
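The same suffix can be attached to the path from the question. The following is only a minimal sketch: it assumes the real page has the div/table/tr/td layout that the question's XPath implies, and the wrapper HTML in the `html` variable is a hypothetical stand-in for the site, not its actual markup. It uses SelectNodes rather than SelectSingleNode so that a missing match (which HtmlAgilityPack reports as null) and multiple text nodes are both handled.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical fragment shaped the way the question's XPath expects.
var html = @"<div class='infobox'><table><tr>
<td colspan='3'>
<a href='tournament_detail.asp?EID=3'>The North West Junior Champions League 2016</a>
<br>
St Bedes Sports Fields, Manchester. M21 0TT</td>
</tr></table></div>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

// The question's path plus the answer's text-node filter.
// SelectNodes returns null, not an empty collection, when nothing matches.
var textNodes = doc.DocumentNode.SelectNodes(
    "//div[@class='infobox']/table/tr/td[1]/text()[normalize-space()]");

if (textNodes == null)
{
    Console.WriteLine("No address text found.");
}
else
{
    foreach (var node in textNodes)
    {
        // DeEntitize converts entities such as &amp; back into characters.
        Console.WriteLine(HtmlEntity.DeEntitize(node.InnerText).Trim());
    }
}
```

If the live page wraps its rows in a tbody element, the /table/tr step would need adjusting, since HtmlAgilityPack parses the markup as written and does not add a tbody on its own.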