Question

I'm making an application in C# with HTMLAgilityPack.

I have the following HTML structure:

<td colspan="3">
    <a href="tournament_detail.asp?EID=3">The North West Junior Champions League 2016</a>
    <br>
    St Bedes Sports Fields,  Manchester. M21 0TT</td>

I would like to pull out the address, excluding the <a> and the <br />.

I have tried the following:

//div[@class='infobox']/table/tr/td[1][not a]

Here is the site I am trying to pull data from

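I am using HTMLAgilityPack, so I don't believe I can use the string() function (or at least I get an exception when I try). Please don't mark this as a duplicate, as I am asking for clarification on whether I can use it.

How do I pull out the address?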

Answer 1

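Adding the predicate [not(a)] makes the XPath return only <td> elements that have no child <a>, which is not the desired result. Instead, appending /text()[normalize-space()] returns the direct-child, non-empty text nodes of the selected <td>:

using System;
using HtmlAgilityPack;

var raw = @"<td colspan='3'>
    <a href='tournament_detail.asp?EID=3'>The North West Junior Champions League 2016</a>
    <br>
    St Bedes Sports Fields,  Manchester. M21 0TT</td>";
var doc = new HtmlDocument();
doc.LoadHtml(raw);
// Select the first non-blank text node that is a direct child of the <td>
var td = doc.DocumentNode.SelectSingleNode("//td/text()[normalize-space()]");
Console.WriteLine(td.InnerText.Trim());

Output:

St Bedes Sports Fields,  Manchester. M21 0TT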
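
For the full path from the question, the same idea might look roughly like the sketch below. It is a sketch under assumptions, not a tested solution for the real site: the surrounding <div class='infobox'>/<table>/<tr> markup is a hypothetical reconstruction based on the XPath in the question, and the code assumes .NET 6 or later (top-level statements) with the HtmlAgilityPack NuGet package installed. It also guards against SelectNodes finding nothing, since HtmlAgilityPack typically returns null rather than an empty collection in that case.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical fragment modeled on the question's XPath; the real page may differ.
var html = @"<div class='infobox'><table><tr>
  <td colspan='3'>
    <a href='tournament_detail.asp?EID=3'>The North West Junior Champions League 2016</a>
    <br>
    St Bedes Sports Fields,  Manchester. M21 0TT</td>
</tr></table></div>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

// [not(a)] discards every <td> that has an <a> child, so this selects nothing here.
var filtered = doc.DocumentNode.SelectNodes(
    "//div[@class='infobox']/table/tr/td[1][not(a)]");
bool filteredIsEmpty = filtered == null || filtered.Count == 0;
Console.WriteLine(filteredIsEmpty ? "[not(a)]: no match" : "[not(a)]: matched");

// text()[normalize-space()] keeps only the direct, non-blank text children of the <td>.
var textNodes = doc.DocumentNode.SelectNodes(
    "//div[@class='infobox']/table/tr/td[1]/text()[normalize-space()]");

// Guard against a null or empty result before joining the text nodes.
string address = (textNodes == null || textNodes.Count == 0)
    ? null
    : string.Join(" ", textNodes.Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim()));

Console.WriteLine(address ?? "address not found");
```

Using SelectNodes instead of SelectSingleNode keeps the address intact if it is ever split across several text nodes (for example by an extra <br>), and HtmlEntity.DeEntitize decodes entities such as &amp;amp; that InnerText leaves encoded.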

How to remove <a> elements from XPath?
