Question

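I've been asking a lot of XPath questions lately. Sorry about that, but I've only just started using it and I'm working on a fairly difficult project. I'm parsing HTML that looks like this (not copied and pasted, just an example):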

<span id="no153434"></span>
<blockquote>Text here.<br/>More text.<br/>Some more text.</blockquote>

I'm using

//span[starts-with(@id, 'no')]/following::*[1][name()='blockquote']//node()

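to get the inner text. It works fine, although it's frustrating: I have to check for each <br/> by hand, then join the strings before and after it, add the newlines, and so on. Still, it works. Until there's a link in the text, that is. Then the markup looks like this: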

<span id="no153434"></span>
<blockquote>Text here.<br/>Text.<br/><font class = "unkfunc"><a href="linkhere" class="link">linkhere</a></font></blockquote>

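I have absolutely no idea where to start, because the link ends up in the array as a completely separate item (twice). At least I know where it has to be moved to. After all this effort, I'm seriously considering giving up on the project.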

Answer 1

You can get the text inside the element with this XPath: //span[starts-with(@id, 'no')]/following::*[1][name()='blockquote']//text()

That gives you the following result:

Text here.
Text.
linkhere
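
For comparison, here is a rough Python sketch of what that //text() selection returns for the second sample from the question. It uses only the standard library's xml.etree.ElementTree, which supports neither starts-with() nor the following:: axis, so the span/blockquote pairing is done by hand, and it assumes the two elements are adjacent siblings. The wrapping <div> and the helper name blockquote_text_nodes are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question, wrapped in a root element so it parses as XML.
SAMPLE = (
    '<div>'
    '<span id="no153434"></span>'
    '<blockquote>Text here.<br/>Text.<br/>'
    '<font class="unkfunc"><a href="linkhere" class="link">linkhere</a></font>'
    '</blockquote>'
    '</div>'
)


def blockquote_text_nodes(root):
    """List the text nodes of each blockquote that directly follows
    a span whose id starts with 'no' (siblings only, an assumption)."""
    results = []
    children = list(root)
    for current, following in zip(children, children[1:]):
        is_marker = current.tag == 'span' and current.get('id', '').startswith('no')
        if is_marker and following.tag == 'blockquote':
            # itertext() walks every descendant text node in document order,
            # which is what //text() selects under the blockquote.
            results.append(list(following.itertext()))
    return results


nodes = blockquote_text_nodes(ET.fromstring(SAMPLE))
print(nodes)
```

The link text comes back as just another text node, once, in document order; the <br/> elements are simply absent from the result.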

Answer 2

If you only want the text nodes and the br elements:

 //span
  [starts-with(@id, 'no')]/
  following::*[1][name()='blockquote']
   //node()
   [ count(.|..//text()) = count(..//text())
     or 
     name()='br'
   ]

returns

Text here.
<br />
Text.
<br />
linkhere
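
Once the text nodes and br elements are available in document order, turning them into one plain-text string is a small fold. The following is a minimal Python sketch of that step, not part of the original answer: it walks the blockquote with the standard library's xml.etree.ElementTree instead of evaluating the XPath above, and the function name flatten is hypothetical.

```python
import xml.etree.ElementTree as ET


def flatten(el):
    """Concatenate el's text in document order, emitting a newline for each <br>."""
    parts = []
    if el.tag == 'br':
        parts.append('\n')
    if el.text:
        parts.append(el.text)
    for child in el:
        parts.append(flatten(child))
        # Text that follows a child element is stored on the child as .tail.
        if child.tail:
            parts.append(child.tail)
    return ''.join(parts)


blockquote = ET.fromstring(
    '<blockquote>Text here.<br/>Text.<br/>'
    '<font class="unkfunc"><a href="linkhere" class="link">linkhere</a></font>'
    '</blockquote>'
)
plain = flatten(blockquote)
print(plain)
```

Because the walk recurses into font and a, the link text lands in the right position without any special handling for nested elements.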

Answer 3

The answer is: don't use XPath for this kind of work. Using Objective-C-HTML-Parser makes it 1,000,000 times easier.
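
To illustrate the parser-driven approach in general terms, here is a hedged sketch in Python using the standard library's html.parser. It is a stand-in for the idea only and does not show the Objective-C-HTML-Parser API. The class name BlockquoteText and its attributes are invented for this example, and it assumes the blockquote is the first element to open after the matching span.

```python
from html.parser import HTMLParser


class BlockquoteText(HTMLParser):
    """Collect the plain text of every blockquote that is the first
    element opened after a <span> whose id starts with 'no'."""

    def __init__(self):
        super().__init__()
        self.armed = False   # True right after a matching span was opened
        self.depth = 0       # > 0 while inside a target blockquote
        self.parts = []      # text pieces of the blockquote being read
        self.results = []    # one finished string per blockquote

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == 'blockquote':
                self.depth += 1
            elif tag == 'br':
                self.parts.append('\n')
            return
        if self.armed and tag == 'blockquote':
            self.depth = 1
            self.parts = []
        span_id = dict(attrs).get('id') or ''
        self.armed = tag == 'span' and span_id.startswith('no')

    def handle_endtag(self, tag):
        if self.depth and tag == 'blockquote':
            self.depth -= 1
            if self.depth == 0:
                self.results.append(''.join(self.parts))

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


parser = BlockquoteText()
parser.feed(
    '<span id="no153434"></span>\n'
    '<blockquote>Text here.<br/>Text.<br/>'
    '<font class="unkfunc"><a href="linkhere" class="link">linkhere</a></font>'
    '</blockquote>'
)
parser.close()
print(parser.results)
```

An event-based parser like this never materialises the link as a separate array item, so there is nothing to merge back afterwards.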

Converting links within blockquotes to plain text
