Question

I am working with this simple HTML:

<div id="post_message_975824" class="alt3">
   <div class="quote">
      some unwanted text 
   </div>
   the text to get <abr>ABR</abr> text to get
</div>

I want this to work:

xpath = "//*[contains(@id, 'post_message_') and not(contains(@class,'quote'))]"

But it fails. I tried another query as well, and I'm not sure what I'm doing wrong.

Edit

I found that this query works: xpath = "//*[contains(@id,'post_message_')]//div[not(contains(@class,'quote'))]"

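But when the HTML has no quote child element, it does not select the desired text.

The idea is to get all the text from all child nodes, except the text inside the excluded ones.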

Answer 1

Try this XPath:

//div[contains(@id,'post_message_')]/text() | //div[contains(@id,'post_message_')]/*[not(contains(@class,'quote'))]/text()

The first part of the XPath, //div[contains(@id,'post_message_')]/text(), returns the text directly under the parent div, i.e. <div id="post_message_975824" class="alt3">.

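The second part, //div[contains(@id,'post_message_')]/*[not(contains(@class,'quote'))]/text(), returns the text directly under each child element, but only for children whose class attribute does not contain quote (children with no class attribute also qualify).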

For your example, the result is:

   the text to get 
ABR
 text to get
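
To see which text nodes each branch picks up, here is a minimal sketch using only Python's standard library. The stdlib ElementTree XPath subset does not support contains(), not() or text(), so the two branches are reproduced by hand rather than evaluated as XPath. The names SAMPLE and select_text are made up for this illustration, and the sketch assumes the markup is well-formed XML, which real pages often are not.

```python
import xml.etree.ElementTree as ET

# The sample markup from the question; it happens to be well-formed XML.
SAMPLE = """<div id="post_message_975824" class="alt3">
   <div class="quote">
      some unwanted text 
   </div>
   the text to get <abr>ABR</abr> text to get
</div>"""


def select_text(post):
    """Collect, in document order, what the two XPath branches select."""
    parts = []
    if post.text:
        parts.append(post.text)          # branch 1: text before the first child
    for child in post:
        if "quote" not in child.get("class", ""):
            # Branch 2: text nodes directly under a non-quote child.
            if child.text:
                parts.append(child.text)
            parts.extend(g.tail for g in child if g.tail)
        if child.tail:
            parts.append(child.tail)     # branch 1: text after this child
    return parts


post = ET.fromstring(SAMPLE)
# Drop whitespace-only nodes and trim the rest for display.
cleaned = [p.strip() for p in select_text(post) if p.strip()]
print(cleaned)  # ['the text to get', 'ABR', 'text to get']
```

Note that the second branch only goes one level deep: text nested further inside a non-quote child would need //text() or a recursive walk.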

Answer 2

Why not remove all the unwanted nodes?
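
One way to do that, sketched here with Python's standard library as an assumption about what "remove" could look like: drop every element whose class contains quote, then read whatever text is left. The helper names (remove_keeping_tail, text_without_quotes) are hypothetical. The one trap is that in ElementTree the text following an element is stored on that element as its tail, so a plain remove() would also throw away "the text to get"; the helper moves the tail to a neighbour first.

```python
import xml.etree.ElementTree as ET

# The sample markup from the question; assumed to be well-formed XML.
SAMPLE = """<div id="post_message_975824" class="alt3">
   <div class="quote">
      some unwanted text 
   </div>
   the text to get <abr>ABR</abr> text to get
</div>"""


def remove_keeping_tail(parent, child):
    """Remove child but keep the text that follows it (its tail)."""
    siblings = list(parent)
    index = siblings.index(child)
    tail = child.tail or ""
    if index == 0:
        # No previous sibling: the tail becomes part of the parent's text.
        parent.text = (parent.text or "") + tail
    else:
        previous = siblings[index - 1]
        previous.tail = (previous.tail or "") + tail
    parent.remove(child)


def text_without_quotes(post):
    """Strip quote elements from post (in place) and return the remaining text."""
    for parent in list(post.iter()):
        for child in list(parent):
            # Substring match, mirroring contains(@class,'quote').
            if "quote" in child.get("class", ""):
                remove_keeping_tail(parent, child)
    # Join all remaining text nodes and collapse runs of whitespace.
    return " ".join("".join(post.itertext()).split())


post = ET.fromstring(SAMPLE)
result = text_without_quotes(post)
print(result)  # the text to get ABR text to get
```

Unlike the XPath in Answer 1, this keeps text at any depth, and it still works when the post has no quote element at all. For real-world HTML that is not well-formed, an HTML-aware parser would be needed in place of ET.fromstring.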

