Question

I want to match the main paragraph content of the following markup, omitting the child nodes p, div, and h3.

<div class="content">
    sunday, monday, tuesday,
    <br>
    <br>
    wednesday, thursday,
    <br>
    friday, saturday
    <div class ="tags">sunday</div>
    <h3>Days</h3>
    <p>....</p>
    <div class="style">monday to friday</div>
</div>

I tried XPath expressions such as //div[@class="content"]/*[not(self::p)] and //div[@class="content"]/*[not(name()="p")], but neither of them worked. Then I tried //div[@class="content"]/node()[not(div)] and //div[@class="content"]/node()[not(h3)], which matched only the first text node.

I need the following text:

sunday, monday, tuesday,
<br>
<br>
wednesday, thursday,
<br>
friday, saturday

That is, with the children div class="tags", h3, p, and div class="style" omitted.

Answer 1

This should do the trick:

//div[@class="content"]/*[not(self::p) and not(self::h3) and not(self::div)]|//div[@class="content"]/text()


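Explanation:

//div[@class="content"] selects the node in question.
*[not(self::p) and not(self::h3) and not(self::div)] omits the child elements h3, p, and div.
(Or, if you really need to filter out only div class="tags" and div class="style" rather than every div, replace not(self::div) with not(self::div[@class="style"]) and not(self::div[@class="tags"]).)
|//div[@class="content"]/text() then unions in the direct text() children, since * matches only element nodes.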
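As a rough illustration of the expression above, the following sketch evaluates it against the markup from the question. It assumes the third-party lxml package is installed; the names SAMPLE, XPATH, and render are made up for this example and are not part of the original answer.

```python
from lxml import html  # assumption: lxml is installed (third-party package)

# The markup from the question, used here as test input.
SAMPLE = """<div class="content">
    sunday, monday, tuesday,
    <br>
    <br>
    wednesday, thursday,
    <br>
    friday, saturday
    <div class ="tags">sunday</div>
    <h3>Days</h3>
    <p>....</p>
    <div class="style">monday to friday</div>
</div>"""

# The XPath from the answer: kept child elements, unioned with direct text nodes.
XPATH = (
    '//div[@class="content"]/*[not(self::p) and not(self::h3) and not(self::div)]'
    '|//div[@class="content"]/text()'
)


def render(nodes):
    """Turn the mixed XPath result (text and elements) into a list of strings."""
    parts = []
    for node in nodes:
        if isinstance(node, str):
            # Text nodes come back as strings; skip whitespace-only ones.
            if node.strip():
                parts.append(node.strip())
        else:
            # Element nodes (here only <br>) are shown by tag name.
            parts.append("<%s>" % node.tag)
    return parts


tree = html.fromstring(SAMPLE)
xpath_result = render(tree.xpath(XPATH))
print("\n".join(xpath_result))
```

The whitespace-only text nodes between the omitted elements are also matched by text(), which is why render drops strings that are empty after stripping.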

Admittedly, this is somewhat convoluted. You may be better off selecting only the text, or doing some DOM manipulation on the node.
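The DOM-manipulation alternative could look roughly like the sketch below, which avoids XPath and walks the children of the content node using Python's standard xml.etree.ElementTree. It assumes the markup has been made well-formed XML (each <br> written as <br/>); SAMPLE_XML, OMITTED, and main_content are hypothetical names chosen for this example.

```python
import xml.etree.ElementTree as ET

# Assumption: the question's markup with <br> self-closed so it parses as XML.
SAMPLE_XML = """<div class="content">
    sunday, monday, tuesday,
    <br/>
    <br/>
    wednesday, thursday,
    <br/>
    friday, saturday
    <div class="tags">sunday</div>
    <h3>Days</h3>
    <p>....</p>
    <div class="style">monday to friday</div>
</div>"""

OMITTED = {"p", "h3", "div"}  # child elements to leave out


def main_content(root):
    """Collect root's direct text and its kept child elements, in document order."""
    parts = []
    # root.text is the text before the first child element.
    if root.text and root.text.strip():
        parts.append(root.text.strip())
    for child in root:
        if child.tag not in OMITTED:
            parts.append("<%s>" % child.tag)
        # child.tail is the text that follows this child inside root,
        # so it belongs to root even when the child itself is omitted.
        if child.tail and child.tail.strip():
            parts.append(child.tail.strip())
    return parts


dom_result = main_content(ET.fromstring(SAMPLE_XML))
print("\n".join(dom_result))
```

Because ElementTree stores the text after an element in that element's tail, the tail is kept even for omitted children; only the element and its own contents are dropped.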

XPath to get the main paragraph text, omitting child nodes
