Question

I have a document with the following structure:

<div class="document">

    <div class="title">
        <AAA/>
    </div class="title">

    <div class="lead">
        <BBB/>
    </div class="lead">

    <div class="photo">
        <CCC/>
    </div class="photo"> 

    <div class="text">
    <!-- tags in text sections can vary. they can be `div` or `p` or anything. -->
        <DDD>
            <EEE/>
            <DDD/>
            <CCC/>
            <FFF/>
                <FFF>
                    <GGG/>
                </FFF>
        </DDD>
    </div class="text">

    <div class="more_text">
        <DDD>
        <EEE/>
            <DDD/>
            <CCC/>
            <FFF/>
                <FFF>
                    <GGG/>
                </FFF>
        </DDD>
    </div class="more_text">

    <div class="other_stuff">
        <DDD/>
    </div class="other_stuff">

</div class="document">

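The task is to get all elements between <div class="lead"> and <div class="other_stuff">, except the <div class="photo"> element.

The Kayessian method for node-set intersection, $ns1[count(.|$ns2) = count($ns2)], works great. After substituting //*[@class="lead"]/following::* for $ns1 and //*[@class="other_stuff"]/preceding::* for $ns2, the working code looks like this: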

//*[@class="lead"]/following::*[count(. | //*[@class="other_stuff"]/preceding::*)
= count(//*[@class="other_stuff"]/preceding::*)]/text()
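
To see why the count() comparison behaves as an intersection, here is a minimal sketch in Python that uses plain sets in place of XPath node-sets. The function name `kayessian_intersection` and the sample node names are illustrative assumptions, not part of any XPath library. A node belongs to $ns2 exactly when adding it to $ns2 leaves the size of the union unchanged.

```python
def kayessian_intersection(ns1, ns2):
    """Keep the members of ns1 for which count(. | ns2) == count(ns2)."""
    ns2 = set(ns2)
    # Preserve the order of ns1, the way an XPath predicate filters in place.
    return [node for node in ns1 if len({node} | ns2) == len(ns2)]


# Hypothetical node names standing in for elements in document order.
after_lead = ["photo", "text", "more_text", "other_stuff"]
before_other_stuff = ["title", "lead", "photo", "text", "more_text"]

print(kayessian_intersection(after_lead, before_other_stuff))
# prints ['photo', 'text', 'more_text']
```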

It selects everything between <div class="lead"> and <div class="other_stuff">, including the <div class="photo"> element. I tried several ways of inserting a not() selector into the formula itself:

//*[@class="lead" and not(@class="photo ")]/following::*
//*[@class="lead"]/following::*[not(@class="photo ")]
//*[@class="lead"]/following::*[not(self::class="photo ")]

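(and the same with the /preceding::* part), but they don't work. It looks like not() is ignored: the <div class="photo"> element remains in the selection.

Question 1: How do I exclude the unneeded element from this intersection?

Selecting from the <div class="photo"> element onward, so that it is excluded automatically, is not an option, because in other documents it can appear in any position or not appear at all.

Question 2 (additional): Is it OK to use * after following:: and preceding:: in this case?

Initially it selects everything all the way to the end and to the beginning of the whole document. Would it be better to specify an exact end point for following:: and preceding::? I tried //*[@class="lead"]/following::[@class="other_stuff"], but it doesn't seem to work.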

Answer 1

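Question 1: How do I exclude the unneeded element from this intersection?

Adding another predicate, [not(self::div[@class='photo'])], to your working XPath should do it in this case. For this particular case, the entire XPath would look like this (formatted for readability):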

//*[@class="lead"]
 /following::*[
    count(. | //*[@class="other_stuff"]/preceding::*) 
        = 
    count(//*[@class="other_stuff"]/preceding::*)
 ][not(self::div[@class='photo'])]
/text()
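
As a rough check of what this expression selects (before the final /text() step), the sketch below emulates the following:: and preceding:: axes with Python's standard xml.etree.ElementTree, which does not evaluate these axes itself. The document is a trimmed, well-formed stand-in for the one in the question, and the helpers `following`, `preceding` and `by_class` are hypothetical names introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: a shortened, well-formed version of the question's document.
DOC = """
<div class="document">
  <div class="title"><AAA/></div>
  <div class="lead"><BBB/></div>
  <div class="photo"><CCC/></div>
  <div class="text"><DDD><EEE/></DDD></div>
  <div class="other_stuff"><DDD/></div>
</div>
"""

root = ET.fromstring(DOC.strip())
order = list(root.iter())                       # elements in document order
position = {el: i for i, el in enumerate(order)}
parent = {child: p for p in order for child in p}


def following(el):
    """Elements after el in document order, excluding its descendants."""
    subtree_size = len(list(el.iter()))         # el plus its descendants
    return order[position[el] + subtree_size:]


def preceding(el):
    """Elements before el in document order, excluding its ancestors."""
    ancestors = set()
    node = el
    while node in parent:
        node = parent[node]
        ancestors.add(node)
    return [n for n in order[:position[el]] if n not in ancestors]


def by_class(name):
    """First element whose class attribute equals name."""
    return next(el for el in order if el.get("class") == name)


ns2 = set(preceding(by_class("other_stuff")))
selected = [
    el for el in following(by_class("lead"))
    if len({el} | ns2) == len(ns2)              # Kayessian membership test
    and not (el.tag == "div" and el.get("class") == "photo")  # extra predicate
]

print([(el.tag, el.get("class")) for el in selected])
# prints [('CCC', None), ('div', 'text'), ('DDD', None), ('EEE', None)]
```

In this sketch the child of the photo div is still selected, because the predicate tests each element on its own. If the whole subtree should go, a test along the lines of not(ancestor-or-self::div[@class='photo']) may be closer to what is wanted.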

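Question 2 (additional): Is it OK to use * after following:: and preceding:: in this case?

I'm not sure whether it would be 'better'; what I can say is that following::[@class="other_stuff"] is an invalid expression. You need to name the element that the predicate applies to, for example 'any element', following::*[@class="other_stuff"], or just 'div', following::div[@class="other_stuff"].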
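
The difference between the two valid forms can be sketched in Python as follows. The `step` helper and the (tag, class) pairs are assumptions made up for illustration; they stand in for the elements already found on the following:: axis of the lead div. The sketch also suggests that such a predicate only filters the axis and does not act as an end point for it.

```python
# Hypothetical (tag, class) pairs for the elements on the following:: axis of
# the lead div, in document order.
following_lead = [
    ("div", "photo"), ("CCC", None),
    ("div", "text"), ("DDD", None),
    ("div", "other_stuff"), ("DDD", None),
]


def step(nodes, node_test, cls):
    """Emulate axis::node_test[@class=cls] over an already-computed axis."""
    return [
        (tag, c) for tag, c in nodes
        if (node_test == "*" or tag == node_test) and c == cls
    ]


# following::*[@class="other_stuff"]: any element name, filtered by class.
print(step(following_lead, "*", "other_stuff"))
# following::div[@class="other_stuff"]: only div elements, filtered by class.
print(step(following_lead, "div", "other_stuff"))
```

Both calls return only the other_stuff div itself, not the elements that come before it.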
