Question

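I'm looking for an XPath expression that returns the content of an article without its references section. I want everything inside the article element, up to the <p> tag whose text contains "References".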

//root/main/article/following-sibling::p[.="References"]

<root>
    <main>
        <article>
            <p>
               The stunning increase in homelessness announced in Los Angeles 
               this week — up 16% over last year citywide — was an almost
               incomprehensible conundrum given the nation's booming economy 
               and the hundreds of millions of dollars that city, county and 
               state officials have directed toward the problem.
            </p>
            <p>
                "We cannot let a set of difficult numbers discourage us 
                or weaken our resolve" Garcetti said.
            </p>
            <p>
                References: Maeve Reston, CNN
            </p>
        </article>
    </main>
</root>

The result I'm looking for is shown below.

<p>
    The stunning increase in homelessness announced in Los Angeles
    this week — up 16% over last year citywide — was an almost
    incomprehensible conundrum given the nation's booming economy
    and the hundreds of millions of dollars that city, county and
    state officials have directed toward the problem.
</p>
<p>
    "We cannot let a set of difficult numbers discourage us
    or weaken our resolve" Garcetti said.
</p>

Answer 1

This XPath,

/root/main/article/p[starts-with(normalize-space(),'References')]
                  /preceding-sibling::p

will select the p elements that precede the paragraph starting with "References".

If you only want the text node children of those p elements, you can append /text().
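
For readers who want to try the selection logic without an XPath 1.0 engine, here is a minimal Python sketch. Python's standard xml.etree.ElementTree implements only a subset of XPath (it has no preceding-sibling axis and no starts-with() function), so the sketch does not run the expression above; it emulates it by walking the p children of the article. The names SAMPLE, normalize_space and paragraphs_before_references are hypothetical, and the sample text is a trimmed copy of the document in the question.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's sample document (assumption: same structure).
SAMPLE = """
<root>
    <main>
        <article>
            <p>
                The stunning increase in homelessness announced in Los Angeles this week.
            </p>
            <p>
                "We cannot let a set of difficult numbers discourage us
                or weaken our resolve" Garcetti said.
            </p>
            <p>
                References: Maeve Reston, CNN
            </p>
        </article>
    </main>
</root>
"""


def normalize_space(text):
    """Roughly mimic XPath normalize-space(): trim and collapse whitespace."""
    return " ".join(text.split())


def paragraphs_before_references(root):
    """Hypothetical helper emulating
    /root/main/article/p[starts-with(normalize-space(),'References')]/preceding-sibling::p
    by returning every <p> child located before the last matching <p>."""
    article = root.find("./main/article")
    paragraphs = article.findall("p")
    matches = [
        index
        for index, p in enumerate(paragraphs)
        if normalize_space("".join(p.itertext())).startswith("References")
    ]
    # With no 'References' paragraph the XPath selects nothing.
    return paragraphs[: matches[-1]] if matches else []


root = ET.fromstring(SAMPLE)
for p in paragraphs_before_references(root):
    # p.text is the leading text node; whitespace is normalized for display only.
    print(normalize_space(p.text or ""))
```

With a full XPath 1.0 processor, the expression from this answer can be evaluated directly and no such loop is needed.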
