Question

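I want to extract the content matched by the XPath .//*[contains(@class, 'post-content')], but I want to exclude the child nodes that:

1) contain the text P3 or AP

2) are a div whose id contains "bottom"

3) are a form containing a label with the text "Get email updates"

I have the following HTML: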

<div class="td-post-content">
    <p>P1</p>
    <p>P2</p>
    <p>P3</p>
    <p>P4</p>
    <p>P5</p>
    <p>AP</p>
    <div id="td-a-rec bottom"> </div>
    <form action="https://example.com/subscribe" method="post" id="subscribe-form " name="subscribe-form " class="validate " target="_blank " novalidate=" ">
        <div id="signup_scroll ">
            <label for="mce-EMAIL ">Get email updates from..</label>
            <input type="email " value=" " name="EMAIL " class="email " id="EMAIL " placeholder="email address " required=" ">
            <div style="position: absolute; left: -5000px; " aria-hidden="true "><input type="text " name="b_11 " tabindex="-1 " value=" "></div>
            <div class="clear "><input type="submit " value="Subscribe " name="subscribe " id="-subscribe " class="button "></div>
        </div>
    </form>
</div>

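I can achieve this with XPath predicates such as [not(contains(@id,'bottom'))], [not(contains(text(),'P3'))], [not(contains(text(),'AP'))] and so on. The main problem is that, instead of matching all the wanted child elements as a single element, each of them is now matched separately, so I get a list of WebElements.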
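To make the chained predicates concrete, here is a minimal sketch that runs them against a simplified, well-formed copy of the markup. It uses the JDK's javax.xml.xpath instead of Selenium so that it runs on its own. The class name, the reduced form, and the extra label predicate are assumptions for illustration. With Selenium, the same expression string would presumably be passed to By.xpath, and the result would still be one element per kept child.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original question.
class XPathExclusionSketch {
    // Simplified stand-in for the page (assumption: the form is reduced to its label).
    static final String SAMPLE =
        "<div class=\"td-post-content\">"
        + "<p>P1</p><p>P2</p><p>P3</p><p>P4</p><p>P5</p><p>AP</p>"
        + "<div id=\"td-a-rec bottom\"> </div>"
        + "<form><label>Get email updates from..</label></form>"
        + "</div>";

    // Direct children of the post container, minus the three unwanted kinds.
    static final String EXPR =
        "//*[contains(@class, 'post-content')]/*"
        + "[not(contains(@id, 'bottom'))]"
        + "[not(contains(text(), 'P3'))]"
        + "[not(contains(text(), 'AP'))]"
        + "[not(.//label[contains(text(), 'Get email updates')])]";

    // Evaluates EXPR and returns the text of every node that survives the predicates.
    static List<String> keptTexts() {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                EXPR, new InputSource(new StringReader(SAMPLE)), XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(keptTexts()); // prints [P1, P2, P4, P5]
    }
}
```

Each predicate narrows the same node set, so the order of the bracketed filters does not matter here. The result is still a node list, which is exactly the situation described above.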

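At the moment, the only way to extract the desired text is to iterate over the list of WebElements and concatenate the results into a single String.

Is it possible to scrape all the desired content in one pass (with a single call to element.getText()), without iterating over a list of elements?

Thanks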

Answer 1

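From your description, it looks like all you want is the text in the P tags, with a few exclusions. The CSS selector div.td-post-content > p gives you all of the P tags, including the ones you want to exclude. You can collect their text into a list and then remove the unwanted entries to get the final list.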

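// Needs java.util.List, java.util.stream.Collectors, org.openqa.selenium.By and WebElement.
// Collect every direct <p> child of the post container.
List<WebElement> ps = driver.findElements(By.cssSelector("div.td-post-content > p"));
List<String> text = ps.stream().map(WebElement::getText).collect(Collectors.toList());
// Drop the excluded paragraphs by their exact text.
text.remove("AP");
text.remove("P3");
System.out.println(text);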

Running this prints

[P1, P2, P4, P5]
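Since the question ultimately wants one String rather than a list, the filtering and the concatenation can be folded into a single stream pipeline. The following is only a sketch on plain strings, so it runs without a browser. The class and method names and the newline separator are assumptions. With Selenium, the input list would come from mapping the WebElements to their getText() values as in the snippet above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of the original answer.
class JoinFilteredText {
    // Paragraph texts to drop, taken from the question.
    static final Set<String> EXCLUDED = new HashSet<>(Arrays.asList("P3", "AP"));

    // Keeps every paragraph whose exact text is not excluded and joins the rest with newlines.
    static String joinKept(List<String> paragraphs) {
        return paragraphs.stream()
            .filter(t -> !EXCLUDED.contains(t))
            .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        List<String> paragraphs = Arrays.asList("P1", "P2", "P3", "P4", "P5", "AP");
        System.out.println(joinKept(paragraphs)); // prints P1, P2, P4 and P5 on separate lines
    }
}
```

Filtering inside the stream avoids mutating the collected list, and it matches on the exact paragraph text, just like text.remove("AP") does.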
