Question

This is the DOM of the page:

<html>
    <head>
    <body>
        <div id="Content">
        Take/O a/O look/O at/O the/O section/O about/O filling/O in/O forms/O   using/O
        <div id="Footer">
    </body>
</html>

I want to access the text that is not inside any tag, located after <div id="Content"> and before <div id="Footer"> in the page body.

I tried:

drv.findElement(( By.xpath("//html/body"))).getText(); // but this gives me the full text of everything under the body tag
drv.findElement(( By.xpath("//html/body/data"))) // error: unable to locate element

Can I use the XPath following/preceding options here, or would those also only look for tags in the page?

Answer 1

Judging by your wording, I think you actually mean that this is your HTML, with the head and div tags closed:

<html>
    <head></head>
    <body>
        <div id="Content"></div>
        Take/O a/O look/O at/O the/O section/O about/O filling/O in/O forms/O   using/O
        <div id="Footer"></div>
    </body>
</html>

In that case, the answer to this question is exactly what you are looking for: How to get text of an element in Selenium WebDriver (via the Python api) without including child element text?
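To illustrate the general idea of "text of an element without its children's text", here is a minimal sketch that uses only the JDK's built-in DOM parser rather than Selenium. It assumes the markup is well-formed (as in the corrected HTML above), and the class and method names (`DirectTextDemo`, `ownText`, `bodyOwnText`) are made up for this example. In a real Selenium session you would have to run an equivalent loop in the browser, for example through `JavascriptExecutor.executeScript`, because `findElement` returns elements, not text nodes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DirectTextDemo {

    // Concatenates only the direct text-node children of parent,
    // so text that belongs to child elements is skipped.
    static String ownText(Node parent) {
        StringBuilder sb = new StringBuilder();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        return sb.toString().trim();
    }

    // Hypothetical helper: parses well-formed markup and returns the
    // text that sits directly under <body>.
    static String bodyOwnText(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            return ownText(doc.getElementsByTagName("body").item(0));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse markup", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html><head></head><body>"
                + "<div id=\"Content\">inside content</div>"
                + " Take/O a/O look/O "
                + "<div id=\"Footer\">inside footer</div>"
                + "</body></html>";
        System.out.println(bodyOwnText(xhtml));
    }
}
```

The text inside the two divs is ignored because it hangs off the div elements, not off body itself.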

Answer 2

Here is a rough solution using Java Strings.

// get the page source 
String pageSource = driver.getPageSource();

// split the page source into 2. temp1[0] will contain the page source
// before <div id="Content"> and temp1[1] will contain the page source after it
String[] temp1 = pageSource.split("<div id=\"Content\">");

// get the required text by splitting the temp1[1]
String[] temp2 = temp1[1].split("<div id=\"Footer\">");

// required text will be contained in the temp2[0]
String requiredText = temp2[0];

This solution is not complete. Without seeing the whole DOM, I cannot provide exact code. But I think you get the idea.
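One gap in the split approach is that, once the Content div is properly closed, the result still starts with the div's closing tag (and any content inside it). A slightly more defensive sketch of the same string-based idea is below. It is plain Java with no Selenium dependency, `BetweenDivsDemo` and `textBetween` are hypothetical names, and it assumes the page source contains the opening tags literally as `<div id="Content"` and `<div id="Footer"`, which may not hold if the browser serializes attributes differently.

```java
public class BetweenDivsDemo {

    // Returns the loose text between the end of the Content div and the
    // start of the Footer div, or an empty string if either marker is missing.
    static String textBetween(String pageSource) {
        int contentStart = pageSource.indexOf("<div id=\"Content\"");
        int footerStart = pageSource.indexOf("<div id=\"Footer\"");
        if (contentStart < 0 || footerStart < contentStart) {
            return "";
        }
        // Everything from the opening Content tag up to the Footer tag.
        String between = pageSource.substring(contentStart, footerStart);

        // Skip past the last closing </div> in that range; if the div is
        // never closed, skip past the end of the opening tag instead.
        String closeTag = "</div>";
        int lastClose = between.lastIndexOf(closeTag);
        int start = (lastClose >= 0)
                ? lastClose + closeTag.length()
                : between.indexOf('>') + 1;
        return between.substring(start).trim();
    }

    public static void main(String[] args) {
        String pageSource = "<html><head></head><body>"
                + "<div id=\"Content\"><p>nested</p></div>"
                + " Take/O a/O look/O "
                + "<div id=\"Footer\"></div>"
                + "</body></html>";
        System.out.println(textBetween(pageSource));
    }
}
```

With Selenium, the input would be the string returned by `driver.getPageSource()`. This is still string matching rather than real HTML parsing, so it only works when the wanted text contains no tags of its own.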

Access specific text in the body that is not under any tag

2 answers: