Question

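I'm trying to pull the text out of a tag that follows the element I start from. The HTML looks like this, with multiple entries that share the same structure: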

<h5>
    <a href="link">Title</a>
</h5>
<div class="author">
    <p>"Author A, Author B"</p>
</div>
<div id="abstract-more#####" class="collapse">
  <p>
    <strong>Abstract:</strong>
    "Text here..."
  </p>
  <p>...</p>

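So once I've isolated a given title element/node (stored as "paper"), I want to store the author and abstract text. This works when I use it to get the author: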

author = paper.find_element_by_xpath("./following::div[contains(@class, 'author')]/p").text

But when I use this command, it returns blank output for the abstract:

abstract = paper.find_element_by_xpath("./following::div[contains(@id, 'abstract-more')]/p").text

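Why does it work for the author but not for the abstract? I've tried using .// instead of ./ and a few other minor tweaks, to no avail. I also don't understand why it doesn't raise an error saying it can't find the abstract element, and instead just returns a blank string...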

Answer 1

Try this:

//div[contains(@id, 'abstract-more')]/p[1]

Answer 2

Use starts-with instead of contains in the XPath.

XPath: .//div[starts-with(@id, 'abstract-more')]/p

abstract = paper.find_element_by_xpath(".//div[starts-with(@id, 'abstract-more')]/p").text

Answer 3

You can try the following XPath:

//div[@class="author"]/following-sibling::div[contains(@id,'abstract-more')]/p[1]

Code:

abstract = paper.find_element_by_xpath("//div[@class='author']/following-sibling::div[contains(@id,'abstract-more')]/p[1]")
print(abstract.text)
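
A possible explanation for the blank result, independent of which XPath is used: Selenium's .text returns only the text that is currently rendered, so an element that is found but hidden yields an empty string rather than an error. The abstract div has class="collapse", which (assuming Bootstrap-style CSS, something the question does not confirm) keeps it hidden until expanded. The sketch below reads the textContent DOM property instead. The helper name text_content is hypothetical and not part of Selenium; it assumes "element" is a WebElement or WebDriver.

```python
def text_content(element, xpath):
    """Return the text of the first XPath match, whether or not it is displayed.

    `text_content` is a hypothetical helper; `element` is assumed to be a
    Selenium WebElement (or WebDriver) exposing find_element/get_attribute.
    """
    # "xpath" is the string value of selenium's By.XPATH constant.
    node = element.find_element("xpath", xpath)
    # .text only returns rendered (visible) text; the textContent DOM
    # property also includes text inside hidden elements.
    raw = node.get_attribute("textContent") or ""
    # Collapse the newlines and indentation that come from the HTML source.
    return " ".join(raw.split())
```

With the XPath from the question, usage would look like: abstract = text_content(paper, "./following::div[contains(@id, 'abstract-more')]/p"). If this returns the abstract while .text stays empty, visibility was the cause and the XPath itself was fine.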

XPath returns blank text
