Question

I tried this:

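import requests
from lxml import html

s = requests.Session()  # assumption: `s` was a requests session in the original code
url = 'http://test.ir/'
content = s.get(url).content
tree = html.fromstring(content)
print([e.text_content() for e in tree.xpath('//div[@class="grouptext"]/text()[not(self:div)]')])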
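The attempt above calls `text_content()` on the results of a `text()` step, but in lxml a `text()` step already returns plain strings (lxml "smart strings"), which have no `text_content()` method. The predicate `[not(self:div)]` also does not do what it looks like: with a single colon, `self:div` is read as a name test with an undeclared `self` namespace prefix rather than the self axis (written `self::div`), and a text node is never a `div` anyway. If the goal was only the text sitting directly inside a `grouptext` div, a minimal sketch could look like the following. The HTML snippet is made up to stand in for the real page, whose screenshot is not available.

```python
from lxml import html

# Hypothetical markup standing in for the real page (not taken from the question).
SAMPLE = """
<html><body>
  <div class="grouptext" style="margin:0">Wanted text <b>bold part</b> more wanted text</div>
  <div class="grouptext">Unwanted text</div>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# A text() step yields the div's direct text children as strings, so no
# text_content() call is needed; text inside the nested <b> is skipped,
# but the tail text after </b> is a direct child of the div and is kept.
direct_text = [t.strip() for t in tree.xpath('//div[@class="grouptext"]/text()') if t.strip()]

print(direct_text)  # ['Wanted text', 'more wanted text', 'Unwanted text']
```

This only excludes text from nested elements; it still matches every `grouptext` div, which is the problem the answer below addresses.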

As you can see in the image, this is the part I want to select:

When I use this instead:

print([e.text_content() for e in tree.xpath('//div[@class="grouptext"]')])

the result contains the part I selected, but also the content of the other <div class="grouptext">.

Answer 1

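Assuming you only want the <div> in which the text first appears, you have to be more specific in your XPath expression. You can state explicitly that you want the first one by adding [1]: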

print([e.text_content() for e in tree.xpath('//div[@class="grouptext"][1]')])
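One subtlety worth checking against the real page: in `//div[@class="grouptext"][1]` the `[1]` belongs to the step, so it keeps every `grouptext` div that is the first such div among its own siblings, not only the first one in the whole document. Wrapping the path in parentheses, `(//div[@class="grouptext"])[1]`, applies the position to the complete result in document order. The sketch below uses hypothetical markup (the `box` wrapper divs are invented) in which the two `grouptext` divs have different parents:

```python
from lxml import html

# Hypothetical markup: the two grouptext divs live under different parents.
SAMPLE = """
<html><body>
  <div class="box"><div class="grouptext">First block</div></div>
  <div class="box"><div class="grouptext">Second block</div></div>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# [1] is evaluated per parent: each div is the first grouptext child of its own wrapper.
per_parent = [e.text_content() for e in tree.xpath('//div[@class="grouptext"][1]')]

# Parentheses apply [1] to the whole node-set, in document order.
first_overall = [e.text_content() for e in tree.xpath('(//div[@class="grouptext"])[1]')]

print(per_parent)     # ['First block', 'Second block']
print(first_overall)  # ['First block']
```

If both divs share the same parent, the two expressions return the same single element, so the difference only shows up when the markup is nested differently.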

Or you can select it by filtering on the style attribute:

print([e.text_content() for e in tree.xpath('//div[@class="grouptext" and @style]')])
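The `@style` filter works because a bare attribute test inside a predicate is true whenever the attribute exists, whatever its value. The sketch below assumes, as the answer seems to, that only the wanted div carries a `style` attribute; the markup is hypothetical:

```python
from lxml import html

# Hypothetical markup: only the wanted div has a style attribute.
SAMPLE = """
<html><body>
  <div class="grouptext" style="text-align:right">Wanted text</div>
  <div class="grouptext">Unwanted text</div>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# @style in a predicate is true when the attribute is present, regardless of its value.
with_style = [e.text_content() for e in tree.xpath('//div[@class="grouptext" and @style]')]

# The complementary filter: grouptext divs without a style attribute.
without_style = [e.text_content() for e in tree.xpath('//div[@class="grouptext" and not(@style)]')]

print(with_style)     # ['Wanted text']
print(without_style)  # ['Unwanted text']
```

Note that `@class="grouptext"` is an exact string comparison, so a div whose class attribute is, for example, `"grouptext wide"` would not match either expression.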

You have to decide which approach is better. That depends on how the <div> tags appear in the HTML in the more general case.

Extracting the value of a specific HTML element with XPath in Python
