Question

The documentation about XPath states that if there is no slash in the expression, it will select elements wherever they are.

However, doing this with lxml.html in Python does not work:

import requests
import lxml.html
s = requests.session()
page = s.get('http://lxml.de/')
html = lxml.html.fromstring(page.text)
p = html.xpath('p')

Here p is an empty list.

I need to use p = html.xpath('//p') instead.

Does anyone know why?

Answer 1

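The element you are querying is most likely the root <html>, not the parent of a <p>, which is what that XPath expression assumes. A relative expression such as p only selects <p> elements that are direct children of the context node, and <html> has no direct <p> children.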
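
To make the difference concrete, here is a minimal sketch. It assumes a small hand-written HTML string (the doc variable is hypothetical and is not the real lxml.de markup) so that it runs without a network request:

```python
import lxml.html

# Hypothetical stand-in for a downloaded page (not the real lxml.de markup).
doc = "<html><body><div><p>First</p><p>Second</p></div></body></html>"
root = lxml.html.fromstring(doc)

print(root.tag)  # the context node is the root <html> element

direct = root.xpath('p')             # only <p> children of <html>
anywhere = root.xpath('//p')         # every <p> in the document
relative = root.xpath('body/div/p')  # relative path spelled out step by step

print(len(direct), len(anywhere), len(relative))
print([p.text for p in anywhere])
```

With this markup, direct should come back empty, while the other two expressions each find both paragraphs.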

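Use the double slash //p to retrieve all <p> elements, or use an absolute path to walk down to a specific <p>. The following prints the content of the first paragraph: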

p = html.xpath('/html/body/div/p')

print(p[0].text)
# lxml is the most feature-rich
# and easy-to-use library
# for processing XML and HTML
# in the Python language.

Equivalently, for this page:

p = html.xpath('//p')

print(p[0].text)    
# lxml is the most feature-rich
# and easy-to-use library
# for processing XML and HTML
# in the Python language.

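To select <p> without any forward slash, first use a slash-based path to reach its parent element, then run the relative expression on that element: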

div = html.xpath('/html/body/div')[0]
p = div.xpath('p')

print(p[0].text)
# lxml is the most feature-rich
# and easy-to-use library
# for processing XML and HTML
# in the Python language.
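
One related point, sketched below on a hypothetical HTML string rather than the real page: once you hold a sub-element such as div, a bare p or .//p stays inside that element, whereas //p still starts from the document root.

```python
import lxml.html

# Hypothetical markup: one <p> inside the <div>, one outside it.
doc = "<html><body><div><p>inside</p></div><p>outside</p></body></html>"
root = lxml.html.fromstring(doc)
div = root.xpath('/html/body/div')[0]

children = div.xpath('p')        # <p> children of the div
descendants = div.xpath('.//p')  # <p> descendants of the div
whole_doc = div.xpath('//p')     # starts at the document root, ignores the div

print([p.text for p in children])
print([p.text for p in descendants])
print([p.text for p in whole_doc])
```

So if the goal is "any <p> below this element", .//p is usually the expression to reach for; //p is only appropriate when you really mean the whole document.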
