Question

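I'm using lxml for HTML screen scraping, and I need to select an element by its text(), much like what is done in another question with pure XML, but whatever I try I get an invalid predicate error. I've reduced the problem to this example: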

import lxml.html
sample_html = "<div><h2>test string</h2><h2>other string</h2></div>"
sample_tree = lxml.html.fromstring(sample_html)
sample_tree.findall('.//h2[text()="test string"]')

Although this should be valid, I keep getting the error:

  File "<string>", line unknown
SyntaxError: invalid predicate

Any tips on how to get lxml to select elements by text() correctly when parsing HTML?

Answer 1

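The expression itself is valid, but you have to use the .xpath() method. findall() only understands the limited ElementPath subset of XPath, which does not accept text() predicates:

sample_tree.xpath('.//h2[text()="test string"]')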
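As a minimal sketch of the difference, the snippet below runs the same expression through both methods, reusing the sample markup from the question. The variable name `matches` is only illustrative, and the exact error message may vary between lxml versions.

```python
import lxml.html

sample_html = "<div><h2>test string</h2><h2>other string</h2></div>"
sample_tree = lxml.html.fromstring(sample_html)

# findall() parses the path with ElementPath, which rejects text() predicates.
try:
    sample_tree.findall('.//h2[text()="test string"]')
except SyntaxError as exc:
    print("findall failed:", exc)

# xpath() evaluates the expression as full XPath 1.0, so the predicate works.
matches = sample_tree.xpath('.//h2[text()="test string"]')
print([h2.text for h2 in matches])
```

Only the first h2 should come back from the xpath() call, since the second one contains "other string".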

Note that in this case you can also use . in place of text():

.//h2[. = "test string"]
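The two forms are interchangeable for the sample above, but they are not identical in general: text() looks at the element's direct text nodes, while . compares the element's whole string value, including text inside child elements. The sketch below illustrates this with hypothetical markup (the nested `<b>` is an assumption, not part of the question).

```python
import lxml.html

# Hypothetical markup: the first heading wraps part of its text in <b>.
html = "<div><h2>test <b>string</b></h2><h2>test string</h2></div>"
tree = lxml.html.fromstring(html)

# text() tests each direct text node of the h2 separately,
# so the first heading (direct text "test ") does not match.
by_text = tree.xpath('.//h2[text()="test string"]')

# . compares the string value: all descendant text joined together,
# so both headings match.
by_dot = tree.xpath('.//h2[. = "test string"]')

print(len(by_text))  # 1
print(len(by_dot))   # 2
```

Which form is preferable depends on whether headings with inline markup should count as matches.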
