Question

I have seen many solutions on the internet, but none of them seem to work.

I have this code to retrieve information from a user on IMDb:

from lxml import html
import requests

page = requests.get('http://www.imdb.com/user/ur6447592/comments-expanded?start=0&order=alpha')
tree = html.fromstring(page.content)

result = tree.xpath('//*[@id="outerbody"]/tbody/tr/td/b[2]/text()')

print(result)

I get an empty list, but the output should be:

["Little flesh and all bones"]

Answer 1

Change the XPath argument to:

'//*[@id="outerbody"]/tr/td/b[2]/text()'
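
The effect of dropping tbody can be checked offline. The sketch below is only an illustration: RAW_HTML is a made-up stand-in for the served page (not the real IMDb markup), and the third XPath, which uses the descendant axis so that it matches with or without a tbody, is a suggested variant rather than part of the original answer.

```python
from lxml import html

# Hypothetical stand-in for the raw HTML as served: the table has no <tbody>.
RAW_HTML = """
<html><body>
<table id="outerbody">
  <tr><td><b>first</b> <b>Little flesh and all bones</b></td></tr>
</table>
</body></html>
"""

tree = html.fromstring(RAW_HTML)

# XPath as copied from browser developer tools (includes tbody).
with_tbody = tree.xpath('//*[@id="outerbody"]/tbody/tr/td/b[2]/text()')

# XPath from the answer (tbody removed).
without_tbody = tree.xpath('//*[@id="outerbody"]/tr/td/b[2]/text()')

# Assumed variant: "//tr" matches a row at any depth under the table,
# so it works whether or not a tbody is present.
either = tree.xpath('//*[@id="outerbody"]//tr/td/b[2]/text()')

print(with_tbody)     # []
print(without_tbody)  # ['Little flesh and all bones']
print(either)         # ['Little flesh and all bones']
```

Note that the descendant-axis variant would also match rows of nested tables, so it is less strict than the exact path.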

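Edit:

Thanks to the comments, I just realized why the OP ran into this problem. The XPath was presumably copied from the browser's developer tools, which show the DOM after the browser has inserted a tbody element into the table. The raw HTML that lxml parses contains no tbody, so the original expression matches nothing.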

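You can print page.content to see the raw HTML. (via @JacobIRR)

Alternatively, in Firefox: Tools - Web Developer - Page Source.

In the Google Chrome developer tools, quoting @corn3lius:

If you use the Network tab and look at the returned document, it will give you the raw state before it is messed with by the DOM.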
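
The same check of the raw state can be done in code instead of reading the printed HTML by eye. This is a minimal sketch: direct_child_tags is a hypothetical helper name, and SAMPLE is an invented snippet standing in for page.content.

```python
from lxml import html


def direct_child_tags(raw_html, element_id):
    """Return the tag names of the direct children of the element with this id,
    as lxml sees them in the raw HTML (no browser DOM fix-ups applied)."""
    tree = html.fromstring(raw_html)
    # $eid is an XPath variable, filled in from the keyword argument.
    matches = tree.xpath('//*[@id=$eid]', eid=element_id)
    if not matches:
        return []
    # Skip comments and processing instructions, whose .tag is not a string.
    return [child.tag for child in matches[0] if isinstance(child.tag, str)]


# Hypothetical raw HTML; in the question this would be page.content.
SAMPLE = b'<html><body><table id="outerbody"><tr><td>x</td></tr></table></body></html>'

print(direct_child_tags(SAMPLE, "outerbody"))  # ['tr']
```

If the list shows tr rather than tbody, the XPath should not contain a tbody step.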

lxml returns an empty list
