XPath matches the wrong node

Time: 2015-06-24 12:10:54

Tags: python-2.7 xpath lxml.html

The XPath expression

//*[h1]

gives different results when I try it in Python and in Firebug. My code:

import requests
from lxml import html

url = "http://machinelearningmastery.com/naive-bayes-classifier-scratch-python/"
resp = requests.get(url)
page = html.fromstring(resp.content)

node = page.xpath("//*[h1]")
print node
#[<Element center at 0x7fb42143c7e0>]

But Firebug matches the <header> tag, which is what I want.

Why does this happen, and how can I make my Python code match <header> as well?
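The predicate in //*[h1] is XPath shorthand for [child::h1]: it selects every element that has at least one <h1> as a direct child, so the result depends entirely on which HTML was actually parsed. The sketch below illustrates this on a small hypothetical snippet. It uses the standard library's xml.etree.ElementTree instead of lxml (a substitution, so it runs without third-party packages); ElementTree's limited XPath supports the same [tag] child predicate, written relative to the root element as .//*[h1].

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet: <h1> is a direct child of <header> and <section>,
# while <body> and <div> contain an <h1> only indirectly.
SNIPPET = """
<html>
  <body>
    <header><h1>Title</h1></header>
    <div><section><h1>Nested</h1></section></div>
  </body>
</html>
""".strip()

root = ET.fromstring(SNIPPET)

# ".//*[h1]": every descendant of the root that has at least one <h1> child.
matches = [el.tag for el in root.findall(".//*[h1]")]
print(matches)  # ['header', 'section']
```

<body> and <div> are not matched because their <h1> descendants are not direct children. With lxml, page.xpath("//*[h1]") applies the same child rule to whatever document it was given.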

1 answer:

Answer 0 (score: 1)

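You are missing a User-Agent header, so the server responds with 403 Forbidden and you end up parsing the content of that error response. Add the header to the request and it works as expected: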

In [9]: resp = requests.get(url, headers={"User-Agent": "Test Agent"})

In [10]: page = html.fromstring(resp.content)

In [11]: node = page.xpath("//*[h1]")

In [12]: print node
[<Element header at 0x104ff15d0>]
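The <center> result in the question is consistent with this explanation. Assuming the 403 body resembled a default nginx-style error page (the post does not show it, so this is an assumption), that page wraps its message in <center><h1>403 Forbidden</h1></center>, and <center> is then the only element with an <h1> child. The sketch compares simplified, hypothetical versions of both responses, again using xml.etree.ElementTree in place of lxml, which is why the markup is kept well-formed (<hr/> rather than <hr>).

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified error page modelled on a default nginx-style 403 page
# (an assumption: the original post does not show the body it received).
ERROR_PAGE = """<html><head><title>403 Forbidden</title></head>
<body><center><h1>403 Forbidden</h1></center><hr/><center>nginx</center></body>
</html>"""

# Hypothetical stand-in for the real article page, where <h1> sits in <header>.
ARTICLE_PAGE = """<html><body>
<header><h1>Article title</h1></header>
<p>Body text</p>
</body></html>"""


def parents_of_h1(markup):
    """Return the tags of all elements that have a direct <h1> child."""
    root = ET.fromstring(markup)
    return [el.tag for el in root.findall(".//*[h1]")]


print(parents_of_h1(ERROR_PAGE))    # ['center']
print(parents_of_h1(ARTICLE_PAGE))  # ['header']
```

In the original requests code, checking resp.status_code before parsing, or calling resp.raise_for_status() (which raises requests.HTTPError for 4xx and 5xx responses), would surface the 403 instead of silently running the XPath against the error page.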