Question

I am trying to get a page's source code using the following:

import urllib2

url = "http://france.meteofrance.com/france/meteo?PREVISIONS_PORTLET.path=previsionsville/750560"
page = urllib2.urlopen(url)
data = page.read()
print data

Even when I set a User-Agent in the request headers, I could not get the page's source code!

Do you have any ideas about what I could do? Thanks in advance.

Answer 1

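I tried it and the request works, but the content you receive says (in French) that your browser must accept cookies. You could probably solve this with urllib2, but I think the easiest way is to use the requests library (if you don't mind an extra dependency).

To install requests: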

pip install requests

Then in your script:

import requests

url = 'http://france.meteofrance.com/france/meteo?PREVISIONS_PORTLET.path=previsionsville/750560'

response = requests.get(url)
print(response.content)

I am fairly sure the page source will then be what you expect.
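For readers who would rather stay with the standard library, the cookie handling mentioned in this answer can be sketched roughly as follows. This is a minimal Python 3 sketch, not something from the original answer: it uses urllib.request (the Python 3 successor to urllib2) with a cookie jar, and the names build_cookie_opener and get_page as well as the "Mozilla/5.0" User-Agent value are illustrative assumptions. Whether the site needs anything beyond cookies has not been verified here.

```python
import http.cookiejar
import urllib.request


def build_cookie_opener(user_agent="Mozilla/5.0"):
    """Return (opener, jar): an opener that stores cookies in jar and resends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Assumed header value; replace it with whatever User-Agent you need.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


def get_page(url, timeout=10):
    """Fetch url with cookie support and return the raw response body as bytes."""
    opener, _jar = build_cookie_opener()
    with opener.open(url, timeout=timeout) as response:
        return response.read()


# Example call (performs a real HTTP request):
# print(get_page("http://example.com"))
```

Cookies set during a redirect are kept in the jar and sent on the follow-up request, which is the behaviour a "your browser must accept cookies" page usually checks for.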

Answer 2

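The requests library worked for me, as Martin Maillard showed.

Also, in another thread, I noticed this note from leoluk:

Edit: It is 2014 now, most of the important libraries are already available, and you should use Python 3 if you can. python-requests is a very nice high-level library that is easier to use than urllib2.

So I wrote this get_page function: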

import requests


def get_page(website_url):
    # Fetch the URL and return the raw response body (bytes)
    response = requests.get(website_url)
    return response.content


print(get_page('http://example.com'))
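If several pages from the same site are needed, one possible variation is to reuse a requests.Session, which keeps cookies between calls. The sketch below is an assumption-laden illustration rather than part of the original answer: make_session, the "Mozilla/5.0" User-Agent value, and the 10-second timeout are all hypothetical choices.

```python
import requests


def make_session(user_agent="Mozilla/5.0"):
    """Create a session that remembers cookies and sends a fixed User-Agent."""
    session = requests.Session()
    # Assumed header value; adjust as needed.
    session.headers.update({"User-Agent": user_agent})
    return session


def get_page(session, website_url, timeout=10):
    """Return the decoded page text, raising an exception on HTTP error codes."""
    response = session.get(website_url, timeout=timeout)
    response.raise_for_status()
    return response.text


# Example call (performs a real HTTP request):
# session = make_session()
# print(get_page(session, "http://example.com"))
```

Here response.text returns decoded text, whereas response.content in the function above returns raw bytes; which one is preferable depends on what you do with the page afterwards.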

Cheers!

Answer 3

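I have tried many things, "urllib", "urllib2", and many others, but one thing has worked for everything I needed and solved every problem I ran into: Mechanize. This library simulates using a real browser, so it handles many of the problems in this area.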
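The answer does not include code, so the following is only a rough sketch of how fetching a page with Mechanize might look. It assumes the third-party mechanize package is installed (pip install mechanize); the function name get_page_with_mechanize and the "Mozilla/5.0" User-Agent value are illustrative assumptions. Note that the sketch turns off mechanize's robots.txt check, which you may or may not want.

```python
def get_page_with_mechanize(url, user_agent="Mozilla/5.0"):
    """Fetch url through a mechanize Browser and return the response body."""
    # Imported here so this module still loads when mechanize is not installed.
    import mechanize

    browser = mechanize.Browser()
    # Assumption: skip the robots.txt check; remove this line to respect it.
    browser.set_handle_robots(False)
    # Assumed header value; a Browser keeps cookies between requests by default.
    browser.addheaders = [("User-Agent", user_agent)]
    response = browser.open(url)
    return response.read()


# Example call (performs a real HTTP request):
# print(get_page_with_mechanize("http://example.com"))
```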

Unable to get page source code in Python