curl works but urllib doesn't

Asked: 2014-04-28 06:15:38

Tags: python parsing curl

Whenever I curl this page, I get the whole web page back. But when I use urllib, or even the mechanize library, in Python, I get a 403 error. Why might that be?

2 answers:

Answer 0 (score: 0)

You can use the requests library:

import requests
# print() with a single argument behaves the same on Python 2 and Python 3
print(requests.get('http://www.economist.com/blogs/schumpeter/2014/04/alstom-block').text)

Answer 1 (score: 0)

Try this, which sends a browser-like User-Agent header with the request:

>>> # Python 2 session (urllib2 and BeautifulSoup 3)
>>> import urllib2
>>> from BeautifulSoup import BeautifulSoup
>>> site = "http://www.economist.com/blogs/schumpeter/2014/04/alstom-block"
>>> header = {'User-Agent': 'Mozilla/5.0'}  # replaces urllib2's default User-Agent
>>> req = urllib2.Request(site, headers=header)
>>> page = urllib2.urlopen(req)
>>> soup = BeautifulSoup(page)
>>> print soup

Output:

    <!DOCTYPE html>
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" dir="ltr" xmlns:og="http://ogp.me/ns#" xmlns:fb="https://www.facebook.com/2008/fbml">
    <head>
....
...
..
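
For anyone on Python 3, here is a rough sketch of the same idea using only the standard library. It rests on an assumption that neither answer states outright: the 403 is probably triggered by urllib's default User-Agent (`Python-urllib/x.y`), which some servers reject while accepting curl's. The helper names `default_user_agent`, `build_request` and `fetch` are made up for this example, and whether the site still responds this way is not something I have checked.

```python
import urllib.error
import urllib.request

# URL taken from the answers above.
SITE = "http://www.economist.com/blogs/schumpeter/2014/04/alstom-block"


def default_user_agent():
    """Return the User-Agent urllib sends when none is set explicitly."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) pairs, e.g. ('User-agent', 'Python-urllib/3.x')
    return dict(opener.addheaders)["User-agent"]


def build_request(url, user_agent="Mozilla/5.0"):
    """Build a request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Return (status, text) on success, or (error_code, None) on an HTTP error such as 403."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.code, None

# Example use (needs network access):
#   print(default_user_agent())
#   status, text = fetch(SITE)
```

Printing `default_user_agent()` next to the output of `curl -v` on the same URL is a quick way to see how the two clients identify themselves. If `fetch` still returns 403 with the header overridden, the cause is likely something else (cookies, other headers, IP-based blocking).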