Whenever I curl this URL, I get the whole web page. However, when I use urllib or even the mechanize library in Python, I get a 403 error. Is there a reason for this?
Answer 0 (score: 0)
You can use the requests library:
import requests
print requests.get('http://www.economist.com/blogs/schumpeter/2014/04/alstom-block').text
Answer 1 (score: 0)
Try this:
>>> import urllib2, sys
>>> from BeautifulSoup import BeautifulSoup
>>> site= "http://www.economist.com/blogs/schumpeter/2014/04/alstom-block"
>>> header = {'User-Agent': 'Mozilla/5.0'}
>>> req = urllib2.Request(site,headers=header)
>>> page = urllib2.urlopen(req)
>>> soup = BeautifulSoup(page)
>>> print soup
Output:
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" dir="ltr" xmlns:og="http://ogp.me/ns#" xmlns:fb="https://www.facebook.com/2008/fbml">
<head>
....
...
..
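
For readers on Python 3, where urllib2 no longer exists, the same idea can be sketched with urllib.request. By default urllib identifies itself as "Python-urllib/x.y", and the fix above suggests the server rejects that value, though the thread does not confirm the cause. The helper names build_request and fetch, the 10-second timeout, and the UTF-8 fallback are assumptions made for illustration; they are not part of the original answer.

```python
import urllib.request


def build_request(url, user_agent="Mozilla/5.0"):
    """Return a Request that carries a browser-like User-Agent header.

    Hypothetical helper: mirrors urllib2.Request(site, headers=header)
    from the answer above, using the Python 3 module name.
    """
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent="Mozilla/5.0", timeout=10):
    """Download the page and return its body as text.

    The timeout and the UTF-8 fallback are illustrative assumptions.
    """
    req = build_request(url, user_agent)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Use the charset declared by the server, if any.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Calling fetch("http://www.economist.com/blogs/schumpeter/2014/04/alstom-block") would then play the role of urllib2.urlopen(req) in the session above. If the server still answers 403, it may be checking more than the User-Agent header.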