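I'm trying to eventually build a program that parses the HTML of a particular website, but the site I want to use fails with a bad status line error. The same code works for every other website I've tried. Is this something the site is doing on purpose, and is there nothing I can do about it?
My code: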
from lxml import html
import requests
webpage = 'http://www.whosampled.com/search/?q=de+la+soul'
page = requests.get(webpage)
tree = html.fromstring(page.text)
The error message I get:
Traceback (most recent call last):
  File "/home/kyle/Documents/web.py", line 6, in <module>
    page = requests.get(webpage)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 65, in get
    return request('get', url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 49, in request
    response = session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 461, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 573, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 415, in send
    raise ConnectionError(err, request=request)
ConnectionError: ('Connection aborted.', BadStatusLine("''",))
Answer 0 (score: 1)
Provide a User-Agent header and it works:
webpage = 'http://www.whosampled.com/search/?q=de+la+soul'
page = requests.get(webpage,
headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'})
Proof:
>>> from lxml import html
>>> import requests
>>>
>>> webpage = 'http://www.whosampled.com/search/?q=de+la+soul'
>>> page = requests.get(webpage, headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'})
>>> tree = html.fromstring(page.content)
>>> tree.findtext('.//title')
'Search Results for "de la soul" | WhoSampled'
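If several pages need to be fetched, the header can be set once on a session instead of being passed to every call. The sketch below is only one way to arrange that: `make_session` and `fetch` are hypothetical helper names, not part of requests, and the 10-second timeout is an arbitrary assumption. The empty status line in the traceback suggests the server closed the connection without replying, which appears to be its reaction to the default python-requests User-Agent, so the helper also catches that failure rather than letting it propagate.

```python
import requests

# Browser-like User-Agent string copied from the answer above.
BROWSER_USER_AGENT = (
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) '
    'AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/39.0.2171.95 Safari/537.36'
)


def make_session(user_agent=BROWSER_USER_AGENT):
    """Return a requests.Session that sends user_agent on every request.

    Hypothetical helper for illustration only.
    """
    session = requests.Session()
    session.headers.update({'User-Agent': user_agent})
    return session


def fetch(session, url, timeout=10):
    """Return the response body as bytes, or None if the request fails.

    Hypothetical helper; the timeout value is an assumption.
    """
    try:
        response = session.get(url, timeout=timeout)
        response.raise_for_status()
    except requests.exceptions.RequestException as exc:
        # ConnectionError, the error shown in the question, is a
        # subclass of RequestException, so it is caught here too.
        print('Request failed: %s' % exc)
        return None
    return response.content
```

With these helpers, `fetch(make_session(), webpage)` returns the page bytes, which can be passed to `html.fromstring` when the result is not `None`.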
FYI, it also works if you switch to https:
>>> webpage = 'https://www.whosampled.com/search/?q=de+la+soul'
>>> page = requests.get(webpage)
>>> tree = html.fromstring(page.content)
>>> tree.findtext('.//title')
'Search Results for "de la soul" | WhoSampled'
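When the URLs come from elsewhere and may still use http, the scheme can be rewritten before the request is made. This is a minimal sketch assuming Python 3 (the traceback in the question is from Python 2.7, where the same functions live in the `urlparse` module); `to_https` is a hypothetical helper name, and whether a given site serves the same content over https is something to check per site.

```python
from urllib.parse import urlsplit, urlunsplit


def to_https(url):
    """Return url with an 'http' scheme replaced by 'https'.

    Hypothetical helper for illustration; other schemes are left untouched.
    """
    parts = urlsplit(url)
    if parts.scheme == 'http':
        parts = parts._replace(scheme='https')
    return urlunsplit(parts)
```

For example, `to_https('http://www.whosampled.com/search/?q=de+la+soul')` gives the https URL used in the session above, which can then be passed to `requests.get`.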