Unable to scrape a website with BeautifulSoup

Posted: 2018-05-24 04:50:06

Tags: python web-scraping beautifulsoup python-requests

I am trying to scrape a page with Beautiful Soup (bs4), but I am having trouble fetching the data. I even added the headers mentioned in this answer: Stackoverflow Question. Here is my code:

from bs4 import BeautifulSoup
import requests

headers = {
    'Referer': 'hello',
}
r = requests.get('https://www.doamin.com/bangalore/restaurants', headers=headers)
print(r.status_code)

This is the error I get:

requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))

and this one:

 raise RemoteDisconnected("Remote end closed connection without"
 http.client.RemoteDisconnected: Remote end closed connection without response

I even tried using:

import requests

url = 'https://www.example.com/bangalore/restaurants'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 '
                         '(KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'}
response = requests.get(url, headers=headers)
print(response.content)

But I still get the same error!

Can anyone help me?

2 Answers:

Answer 0 (score: 0)

Zomato (like many other data-rich sites) has most likely put measures in place to block scrapers and data-mining tools. Just use their API instead: https://developers.zomato.com/api
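A minimal sketch of going through the API with requests; the endpoint, the 'user-key' header, and the query parameters are assumptions based on the developer documentation linked above (the API key placeholder is hypothetical), so verify them against the docs before use:

import requests

API_KEY = 'your-zomato-api-key'  # hypothetical placeholder, obtained from the developer portal
headers = {'user-key': API_KEY, 'Accept': 'application/json'}
params = {'q': 'restaurants', 'count': 10}  # assumed search parameters

# Assumed v2.1 search endpoint; returns JSON instead of HTML to scrape.
response = requests.get('https://developers.zomato.com/api/v2.1/search',
                        headers=headers, params=params)
print(response.status_code)
print(response.json())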

Answer 1 (score: 0)

My guess is that the server validates the User-Agent string more thoroughly, checking it against a list of valid Chrome versions when you claim to be a Chrome browser. The version you specified (41.0.2228) does not appear in the Chrome version history. Use 41.0.2272, for example:

import requests

url = 'https://www.example.com/bangalore/restaurants'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 '
                         '(KHTML, like Gecko) Chrome/41.0.2272.0 Safari/537.36'}
response = requests.get(url, headers=headers)
print(response.content)
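If the request then succeeds, the page can be parsed with BeautifulSoup as the question intended. A minimal sketch follows; the CSS class in the selector is a hypothetical placeholder, since the actual page structure is not shown:

from bs4 import BeautifulSoup
import requests

url = 'https://www.example.com/bangalore/restaurants'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 '
                         '(KHTML, like Gecko) Chrome/41.0.2272.0 Safari/537.36'}
response = requests.get(url, headers=headers)

# Parse the HTML and print text from elements matching a placeholder class;
# inspect the real page to find the tags/classes that hold the data you want.
soup = BeautifulSoup(response.content, 'html.parser')
for item in soup.select('.restaurant-name'):
    print(item.get_text(strip=True))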