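I'm trying to scrape a page with Beautiful Soup (bs4), but I'm having trouble getting the data. I even added request headers, as suggested in the answer to this Stackoverflow Question. Here is my code: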
from bs4 import BeautifulSoup
import requests
headers = {
'Referer': 'hello',
}
r = requests.get('https://www.doamin.com/bangalore/restaurants', headers=headers)
print(r.status_code)
This is the error I get:
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))
and this:
raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
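`RemoteDisconnected` means the server accepted the TCP connection and then closed it before sending any HTTP status line, which points to the server refusing to answer rather than to a bug in the request code. A minimal sketch that reproduces the same exception locally with only the standard library, assuming a throwaway server on 127.0.0.1 that reads a request and hangs up without replying (`start_hangup_server` and `probe` are hypothetical names for illustration):

```python
from __future__ import annotations

import http.client
import socket
import threading


def start_hangup_server() -> tuple[socket.socket, int]:
    """Hypothetical test server: accept one connection, read the request, close without replying."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    srv.listen(1)
    port = srv.getsockname()[1]

    def handle() -> None:
        conn, _ = srv.accept()
        data = b""
        # Read until the end of the request headers so closing does not trigger a TCP reset.
        while b"\r\n\r\n" not in data:
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
        conn.close()  # hang up without sending any HTTP response

    threading.Thread(target=handle, daemon=True).start()
    return srv, port


def probe(port: int) -> str:
    """Send a GET request and report which exception, if any, http.client raises."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", "/bangalore/restaurants")
        conn.getresponse()
        return "got a response"
    except http.client.RemoteDisconnected as exc:
        return f"RemoteDisconnected: {exc}"
    finally:
        conn.close()


if __name__ == "__main__":
    server, port = start_hangup_server()
    print(probe(port))
    server.close()
```

With requests, the same low-level condition is wrapped and re-raised as `requests.exceptions.ConnectionError`, which matches the first error message in the question.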
I also tried:
import requests
url = 'https://www.example.com/bangalore/restaurants'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'}
response = requests.get(url, headers=headers)
print(response.content)
but I still get the same error!
Can anyone help me?
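Answer 0 (score: 0)
Zomato (like many other sites whose value lies in their data) has most likely put measures in place to block scrapers and data-mining tools. Just use their API: https://developers.zomato.com/api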
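Answer 1 (score: 0)
My guess is that the server checks the User-Agent string more thoroughly: if the User-Agent claims to be Chrome, it compares the version against a list of valid Chrome versions. The version you specified (41.0.2228) is not listed in the Chrome version history. Try, for example, 41.0.2272: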
import requests
url = 'https://www.example.com/bangalore/restaurants'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.0 Safari/537.36'}
response = requests.get(url, headers=headers)
print(response.content)
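The User-Agent value is long enough that it is tempting to wrap it, but a single quoted string cannot span lines in Python. Adjacent string literals inside parentheses, however, are joined into one string at compile time. A minimal sketch of building the header that way so the Chrome version is easy to swap; `build_headers` and `chrome_version_of` are hypothetical helpers, and whether the server actually checks the version is only the answerer's guess:

```python
import re
from typing import Optional

# Adjacent literals inside parentheses are concatenated into one string,
# so the header value contains no newline even though the source spans two lines.
CHROME_UA_TEMPLATE = (
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/{version} Safari/537.36"
)


def build_headers(chrome_version: str = "41.0.2272.0") -> dict:
    """Hypothetical helper: return request headers with a Chrome User-Agent."""
    return {"User-Agent": CHROME_UA_TEMPLATE.format(version=chrome_version)}


def chrome_version_of(user_agent: str) -> Optional[str]:
    """Extract the Chrome version from a User-Agent string, or None if absent."""
    match = re.search(r"Chrome/([\d.]+)", user_agent)
    return match.group(1) if match else None


if __name__ == "__main__":
    headers = build_headers()
    print(headers["User-Agent"])
    print(chrome_version_of(headers["User-Agent"]))
    # Pass the dict to requests as before, e.g.:
    # response = requests.get(url, headers=headers, timeout=10)
```

This only tests the hypothesis that the version number matters; it does not guarantee that the server will answer an automated client.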