Question

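I'm trying to scrape a page with Beautiful Soup (bs4), but the request for the data fails. I even set the headers suggested in the answer to this Stack Overflow question. This is my code: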

from bs4 import BeautifulSoup
import requests

headers = {
    'Referer': 'hello',
}
# The call and its arguments must be part of the same expression
r = requests.get('https://www.doamin.com/bangalore/restaurants', headers=headers)
print(r.status_code)

This is the error I get:

requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))

and this:

    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

I even tried using:

import requests
url = 'https://www.example.com/bangalore/restaurants'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 '
                         '(KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'}
response = requests.get(url, headers=headers)
print(response.content)

But I still get the same error!

Can anyone help me?

Answer 1

Zomato (like many other data-aggregation sites) has most likely put measures in place to block scrapers and data-mining tools. Just use their API: https://developers.zomato.com/api

Answer 2

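My guess is that the server checks the User-Agent string more thoroughly, comparing it against a list of valid Chrome versions (if you claim to be Chrome in the User-Agent). The version you specified (41.0.2228) is not listed in the Chrome version history. Use 41.0.2272 instead, for example: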

import requests
url = 'https://www.example.com/bangalore/restaurants'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 '
                         '(KHTML, like Gecko) Chrome/41.0.2272.0 Safari/537.36'}
response = requests.get(url, headers=headers)
print(response.content)
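
If it helps, here is a minimal sketch of the same request with the version string factored out and the dropped connection caught instead of crashing the script. The helper names `build_headers` and `fetch` and the 10-second timeout are my own assumptions, not part of the original code, and I have not verified which versions the server actually accepts.

```python
import requests


def build_headers(chrome_version):
    """Return a headers dict with a Chrome-style User-Agent (hypothetical helper)."""
    user_agent = (
        "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/{} Safari/537.36".format(chrome_version)
    )
    return {"User-Agent": user_agent}


def fetch(url, headers, timeout=10):
    """Return the response body as bytes, or None if the connection fails.

    Hypothetical helper; the timeout value is an arbitrary assumption.
    """
    try:
        response = requests.get(url, headers=headers, timeout=timeout)
    except requests.exceptions.ConnectionError as exc:
        # "Remote end closed connection without response" surfaces here
        print("Connection failed:", exc)
        return None
    print(response.status_code)
    return response.content


if __name__ == "__main__":
    url = "https://www.example.com/bangalore/restaurants"
    content = fetch(url, build_headers("41.0.2272.0"))
```

Keeping the version in one argument makes it easy to try a different one if the server still closes the connection.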

Unable to scrape a website with BeautifulSoup
