您好我试图从网站https://health.usnews.com/doctors/city-index/new-jersey中删除数据。我希望所有城市名称再次从链接中删除数据。但是在python中使用请求库会出错。有一些会话或cookie或某些东西停止抓取数据。请帮帮我。
>>> import requests
>>> url = 'https://health.usnews.com/doctors/city-index/new-jersey'
>>> html_content = requests.get(url)
>>> html_content.status_code
403
>>> html_content.content
'<HTML><HEAD>\n<TITLE>Access Denied</TITLE>\n</HEAD><BODY>\n<H1>Access Denied</H1>\n \nYou don\'t have permission to access "http://health.usnews.com/doctors/city-index/new-jersey" on this server.<P>\nReference #18.7d70b17.1528874823.3fac5589\n</BODY>\n</HTML>\n'
>>>
这是我得到的错误。
答案 0 :(得分:0)
您需要在请求中添加标头,以便该网站认为您是真正的用户。
headers = {
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
html_content = requests.get(url, headers=headers)
答案 1 :(得分:0)
首先,像之前的回答建议我建议你在代码中添加一个标题,所以你的代码应该是这样的:
import requests
headers = {
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:60.0) Gecko/20100101 Firefox/60.0'}
url = 'https://health.usnews.com/doctors/city-index/new-jersey'
html_content = requests.get(url, headers=headers)
html_content.status_code
print(html_content.text)