I am using Python 3 and Scrapy. I am running this code in the scrapy shell to fetch a page:
url = "https://www.urban.com.au/projects/melbourne-square-93-119-kavanagh-street-southbank"
headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36"
}
fet = scrapy.Request(url, headers=headers)
fetch(fet)
It returns DEBUG: Crawled (403).
Please share any ideas on how to get a 200 response in the scrapy shell.
Answer 0 (score: 0)
If you open the URL in a browser, it shows a captcha before letting you continue. The site demands this extra verification when it sees high traffic from a single machine, which is why you get the 403.
Answer 1 (score: 0)
The 403 error occurs because the website is showing a captcha.
If you solve the captcha and extract the resulting cookies, the request will work.
An example using requests for debugging:
import requests

headers = {
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36',
    'cookie': 'your cookie',  # paste the cookie captured from your browser after solving the captcha
}
response = requests.get('https://www.urban.com.au/projects/melbourne-square-93-119-kavanagh-street-southbank', headers=headers)
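If you copy the raw Cookie header from your browser's dev tools, it is often more convenient to pass it through requests' cookies= parameter than to paste it into the headers dict. A minimal sketch of converting the raw header string into a dict (the cookie names and values below are placeholders, not real ones from the site):

```python
from http.cookies import SimpleCookie

def cookie_header_to_dict(raw_cookie):
    """Parse a raw Cookie header string (as copied from the browser's
    Network tab) into a plain dict accepted by requests' cookies= arg."""
    cookie = SimpleCookie()
    cookie.load(raw_cookie)
    return {name: morsel.value for name, morsel in cookie.items()}

# Placeholder values for illustration only:
raw = "cf_clearance=abc123; session_id=xyz789"
cookies = cookie_header_to_dict(raw)
# then: requests.get(url, headers=headers, cookies=cookies)
```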
Answer 2 (score: -1)
headers = {
    'authority': 'www.urban.com.au',
    'cache-control': 'max-age=0',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-user': '?1',
    'dnt': '1',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
    'sec-fetch-site': 'none',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9',
}

# in the scrapy shell:
fetch(scrapy.Request('https://www.urban.com.au/projects/melbourne-square-93-119-kavanagh-street-southbank', headers=headers))
You need to replicate the same headers a real browser sends as closely as possible.
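To apply browser-like headers to every request a spider makes, rather than setting them per request, one option is Scrapy's DEFAULT_REQUEST_HEADERS and USER_AGENT settings. A minimal sketch of a settings.py fragment, reusing the header values from the answer above (adjust them to match your own browser):

```python
# settings.py -- applied to every request the spider sends.
DEFAULT_REQUEST_HEADERS = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,'
              'image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
    'accept-language': 'en-US,en;q=0.9',
    'dnt': '1',
    'upgrade-insecure-requests': '1',
}

# The user agent is set via its own dedicated setting.
USER_AGENT = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36')
```

Note that headers alone may not be enough here: if the site is gating requests behind a captcha, you will still need valid cookies from a browser session that has passed it.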