I get different responses from Python and curl, even though both use exactly the same parameters.
Python:
import requests
headers = {
    'Accept-Language': 'en-US,en',
    'Accept': 'text/html,application/xhtml+xml,application/xml',
    'Authority': 'www.google.com',
    'User-Agent': 'SomeAgent',
    'Upgrade-Insecure-Requests': '1',
}
response = requests.get('https://www.avvo.com', headers=headers)
# Returns a 403 response
cURL:
import shlex, subprocess
cmd = '''curl -H 'Accept-Language: en-US,en' -H 'Accept: text/html,application/xhtml+xml,application/xml' -H 'Authority: www.google.com' -H 'User-Agent: SomeAgent' -H 'Upgrade-Insecure-Requests: 1' https://www.avvo.com'''
args = shlex.split(cmd)
process = subprocess.Popen(args, shell=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout, stderr = process.communicate()
# Returns a 200 response
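Both requests are sent from the same IP. This looks like a Cloudflare issue: can Cloudflare distinguish requests made with the Python requests library from requests made with a direct curl command?
I've left the site in the code in case it helps to run it. The direct curl command is: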
curl -H 'Accept-Language: en-US,en' -H 'Accept: text/html,application/xhtml+xml,application/xml' -H 'Authority: www.google.com' -H 'User-Agent: SomeAgent' -H 'Upgrade-Insecure-Requests: 1' https://www.avvo.com/administrative-law-lawyer/ny.html
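One way to check whether the two clients really send the same headers is to look at what requests prepares before anything goes over the network. The sketch below is only an illustration: the helper name `effective_headers` is hypothetical, and it reports HTTP headers only (the Host header is added later by the underlying HTTP layer, and nothing here covers differences below the HTTP level). The output can be compared by hand with what `curl -v` prints for the command above.

```python
import requests

# Same headers as in the question.
headers = {
    'Accept-Language': 'en-US,en',
    'Accept': 'text/html,application/xhtml+xml,application/xml',
    'Authority': 'www.google.com',
    'User-Agent': 'SomeAgent',
    'Upgrade-Insecure-Requests': '1',
}


def effective_headers(url, headers):
    """Return the headers requests prepares for a GET, without sending it.

    Hypothetical helper for illustration: a Session merges its own default
    headers (for example Accept-Encoding and Connection) with the ones
    passed in, so the result can contain more than the caller supplied.
    """
    session = requests.Session()
    prepared = session.prepare_request(
        requests.Request('GET', url, headers=headers)
    )
    return dict(prepared.headers)


if __name__ == '__main__':
    # No network traffic happens here; the request is only prepared.
    for name, value in effective_headers('https://www.avvo.com', headers).items():
        print(f'{name}: {value}')
```

Any header that shows up here but not in the `curl -v` output (or the other way round) means the two requests are not identical after all, which may or may not be what the server reacts to.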