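I want to fetch a page through a proxy. I request the page with cfscrape and get past Cloudflare (the first "challenge"), but then the page asks me to solve a reCAPTCHA to prove I'm human. This is where I'm stuck. I think I need to pass the user agent and cookies (or maybe there is a bug in my code), but I don't know how to do that.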
import cfscrape
from bs4 import BeautifulSoup

link = "https://www.oneblockdown.it/en/footwear-sneakers/adidas/men-unisex/adidas-originals-yeezy-boost-350-v2/9438"

def fetch(link, proxy_list, use_proxies=True):
    proxies = get_proxy(proxy_list)  # my own helper: I get proxies from a file...
    scraper = cfscrape.create_scraper()  # returns a CloudflareScraper instance
    headers = {
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36"
    }
    try:
        if use_proxies:
            print("[Proxy]: " + proxies['http'])
            r = scraper.get(link, headers=headers, proxies=proxies)
        else:
            r = scraper.get(link, headers=headers)
    except Exception as exc:
        print("Connection to URL <" + link + "> failed: " + str(exc))
        return None
    soup = BeautifulSoup(r.text, 'html.parser')
    print(soup.prettify())
    return soup
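One way to send a fixed user agent together with cookies is to put both into the request headers. The sketch below is a minimal helper under stated assumptions: the function name build_request_headers is hypothetical, and the cookie names in the example (cf_clearance, __cfduid) are the ones the cfscrape README shows cfscrape.get_tokens returning. Cloudflare's clearance cookie is generally tied to the IP address and user agent that obtained it, so when using proxies the same proxy and user agent should be reused for later requests. This only addresses the Cloudflare step; the reCAPTCHA in the response is the site's own check (it posts to /index.php), and these cookies will not satisfy it.

```python
from typing import Dict, Mapping


def build_request_headers(user_agent: str, cookies: Mapping[str, str]) -> Dict[str, str]:
    """Return headers carrying a fixed User-Agent and, if any, a Cookie header.

    Cookies are joined in the standard "name=value; name2=value2" form.
    """
    headers = {"User-Agent": user_agent}
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    if cookie_header:
        headers["Cookie"] = cookie_header
    return headers


# Hypothetical placeholder values. With cfscrape they could come from
# tokens, user_agent = cfscrape.get_tokens(link)  (as shown in the cfscrape README)
example = build_request_headers(
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36",
    {"cf_clearance": "placeholder", "__cfduid": "placeholder"},
)
```

With a requests-style session such as the CloudflareScraper returned by cfscrape.create_scraper() (a requests.Session subclass), you can instead pass headers= and cookies= dicts straight to get(); building the raw Cookie header is mainly useful when handing the values to a different HTTP client.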
print(soup.prettify()) outputs this response:
'''
<script src="https://www.google.com/recaptcha/api.js?hl=" type="text/javascript">
</script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/1.11.3/jquery.js" type="text/javascript">
</script>
</head>
<body>
<div class="g-recaptcha" data-callback="getCaptchaResult" data-sitekey="6Le49hgUAAAAAIv3wrILeXIrOSdM3_5oxK4C6m48" data-size="invisible">
</div>
<script type="text/javascript">
window.onload = function () { grecaptcha.execute(); };
function getCaptchaResult(response) {
$.post("/index.php", {action: "captcha_verify", captcha: response, version: 37}, function(result){
var timeout = result ? 0 : 2500;
setTimeout(function() {
window.location.reload();
}, timeout);
});
}
</script>
<script type="text/javascript">
window.NREUM||(NREUM={});NREUM.info={"beacon":"bam.nr-data.net","licenseKey":"97b599ea8e","applicationID":"23522071","transactionName":"YFxXbENSCxEFUhVfWlkWdk1CRwoPS1cOWUFAXFRKHEALBwVaBERGGFhRUVVSFg==","queueTime":0,"applicationTime":54,"atts":"TBtUGgtIGB8=","errorBeacon":"bam.nr-data.net","agent":""}
</script>
</body>
</html>
'''
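Because the response above is a reCAPTCHA interstitial rather than the product page, it can help to detect it before parsing. The following is a sketch using only the standard library's html.parser; the function name is_recaptcha_challenge and the two markers it checks (an element with class g-recaptcha, or a script loaded from .../recaptcha/api.js) are assumptions drawn from this single response, not a general rule.

```python
from html.parser import HTMLParser


class _RecaptchaFinder(HTMLParser):
    """Sets self.found when it sees a reCAPTCHA widget or the reCAPTCHA script."""

    def __init__(self) -> None:
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # Attribute values can be None (e.g. <input disabled>), so default to "".
        classes = (attr_map.get("class") or "").split()
        src = attr_map.get("src") or ""
        if "g-recaptcha" in classes:
            self.found = True
        elif tag == "script" and "/recaptcha/api.js" in src:
            self.found = True


def is_recaptcha_challenge(html: str) -> bool:
    """Return True if the HTML looks like a reCAPTCHA challenge page (heuristic)."""
    finder = _RecaptchaFinder()
    finder.feed(html)
    finder.close()
    return finder.found
```

For example, after r = scraper.get(...), checking is_recaptcha_challenge(r.text) lets the script log the problem or try a different proxy instead of printing the challenge page; whether a different proxy actually avoids the check is an assumption.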
I need to prove I'm human. How can I get past this challenge?