I'm trying to scrape SSL proxies from this site: https://hidemyna.me/en/proxy-list/?type=s#list
This is what I have tried so far:
import random
import urllib.request

import requests
from bs4 import BeautifulSoup

useragents = ["firstUA", "secondUA", "thirdUA", "ecc..."]

try:
    # First attempt: urllib with a randomly chosen User-Agent header
    req = urllib.request.Request("https://hidemyna.me/en/proxy-list/?type=s#list")
    req.add_header("User-Agent", random.choice(useragents))
    sourcecode = urllib.request.urlopen(req, timeout=10)
    soup = BeautifulSoup(sourcecode, "html.parser")
    print(soup.find_all("tr"))
except Exception:
    # Fallback: fetch the page with requests if urllib fails
    page = requests.get("https://hidemyna.me/en/proxy-list/")
    html_contents = page.text
    soup = BeautifulSoup(html_contents, "html.parser")
    print(soup.find_all("tr"))
But urllib.request gives me an error: urllib.error.HTTPError: HTTP Error 503: Service Temporarily Unavailable
requests, on the other hand, does return a page source, but what it prints is the Cloudflare challenge page:
<td align="center" valign="middle">
<div class="cf-browser-verification cf-im-under-attack">
<noscript><h1 data-translate="turn_on_js" style="color:#bd2426;">Please turn JavaScript on and reload the page.</h1></noscript>
<div id="cf-content" style="display:none">
<div>
<div class="bubbles"></div>
<div class="bubbles"></div>
<div class="bubbles"></div>
</div>
<h1><span data-translate="checking_browser">Checking your browser before accessing</span> hidemyna.me.</h1>
<p data-translate="process_is_automatic">This process is automatic. Your browser will redirect to your requested content shortly.</p>
<p data-translate="allow_5_secs">Please allow up to 5 seconds…</p>
</div>
<form action="/cdn-cgi/l/chk_jschl" id="challenge-form" method="get">
<input name="jschl_vc" type="hidden" value="126df403e5b364875d3e8786b495c2f6"/>
<input name="pass" type="hidden" value="1536050987.789-Kq5UmPb6H4"/>
<input id="jschl-answer" name="jschl_answer" type="hidden"/>
</form>
</div>
<div class="attribution">
<a href="https://www.cloudflare.com/5xx-error-landing?utm_source=iuam" style="font-size: 12px;" target="_blank">DDoS protection by Cloudflare</a>
<br/>
Ray ID: 454f3458aad06f6c
</div>
</td>
</tr>]
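To make the failure easier to spot in the script, here is a minimal sketch that checks whether a fetched page is this Cloudflare interstitial rather than the proxy table. The helper name and the marker list are my own assumptions, taken only from the strings visible in the output above; Cloudflare may change them at any time.

```python
# Hypothetical helper (not part of the script above): decide whether an HTML
# body looks like the Cloudflare "checking your browser" page.
# The markers are assumptions copied from the printed output.
CHALLENGE_MARKERS = ("cf-browser-verification", "jschl_vc", "challenge-form")


def looks_like_cloudflare_challenge(html):
    """Return True if any known challenge marker appears in the HTML text."""
    return any(marker in html for marker in CHALLENGE_MARKERS)
```

It could be called as looks_like_cloudflare_challenge(page.text) right after requests.get, so the script can report the block instead of printing the challenge markup.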
How can I get past this? And why does Cloudflare flag my request as an attack?