I am trying to access https://www.exploit-db.com/remote with Python's requests module, but I get no response from the page. I want to visit every link on that page.
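import bs4
import requests

def myfun():
    # Fetch the listing page (certificate verification is disabled here).
    response = requests.get('https://www.exploit-db.com/remote', verify=False)
    print(response.text)
    soup = bs4.BeautifulSoup(response.text, 'html.parser')
    # Collect the href of every link whose URL starts with /download/.
    return [a.attrs.get('href') for a in soup.select('a[href^="/download/"]')]

def main():
    urls = myfun()
    for url in urls:
        response = requests.get(url)
        print(response.text)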
This is the output I get:
C:\Python27\requests\packages\urllib3\connectionpool.py:791: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
InsecureRequestWarning)
Answer 0 (score: 2)
The site uses a firewall that looks for "scripted" access. You can get around it by setting the User-Agent header; the value Mozilla/5.0 appears to be enough:
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get('https://www.exploit-db.com/remote', headers=headers, verify=False)
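If several requests go to the same site, one option is to set the header once on a requests.Session, so that every request made through the session carries it. This is only a sketch: the helper name make_session is made up for illustration, and Mozilla/5.0 is simply the value reported above as sufficient, so the firewall may require something else at a later date.

```python
import requests


def make_session(user_agent='Mozilla/5.0'):
    # make_session is a hypothetical helper, not part of requests.
    session = requests.Session()
    # Headers set here are merged into every request sent through the session.
    session.headers.update({'User-Agent': user_agent})
    return session


session = make_session()
# Nothing has been sent yet; this only shows the header the session will use.
print(session.headers['User-Agent'])
```

A later call such as session.get('https://www.exploit-db.com/remote') would then send that header without passing headers= each time.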
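Note that the resulting page contains no URLs prefixed with download; the only ones present are prefixed with https://www.exploit-db.com/download. You can adjust your ^= prefix match accordingly, or use *=download instead.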
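The difference between the two attribute selectors can be checked offline. The sketch below runs both against a small made-up HTML string; the markup and the helper name select_hrefs are assumptions for illustration only, and the real page's structure may differ.

```python
import bs4

# Made-up sample markup; the real page's structure may differ.
SAMPLE_HTML = '''
<a href="https://www.exploit-db.com/download/1">absolute link</a>
<a href="/download/2">relative link</a>
<a href="https://www.exploit-db.com/exploits/3">other link</a>
'''


def select_hrefs(html, selector):
    # select_hrefs is a hypothetical helper for this illustration.
    soup = bs4.BeautifulSoup(html, 'html.parser')
    return [a['href'] for a in soup.select(selector)]


# ^= matches only when the attribute value starts with the given text.
prefix_matches = select_hrefs(SAMPLE_HTML, 'a[href^="/download/"]')
# *= matches when the given text appears anywhere in the attribute value.
substring_matches = select_hrefs(SAMPLE_HTML, 'a[href*="download"]')

print(prefix_matches)
print(substring_matches)
```

With this sample, the prefix selector misses the absolute URL, which is the situation described above, while the substring selector picks up both download links.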