How do I log in to and scrape "support.oracle.com" with Python 3 requests?

Time: 2019-07-25 19:57:03

Tags: python python-3.x bash request wget

I am trying to scrape the URL below with Python requests, but I cannot get it to work.

URL: https://support.oracle.com/rs?type=doc&id=1439822.1

Code that does not work:

import requests
from bs4 import BeautifulSoup

# The original post did not show `headers`; this placeholder dict is an assumption.
headers = {"User-Agent": "Mozilla/5.0"}

s = requests.session()
s.headers.update(headers)

r = s.get("https://support.oracle.com/rs?type=doc&id=1439822.1", auth=('user@email.com', 'mypass'), allow_redirects=True)
soup = BeautifulSoup(r.text, 'html.parser')
print(soup.prettify())

Expected output: the page a web browser shows after a successful login. I need the same output on the command line.

Current output: the login page is shown again.
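One way to see why the login page comes back is to look at the redirect chain that requests followed. The sketch below is illustrative only. `redirect_chain` is a hypothetical helper name, not part of requests. It relies only on the `history`, `status_code`, and `url` attributes that a requests response exposes.

```python
def redirect_chain(response):
    """Return (status_code, url) for every hop, ending with the final response."""
    # response.history holds the intermediate redirect responses, oldest first.
    hops = [(hop.status_code, hop.url) for hop in response.history]
    # The response itself is the last stop in the chain.
    hops.append((response.status_code, response.url))
    return hops
```

Calling `redirect_chain(r)` on the response from the code above and printing each pair would show whether the request ended up on a login URL rather than on the document itself.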

Note: I can do this with the wget command, but I need to do it with Python requests.

wget --user "user@email.com" --password "mypass" "https://support.oracle.com/rs?type=doc&id=1439822.1" -O /root/webout.html

Thanks for your help!

1 answer:

Answer 0: (score: 0)

I finally found the answer!

import requests
from bs4 import BeautifulSoup

# First request: follow the redirects to find the URL the site finally lands on.
r = requests.get("https://support.oracle.com/rs?type=doc&id=1439822.1", auth=('user@email.com', 'mypass'), allow_redirects=True)

# Second request: send the same credentials directly to that final URL.
full_fetch = requests.get(r.url, auth=('user@email.com', 'mypass'), allow_redirects=True)
soup = BeautifulSoup(full_fetch.text, 'html.parser')
print(soup.prettify())
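The two-step fetch can be wrapped in a small function, sketched below under a few assumptions. `fetch_after_redirect` and its `client` parameter are hypothetical names introduced for illustration. `client` is any object with a requests-style `get()`, and it defaults to the `requests` module so the behavior matches the two independent calls in the answer. A plausible explanation for why the second call helps, not confirmed in the original answer, is that requests drops the Authorization header when a redirect leads to a different host, so the credentials only reach the final URL when they are sent to it directly.

```python
def fetch_after_redirect(url, user, password, client=None):
    """Fetch url, then re-request the final redirected URL with the same credentials."""
    if client is None:
        import requests  # third-party; only needed when no client is supplied
        client = requests
    auth = (user, password)
    # First pass: follow redirects to discover where the site finally sends us.
    first = client.get(url, auth=auth, allow_redirects=True)
    # Second pass: send the credentials directly to that final URL.
    final = client.get(first.url, auth=auth, allow_redirects=True)
    # Surface HTTP errors instead of silently parsing an error page.
    final.raise_for_status()
    return final.text
```

The returned text could then be passed to `BeautifulSoup(..., 'html.parser')` as in the answer. Whether the site accepts credentials this way may change over time, so the result is worth checking for a login form before parsing.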