Question

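I'm trying to write a program that scrapes my school grades from a website every day, stores the values, and builds a chart of my grades. But when I try to scrape the page, the HTML I receive is different from the HTML I see with Inspect Element.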

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("https://ames.usoe-dcs.org/Students/2567")
bsObj = BeautifulSoup(html.read(), 'lxml')
print(bsObj)

Inspect Element gives me: http://pastebin.com/BakmpqUM

while Python gives me: http://pastebin.com/7gPY1WgB

I think this is because the URL of my grades page (https://ames.usoe-dcs.org/Students/2567) is private: when you enter it in a browser, you end up here instead: https://ames.usoe-dcs.org/Login/?DestinationURL=%2FStudents%2F2566

Is there a way to log in automatically with Python?

Answer 1

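The URL isn't necessarily private, but if your request doesn't carry the cookies that prove to the site that you are a logged-in user, you can't get the information you see when you are logged in.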
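If the site sends an HTTP redirect to its login page, urllib follows it automatically and the response's geturl() reports the final URL, so a script can check whether its request was accepted as logged in. A minimal sketch, assuming (from the URL quoted in the question) that the login page path starts with /Login and carries the original target in a DestinationURL query parameter; the function name login_redirect_destination is hypothetical:

```python
from urllib.parse import urlparse, parse_qs


def login_redirect_destination(final_url):
    """Return the page the site meant to show if final_url is the login page, else None.

    Assumption: the login page lives under /Login and keeps the original
    target in a DestinationURL query parameter (as in the question's URL).
    """
    parsed = urlparse(final_url)
    if not parsed.path.lower().startswith("/login"):
        return None  # not the login page, so the request was presumably authenticated
    # parse_qs percent-decodes values, e.g. %2FStudents%2F2566 -> /Students/2566
    values = parse_qs(parsed.query).get("DestinationURL")
    return values[0] if values else ""
```

For example, `login_redirect_destination(urlopen(url).geturl())` returning a string rather than None means the request was bounced to the login form. If the site redirects with JavaScript instead of an HTTP redirect, the URL does not change and this check will not catch it.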

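I suggest opening Inspect Element on the Network tab and then reloading the page with your grades (while logged in). Right-click the first request (it should be a GET request answered with HTML, status code 200), hover over Copy, and click Copy as cURL (bash). Then paste it into this webpage and copy the Python it generates. That gives you the correct request for the page, including the cookies and authentication parameters your browser sends when you visit it. From there, you can parse the HTML response for your grades.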
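If you would rather not paste your session cookies into a third-party site, the conversion can be approximated locally. The sketch below is a hypothetical helper (curl_to_python is not part of any library) that reads only the URL, -H/--header and -b/--cookie, assuming the command uses plain single-quoted arguments as Copy as cURL (bash) usually produces; every other flag, including POST data, is ignored:

```python
import shlex


def curl_to_python(curl_command):
    """Turn a "Copy as cURL (bash)" string into (url, headers, cookies).

    Minimal local stand-in for an online cURL-to-Python converter: only the
    URL, -H/--header and -b/--cookie are understood; other flags are skipped.
    """
    # Join bash line continuations (backslash + newline) so shlex sees one line
    tokens = shlex.split(curl_command.replace("\\\n", " "))
    url, headers, cookie_strings = None, {}, []
    i = 1  # tokens[0] is "curl"
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("-H", "--header") and i + 1 < len(tokens):
            name, _, value = tokens[i + 1].partition(":")
            if name.strip().lower() == "cookie":
                cookie_strings.append(value)  # split into a dict below
            else:
                headers[name.strip()] = value.strip()
            i += 2
        elif tok in ("-b", "--cookie") and i + 1 < len(tokens):
            cookie_strings.append(tokens[i + 1])
            i += 2
        else:
            if tok.startswith(("http://", "https://")):
                url = tok
            i += 1  # flags such as --compressed are ignored
    # "a=1; b=2" -> {"a": "1", "b": "2"}, the shape requests' cookies= expects
    cookies = {}
    for raw in cookie_strings:
        for pair in raw.split(";"):
            name, sep, value = pair.strip().partition("=")
            if sep:
                cookies[name] = value
    return url, headers, cookies
```

With the copied command saved to a file, the result plugs into the code below: `url, headers, cookies = curl_to_python(open("curl.txt").read())`, then `requests.get(url, headers=headers, cookies=cookies)`. Keep that file private, since the session cookies in it let anyone act as you until they expire.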

You should end up with something like this to fetch and parse your HTML:

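import requests
from bs4 import BeautifulSoup

cookies = {
   # ...stuff... (paste from the cURL-to-Python output)
}
headers = {
   # ...stuff... (paste from the cURL-to-Python output)
}

r = requests.get("https://ames.usoe-dcs.org/Students/2567", headers=headers, cookies=cookies)
r.raise_for_status()  # stop early on a 4xx/5xx response
soup = BeautifulSoup(r.text, "lxml")
grade = soup.find("h1", {"class": "grade"})  # Customize to find your grade
print(grade.get_text(strip=True) if grade else "Grade element not found; check the selector or your cookies")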

The cookies and headers dictionaries come from the cURL-to-Python output.
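The question's code uses urllib rather than requests. urllib.request has no cookies= parameter, so a rough equivalent is to fold the cookie dictionary into a single Cookie header. A minimal sketch; build_authenticated_request is a hypothetical name, and headers and cookies are assumed to be the dictionaries from the cURL-to-Python output:

```python
from urllib.request import Request


def build_authenticated_request(url, headers, cookies):
    """Build a urllib Request carrying the browser's headers and cookies.

    urllib has no cookies= argument like requests, so the cookie dict is
    serialised into one Cookie header: "name=value; name2=value2".
    """
    all_headers = dict(headers)  # copy so the caller's dict is not modified
    if cookies:
        all_headers["Cookie"] = "; ".join(f"{k}={v}" for k, v in cookies.items())
    return Request(url, headers=all_headers)
```

The result can replace the bare URL in the question's code: `html = urlopen(build_authenticated_request(url, headers, cookies))`, after which `BeautifulSoup(html.read(), 'lxml')` works as before.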
