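I wrote the code and tested the first part (logging in to the website). But when I tried to add the screen-scraping part, I ran into trouble getting the result I want. When I run the code, it prints "None", and I'm not sure what is causing that. I think it may be because I don't have the right attribute for the element I'm trying to scrape.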
import requests
import urllib2
from bs4 import BeautifulSoup
with requests.session() as c:
    url = 'https://signin.acellus.com/SignIn/index.html'
    USERNAME = 'My user name'
    PASSWORD = 'my password'
    c.get(url)
    login_data = dict(Name=USERNAME, Psswrd=PASSWORD, next='/')
    c.post(url, data=login_data, headers={"Referer": "https://www.acellus.com/"})
    page = c.get('https://admin252.acellus.com/StudentFunctions/progress.html?ClassID=326')
quote_page = 'https://admin252.acellus.com/StudentFunctions/progress.html?ClassID=326'
page = urllib2.urlopen(quote_page)
soup = BeautifulSoup(page, 'html.parser')
price_box = soup.find('div', attrs={'class':'Object7069'})
price = price_box
print price
This is a screenshot of the "inspect element" of the data I want to screen scrape
Answer 0 (score: 0)
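I don't think mixing requests and urllib2 for the login is a good idea. Python 2.x has the mechanize module, which you can use to fill in and submit the login form and then retrieve the content. Here is how your code would look.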
import mechanize
from bs4 import BeautifulSoup
# logging in...
br = mechanize.Browser()
br.set_handle_robots(False)
br.open("https://signin.acellus.com/SignIn/index.html")
br.select_form(nr=0)
br['AcellusID'] = 'your username'
br['Password'] = 'your password'
br.submit()
# parsing required information..
quote_page = 'https://admin252.acellus.com/StudentFunctions/progress.html?ClassID=326'
page = br.open(quote_page).read()
soup = BeautifulSoup(page, 'html.parser')
price_box = soup.find('div', attrs={'class':'Object7069'})
price = price_box
print price
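If `print price` still shows `None`, keep in mind that `soup.find` returns `None` whenever no element matches. So the first thing to check is whether the HTML you actually received contains that `div` at all; it will not if the server sent back the login page, or if the element is built later by JavaScript. Below is a minimal sketch for that check, written for Python 3 with only the standard library (so it uses `html.parser` rather than BeautifulSoup). The names `DivClassCollector` and `div_classes` are hypothetical helpers, not part of any library.

```python
from html.parser import HTMLParser


class DivClassCollector(HTMLParser):
    """Collects the class names of every <div> start tag that is fed in."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        if tag == 'div':
            for name, value in attrs:
                if name == 'class' and value:
                    self.classes.update(value.split())


def div_classes(html):
    """Return the set of class names used on <div> elements in html."""
    collector = DivClassCollector()
    collector.feed(html)
    return collector.classes
```

Calling `div_classes(page)` on the text you fetched and checking whether `'Object7069'` is in the result tells you if the selector can match at all. If it is missing, the problem is in the request (login, cookies, JavaScript rendering) rather than in the `find` call.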
Reference link: http://www.pythonforbeginners.com/mechanize/browsing-in-python-with-mechanize/
P.S.: mechanize only works with Python 2.x. If you want to use Python 3.x, there are other options (see "Installing mechanize for python 3.4").
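On Python 3, one possible alternative is to stay with requests, provided the same session is used for both the login and the later page request. In the question's code the second request goes through `urllib2.urlopen`, which does not share the cookies held by the requests session, so the server most likely treats it as not logged in. The following is only a sketch under assumptions: the form field names are copied from the mechanize example above, posting to the sign-in URL is a guess at the form's action, and `fetch_with_session` and `scrape_progress` are hypothetical names.

```python
def fetch_with_session(session, login_url, page_url, form_data):
    """Log in and fetch page_url through one session so cookies carry over."""
    session.get(login_url)                   # pick up any initial cookies
    session.post(login_url, data=form_data)  # submit the login form
    response = session.get(page_url)         # same session, so login cookies are sent
    return response.text


def scrape_progress(username, password):
    """Fetch the progress page as a logged-in user (assumed field names and URLs)."""
    import requests  # third-party; imported here so the helper above stays dependency-free

    # Assumption: field names taken from the mechanize example; verify against the real form.
    form_data = {'AcellusID': username, 'Password': password}
    with requests.Session() as session:
        return fetch_with_session(
            session,
            'https://signin.acellus.com/SignIn/index.html',
            'https://admin252.acellus.com/StudentFunctions/progress.html?ClassID=326',
            form_data,
        )
```

The text returned by `scrape_progress` can then be passed to `BeautifulSoup(page, 'html.parser')` exactly as before. Keeping the session handling in `fetch_with_session` also makes it easy to try the flow with a stand-in session object before pointing it at the real site.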