Question

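I wrote the code and tested the first part (logging in to the website). But when I tried to add the screen-scraping part of the code, I ran into trouble getting the result I want. When I run the code I get "None", and I'm not sure what is causing it. I think it may be because I don't have the right attribute for what it's trying to scrape.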

    import requests
    import urllib2
    from bs4 import BeautifulSoup

    with requests.session() as c:
        url = 'https://signin.acellus.com/SignIn/index.html'
        USERNAME = 'My user name'
        PASSWORD = 'my password'
        c.get(url)
        login_data = dict(Name=USERNAME, Psswrd=PASSWORD, next='/')
        c.post(url, data=login_data, headers={"Referer": "https://www.acellus.com/"})
        page = c.get('https://admin252.acellus.com/StudentFunctions/progress.html?ClassID=326')

    quote_page = 'https://admin252.acellus.com/StudentFunctions/progress.html?ClassID=326'
    page = urllib2.urlopen(quote_page)
    soup = BeautifulSoup(page, 'html.parser')
    price_box = soup.find('div', attrs={'class':'Object7069'})
    price = price_box
    print price

This is a screenshot of the "inspect element" of the data I want to screen scrape

Answer 1

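I don't think logging in with requests and then fetching the page with urllib2 is a good idea. urllib2.urlopen does not reuse the cookies from your requests session, so the progress page is most likely requested as a logged-out user; that page won't contain the Object7069 div, and soup.find returns None. Python 2.x has the mechanize module, which you can use to fill in the login form and retrieve content in the same session. Here is what your code would look like: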

    import mechanize
    from bs4 import BeautifulSoup

    # Log in by filling in and submitting the sign-in form
    br = mechanize.Browser()
    br.set_handle_robots(False)  # don't let robots.txt block the requests
    br.open("https://signin.acellus.com/SignIn/index.html")
    br.select_form(nr=0)  # the first form on the page
    br['AcellusID'] = 'your username'
    br['Password'] = 'your password'
    br.submit()

    # Fetch the progress page with the same Browser, so the login cookies are sent
    quote_page = 'https://admin252.acellus.com/StudentFunctions/progress.html?ClassID=326'
    page = br.open(quote_page).read()
    soup = BeautifulSoup(page, 'html.parser')
    price_box = soup.find('div', attrs={'class': 'Object7069'})
    print(price_box)  # still None if no <div class="Object7069"> is on the page

Reference: http://www.pythonforbeginners.com/mechanize/browsing-in-python-with-mechanize/

P.S.: mechanize only works with Python 2.x. If you want to use Python 3.x, there are other options (Installing mechanize for python 3.4).
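The underlying point is that the login and the page request have to share one cookie store. mechanize's Browser does that for you, whereas the question's code logs in with a requests session and then fetches the page with a separate urllib2.urlopen call. The sketch below shows the same effect in Python 3 using only the standard library (http.cookiejar and urllib.request) against a small local stand-in server, so it never contacts Acellus. The /login and /progress paths, the form fields, and the session=abc123 cookie are all hypothetical; a real site may also require extra fields such as hidden form tokens.

```python
# Python 3, standard library only. A local toy server stands in for the real
# site; its /login and /progress paths and the cookie value are made up.
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class ToySite(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the form body, then "log in" by handing out a session cookie.
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        self.send_response(200)
        self.send_header("Set-Cookie", "session=abc123; Path=/")
        self.end_headers()
        self.wfile.write(b"logged in")

    def do_GET(self):
        # The protected page only contains the div if the cookie comes back.
        if "session=abc123" in self.headers.get("Cookie", ""):
            body = b'<div class="Object7069">42%</div>'
        else:
            body = b"<p>Please sign in</p>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default request logging


def fetch_both_ways(base):
    # One opener with a CookieJar does both the login and the page request.
    # ProxyHandler({}) makes both openers ignore proxy environment variables.
    jar = http.cookiejar.CookieJar()
    session = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),
        urllib.request.HTTPCookieProcessor(jar),
    )
    session.open(base + "/login", data=b"user=me&pass=secret")
    shared = session.open(base + "/progress").read()
    # A separate opener without that jar behaves like a fresh urllib2.urlopen call.
    fresh_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    fresh = fresh_opener.open(base + "/progress").read()
    return shared, fresh


def demo():
    server = HTTPServer(("127.0.0.1", 0), ToySite)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_both_ways("http://127.0.0.1:%d" % server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    shared, fresh = demo()
    print("shared cookie jar:", shared)
    print("fresh request:    ", fresh)
```

Only the request made through the opener that also performed the login gets the page containing the div; the separate request gets the "Please sign in" page, which is the situation where soup.find would return None.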

Trying to find the right variables for screen scraping.
