Automating website login with Python's mechanize

Time: 2012-03-06 12:08:33

Tags: python automation web-scraping mechanize

I am trying to automate logging in to a website whose login form has the following HTML code (excerpt):

<tr>
  <td width="60%">
    <input type="text" name="username" class="required black_text" maxlength="50" value="" />
  </td>
  <td>
    <input type="password" name="password" id="password" class="required black_text" maxlength="50" value="" />
  </td>
  <td colspan="2" align="center">
    <input type="image" src="gifs/login.jpg" name="Login2" value="Login" alt="Login" title="Login"/>
  </td>
</tr>

I am using Python's mechanize module for web browsing. Here is the code:

br.select_form(predicate=self.__form_with_fields("username", "password"))
br['username'] = self.config['COMMON.USER']
br['password'] = self.config['COMMON.PASSWORD']

try:
    request  = br.click(name='Login2', type='image')
    response = mechanize.urlopen(request)
    print response.read()

except IOError, err:
    logger = logging.getLogger(__name__)
    logger.error(str(err))
    logger.debug(response.info())
    print str(err)
    sys.exit(1)

def __form_with_fields(self, *fields):
    """ Generator of form predicate functions. """
    def __pred(form):
        for field_name in fields:
            try:
                form.find_control(field_name)
            except ControlNotFoundError, err:
                logger = logging.getLogger(__name__)
                logger.error(str(err))
                return False
        return True  # reached only after every requested field was found
    return __pred

Not sure what I am doing wrong...

Thanks

2 answers:

Answer 0: (score: 1)

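The website may be using JavaScript to do a postback during login. As far as I remember, for ASP.NET websites you need to get hold of the hidden form fields, such as __VIEWSTATE and __EVENTTARGET, and post them along with the credentials to the new page. Why not include a link to the website in the question? It becomes relatively easy to figure out after that.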
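A minimal sketch of that idea, using only the Python 3 standard library rather than mechanize. It collects every hidden input from a login page and merges it with the credentials into a POST body. The names HiddenFieldCollector, build_login_payload and SAMPLE_PAGE, the login.aspx action and the field values are all hypothetical; the real site's fields may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldCollector(HTMLParser):
    """Collect name/value pairs of every <input type="hidden"> in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing tags such as <input ... />.
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_login_payload(page_html, username, password):
    """Merge the page's hidden fields with the credentials into one dict."""
    collector = HiddenFieldCollector()
    collector.feed(page_html)
    payload = dict(collector.fields)
    payload["username"] = username
    payload["password"] = password
    return payload


# Hypothetical ASP.NET-style page fragment, for illustration only.
SAMPLE_PAGE = """
<form method="post" action="login.aspx">
  <input type="hidden" name="__VIEWSTATE" value="dDwtMTIzNDU2Nzg5" />
  <input type="hidden" name="__EVENTTARGET" value="" />
  <input type="text" name="username" value="" />
  <input type="password" name="password" value="" />
</form>
"""

payload = build_login_payload(SAMPLE_PAGE, "alice", "secret")
body = urlencode(payload)  # form-encoded string that would be sent as the POST body
print(body)
```

If the site sets __EVENTTARGET from JavaScript when the button is clicked, the value read from the page will be empty and would have to be filled in by hand before posting.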

Answer 1: (score: 0)

Try using Selenium with PhantomJS:

# Needs Selenium 3 or earlier: PhantomJS support was removed in Selenium 4.
from selenium.webdriver import PhantomJS
import platform

if platform.system() == 'Windows':      # .exe for Windows
    PhantomJS_path = './phantomjs.exe'
else:
    PhantomJS_path = './phantomjs'

service_args = [                        # Proxy (optional)
    '--proxy=<>',
    '--proxy-type=http',
    '--ignore-ssl-errors=true',
    '--web-security=false'
    ]

browser = PhantomJS(PhantomJS_path, service_args=service_args)
browser.set_window_size(1280, 720)      # Window size for screenshot (optional)
login_url = "<url_here>"

# Credentials
Username = "<insert>"
Password = "<insert>"



# Login
browser.get(login_url)
browser.save_screenshot('login.png')
print(browser.current_url)
browser.find_element_by_id("<username field id>").send_keys(Username)
browser.find_element_by_id("<password field id>").send_keys(Password)
browser.find_element_by_id("<login button id>").click()

print(browser.current_url)
scrape_url = "<url_to_scrape>"          # Page to fetch once logged in (placeholder, not in the original)
browser.get(scrape_url)
print(browser.page_source)


browser.quit()