Log in and fetch an HTML page using Python

Time: 2014-05-19 10:19:10

Tags: python html login webclient

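Hey, I'm trying to log in to a website and get the HTML of a page after logging in, and I can't figure out how to do this with Python. I'm using Python 2.7. I need to fill in the HTML form on this site with:

'user'='magaleast' and 'password'='1181' (real login details that are of no use to me). The site then redirects the user to an authentication page, and once that finishes it goes to the page I need.

Any ideas?

Edit: I tried this code: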

from mechanize import Browser
import cookielib
br = Browser()
br.open("http://www.shiftorganizer.com/")


cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)

# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)

# You need to spot the name of the form in source code
br.select_form(name = "user")
# Spot the name of the inputs of the form that you want to fill, 
# say "username" and "password"
br.form["user"] = "magaleast"
br.form["password"] = "1181"

response = br.submit()
print response.read()

But I get:

     <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">

<head>

<title>ShiftOrganizer סידור עבודה בפחות משניה</title>

    <meta http-equiv="content-type" content="text/html; charset=utf-8" />



<script type="text/javascript">

var emptyCompany=1

function subIfNewApp()

{

    if (emptyCompany){

        document.authenticationForm.action = document.getElementById('userName').value + "/authentication.asp"

    } else {

        document.authenticationForm.action = document.getElementById('Company').value + "/authentication.asp"

    }

    document.authenticationForm.submit()

}

</script>

</head>

    <body onload="subIfNewApp()">

    <form name="authenticationForm" method="post" action="">

        <input type="hidden" name="userName" id="userName" value="magaleast" />

        <input type="hidden" name="password" id="password" value="1181" />

        <input type="hidden" name="Company" id="Company" value="שם חברה" />

        </form>

    </body>

</html>

Is that the problem? Because it stops at the authentication step again..?

1 answer:

Answer 0: (score: 0)

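The site does seem to need some JS, so the mechanize code at the end of this answer is not enough on its own. In this particular case, looking at the page source, this URL seems to be the one used in the end:

http://shifto.shiftorganizer.com/magaleast/welcome.asp?password=1181

It seems to contain information similar to the post-login page (although I can't read Hebrew, so I may be completely wrong...). If so, you can simply do: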

import urllib
url = 'http://shifto.shiftorganizer.com/*username*/welcome.asp?password=*password*'
print urllib.urlopen(url).read()
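
The `*username*` and `*password*` placeholders have to be substituted before the URL is usable. Here is a minimal sketch of building it, assuming the URL pattern guessed above is right. `build_welcome_url` and `BASE` are names made up for this example, and the `try`/`except` import is only there so the snippet runs on both Python 2.7 and Python 3:

```python
try:
    from urllib.parse import quote, urlencode   # Python 3
except ImportError:
    from urllib import quote, urlencode         # Python 2.7

# Host taken from the URL quoted in this answer.
BASE = "http://shifto.shiftorganizer.com"


def build_welcome_url(username, password):
    """Return the welcome.asp URL for the given credentials (hypothetical helper)."""
    # quote() escapes the user name for use as a path segment;
    # urlencode() escapes the password for use in the query string.
    path = quote(username, safe="")
    query = urlencode({"password": password})
    return "%s/%s/welcome.asp?%s" % (BASE, path, query)


print(build_welcome_url("magaleast", "1181"))
```

The resulting string can then be passed to `urllib.urlopen` in place of the hand-edited `url`.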

For reference, here is code for logging in through a form that does not need JavaScript.

I would use the mechanize library (Requests would work too) and do something like:
import cookielib  # needed for LWPCookieJar below
from mechanize import Browser

br = Browser()
br.set_cookiejar(cookielib.LWPCookieJar())

# Browser options
br.set_handle_equiv(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)

br.open("your url")

# You need to spot the name of the form in source code
br.select_form(name="form_name")  

# Spot the name of the inputs of the form that you want to fill, 
# say "username" and "password"
br.form["username"] = "magaleast"
br.form["password"] = "1181"

response = br.submit()
print response.read()
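
For the intermediate page shown in the question, one possible workaround is to do by hand what `subIfNewApp()` does: read the hidden fields of `authenticationForm` and build the `action` URL from the `userName` (or `Company`) value plus `/authentication.asp`. The sketch below uses only the standard library and has not been tried against the real site. `HiddenFieldParser`, `auth_request`, the abridged `SAMPLE_PAGE`, and the `page_url` passed in are all assumptions made for illustration:

```python
try:
    from html.parser import HTMLParser   # Python 3
    from urllib.parse import urljoin
except ImportError:
    from HTMLParser import HTMLParser    # Python 2.7
    from urlparse import urljoin


class HiddenFieldParser(HTMLParser):
    """Collect name -> value for every <input type="hidden"> on a page."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <input ... />.
        attrs = dict(attrs)
        if tag == "input" and attrs.get("type") == "hidden" and "name" in attrs:
            self.fields[attrs["name"]] = attrs.get("value") or ""


def auth_request(page_html, page_url, empty_company=True):
    """Return (action_url, form_fields) for the auto-submitting form.

    empty_company mirrors `var emptyCompany=1` in the page's script.
    """
    parser = HiddenFieldParser()
    parser.feed(page_html)
    key = "userName" if empty_company else "Company"
    # The script sets a relative action, so resolve it against the page URL.
    action = urljoin(page_url, parser.fields[key] + "/authentication.asp")
    return action, parser.fields


# Abridged from the page in the question (Company field omitted).
SAMPLE_PAGE = """
<form name="authenticationForm" method="post" action="">
  <input type="hidden" name="userName" id="userName" value="magaleast" />
  <input type="hidden" name="password" id="password" value="1181" />
</form>
"""

# The page URL here is a guess; use the URL the response actually came from.
action, fields = auth_request(SAMPLE_PAGE, "http://shifto.shiftorganizer.com/")
print(action)
```

The returned `fields` would then be sent as a POST to `action`, reusing the same cookie jar as the first request so that the session carries over.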