Scraping an ASPX page with login using Python

Date: 2018-02-15 16:59:59

Tags: python asp.net web-scraping

I am trying to do some basic web scraping with Python 2.7 on this website (http://210.212.227.210/tkmce/index.aspx), which requires a login. The page is built with ASPX. I tried the code below, but logging in fails.

This is the home page link (http://210.212.227.210), and this is the redirect link I want to request after logging in (http://210.212.227.210/tkmce/Common/Home/Home.aspx).

Please help me with this code. It is not logging in!

These are the headers and POST data captured while logging in.

FORMDATA:

__LASTFOCUS:
__EVENTTARGET:
__EVENTARGUMENT:
__VIEWSTATE:/wEPDwUKMTU4MDU0N... (it's long)
__VIEWSTATEGENERATOR:2611E4BA
__EVENTVALIDATION:/wEdAAb+Owa/...
txtUserName:(login username)
txtPassword:(my login password)
hdnstatus:0
btnLogin:Login
hdnstatus0:0

Request headers:

Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Accept-Encoding:gzip, deflate
Accept-Language:en-US,en;q=0.9
Cache-Control:max-age=0
Connection:keep-alive
Content-Length:460
Content-Type:application/x-www-form-urlencoded
Cookie:ASP.NET_SessionId=r3ubp0z1x5fhygqj2eqmnqig
Host:210.212.227.210
Origin:http://210.212.227.210
Referer:http://210.212.227.210/tkmce/index.aspx
Upgrade-Insecure-Requests:1
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.167 Safari/537.36

Request headers after login:

Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Accept-Encoding:gzip, deflate
Accept-Language:en-US,en;q=0.9
Cache-Control:max-age=0
Connection:keep-alive
Cookie:ASP.NET_SessionId=r3ubp0z1x5fhygqj2eqmnqig
Host:210.212.227.210
Referer:http://210.212.227.210/tkmce/index.aspx
Upgrade-Insecure-Requests:1
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.167 Safari/537.36
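
Both traces carry the same ASP.NET_SessionId cookie, which suggests the server ties the login to that session. The following is a minimal sketch, using only the Python 3 standard library, of checking that two captured Cookie headers refer to the same session. The helper name `session_id` is made up for illustration.

```python
from http.cookies import SimpleCookie


def session_id(cookie_header, name="ASP.NET_SessionId"):
    """Return the value of the named cookie from a raw Cookie header, or None."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    morsel = jar.get(name)
    return morsel.value if morsel is not None else None


# Cookie headers copied from the two traces above.
login_cookie = "ASP.NET_SessionId=r3ubp0z1x5fhygqj2eqmnqig"
home_cookie = "ASP.NET_SessionId=r3ubp0z1x5fhygqj2eqmnqig"

same_session = session_id(login_cookie) == session_id(home_cookie)
print(same_session)  # True
```

A `requests.Session` object keeps cookies between requests automatically, so the script does not need to copy this header by hand.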

Python 2.7 code using BeautifulSoup and requests:

import requests
from bs4 import BeautifulSoup

URL="http://210.212.227.210/tkmce/index.aspx"
headers={"User-Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36"}

username="myloginid"
password="myloginpassword"

s=requests.Session()
s.headers.update(headers)
r=s.get(URL)
soup=BeautifulSoup(r.content)

VIEWSTATE=soup.find(id="__VIEWSTATE")['value']
VIEWSTATEGENERATOR=soup.find(id="__VIEWSTATEGENERATOR")['value']
EVENTVALIDATION=soup.find(id="__EVENTVALIDATION")['value']
EVENTTARGET=soup.find(id="__EVENTTARGET")['value']
EVENTARGUEMENT=soup.find(id="__EVENTARGUMENT")['value']

login_data={
"__VIEWSTATE":VIEWSTATE,
"txtUserName":username,
"txtPassword":password,
"__VIEWSTATEGENERATOR" : VIEWSTATEGENERATOR,
"__EVENTVALIDATION":EVENTVALIDATION,
"__EVENTTARGET":EVENTTARGET,
"__EVENTARGUEMENT":EVENTARGUEMENT}

r = s.post(URL, data=login_data)
r = s.get("http://210.212.227.210/tkmce/Common/Home/Home.aspx")
print (r.url)
print (r.text)
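
One way to narrow down a failed ASP.NET login is to compare the field names the browser posted with the keys the script sends. The sketch below does that with plain sets, using the names from the FORMDATA trace and from `login_data`. Whether the server actually requires each missing field is not confirmed here.

```python
# Field names from the browser trace (FORMDATA above).
browser_fields = {
    "__LASTFOCUS", "__EVENTTARGET", "__EVENTARGUMENT", "__VIEWSTATE",
    "__VIEWSTATEGENERATOR", "__EVENTVALIDATION", "txtUserName",
    "txtPassword", "hdnstatus", "btnLogin", "hdnstatus0",
}

# Keys of the login_data dict in the script above.
script_fields = {
    "__VIEWSTATE", "txtUserName", "txtPassword", "__VIEWSTATEGENERATOR",
    "__EVENTVALIDATION", "__EVENTTARGET", "__EVENTARGUEMENT",
}

missing = sorted(browser_fields - script_fields)
unexpected = sorted(script_fields - browser_fields)

print("missing from script:", missing)
print("not sent by browser:", unexpected)
```

The comparison flags `btnLogin`, `hdnstatus`, `hdnstatus0`, `__LASTFOCUS` and `__EVENTARGUMENT` as missing, and the misspelled `__EVENTARGUEMENT` as a key the browser never sends. WebForms pages commonly use the submit button's name in the POST body to decide which button was clicked, so the absent `btnLogin` is a likely candidate, though that is an assumption about this particular site.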

1 answer:

Answer 0 (score: 2)

This will help you log in.

import platform
import time
from selenium import webdriver

# Pick the PhantomJS binary that matches the operating system.
if platform.system() == 'Windows':
    PHANTOMJS_PATH = './phantomjs.exe'
else:
    PHANTOMJS_PATH = './phantomjs'

# Start a headless browser and open the login page.
browser = webdriver.PhantomJS(PHANTOMJS_PATH)
browser.set_window_size(1366, 768)
browser.get("http://210.212.227.210/tkmce/index.aspx")

# Fill in the form and submit it (replace the placeholders with your credentials).
browser.find_element_by_id("txtUserName").send_keys('myloginid')
browser.find_element_by_id("txtPassword").send_keys('myloginpassword')
browser.find_element_by_id("btnLogin").click()

# Give the post-login page time to load, then inspect its HTML.
time.sleep(5)
html = browser.page_source
if 'Welcome' in html:
    print("You're logged in!")
else:
    print("Logging in failed. Perhaps the credentials were invalid.")
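
As an alternative to driving a browser, the login could probably also be done with plain HTTP requests, provided every field the browser sends is posted back. The following is a minimal Python 3 sketch, not tested against the site. The names `InputCollector`, `build_login_payload` and `login` are hypothetical, and the field names `txtUserName`, `txtPassword` and `btnLogin` are taken from the form data trace in the question. It uses the standard library's `html.parser` instead of BeautifulSoup so that it has no third-party dependencies, and it accepts any session object with `get` and `post` methods, such as a `requests.Session`.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Collect name/value pairs of every <input> except buttons."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        name = attr.get("name")
        kind = (attr.get("type") or "text").lower()
        # Buttons are skipped: a browser only sends the one that was clicked.
        if name and kind not in ("submit", "button", "image"):
            self.fields[name] = attr.get("value") or ""


def build_login_payload(html, username, password):
    """Merge the page's own form fields with the credentials and the button."""
    collector = InputCollector()
    collector.feed(html)
    payload = dict(collector.fields)
    payload.update({
        "txtUserName": username,
        "txtPassword": password,
        "btnLogin": "Login",  # value seen in the browser trace
    })
    return payload


def login(session, url, username, password):
    """GET the login page, then POST it back with all fields filled in."""
    page = session.get(url)
    payload = build_login_payload(page.text, username, password)
    return session.post(url, data=payload, headers={"Referer": url})
```

With requests installed, this could be used as `s = requests.Session()` followed by `login(s, URL, username, password)` and then `s.get(...)` on the Home.aspx address. Collecting every input from the page, rather than listing hidden fields one by one, avoids misspelled or forgotten keys such as `__EVENTARGUMENT` and `hdnstatus`.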