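I am trying to post form data to retrieve information on the bills for each year. Everything works as expected for 2019, but if I change the form field "ctl00$rilinContent$cbYear" to an earlier year, the request only returns the default search page (which defaults to 2019), so there is nothing to collect.
I tried using "__EVENTTARGET" to change the year, without success. Thanks for any help you can provide.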
import requests
default_data = {'__EVENTTARGET': '',
'__EVENTARGUMENT': '',
'__LASTFOCUS': '',
'__VIEWSTATE': 'PZZDS...', #(long string)
'__VIEWSTATEGENERATOR': 'B3C16737',
'__EVENTVALIDATION': 'kp03y...', #(long string)
'ctl00$rilinContent$cbYear': '',
'ctl00$rilinContent$txtReport': '',
'ctl00$rilinContent$cbCommittee': '',
'ctl00$rilinContent$comm': 'cbxIn',
'ctl00$rilinContent$cbCategory': '',
'ctl00$rilinContent$cbSponsor': '',
'ctl00$rilinContent$cbxPrime': '',
'ctl00$rilinContent$txtBills': '',
'ctl00$rilinContent$cbxSortNumeric': '',
'ctl00$rilinContent$txtBillFrom': '',
'ctl00$rilinContent$txtBillTo': '',
'ctl00$rilinContent$cbAction': '',
'ctl00$rilinContent$cbxLastAction': '',
'ctl00$rilinContent$cmdReport': 'Enter',
'ctl00$rilinContent$hfQuery': ''}
url = "http://status.rilin.state.ri.us/"
data = default_data
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36"}
data['ctl00$rilinContent$cbYear'] = '2019'
data['ctl00$rilinContent$cbCategory'] = '307'
r = requests.post(url, data=data, headers=headers).text
# simple test
string = 'Legislative Status Report'
string in r
Answer 0 (score: 2)
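I think the page first performs an initial update via POST when the year changes, and only then runs the query. I'm sure the following can be simplified, but it seems to work.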
import requests
from bs4 import BeautifulSoup as bs
default_data = {'__EVENTTARGET': '',
'__EVENTARGUMENT': '',
'__LASTFOCUS': '',
'__VIEWSTATE': '',
'__VIEWSTATEGENERATOR': 'B3C16737',
'__EVENTVALIDATION': '',
'ctl00$rilinContent$cbYear': '',
'ctl00$rilinContent$txtReport': '',
'ctl00$rilinContent$cbCommittee': '',
'ctl00$rilinContent$comm': 'cbxIn',
'ctl00$rilinContent$cbCategory': '',
'ctl00$rilinContent$cbSponsor': '',
'ctl00$rilinContent$cbxPrime': '',
'ctl00$rilinContent$txtBills': '',
'ctl00$rilinContent$cbxSortNumeric': '',
'ctl00$rilinContent$txtBillFrom': '',
'ctl00$rilinContent$txtBillTo': '',
'ctl00$rilinContent$cbAction': '',
'ctl00$rilinContent$cbxLastAction': '',
'ctl00$rilinContent$cmdReport': '', #'Enter'
'ctl00$rilinContent$hfQuery': ''}
url = "http://status.rilin.state.ri.us/"
data = default_data
headers = {
'User-Agent': 'Mozilla/5.0',
'Accept' : 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
'Content-Type' : 'application/x-www-form-urlencoded'
}
data['ctl00$rilinContent$cbYear'] = 2017
with requests.Session() as s:
    # 1. GET the page to obtain fresh hidden state values.
    r = s.get(url)
    soup = bs(r.content, 'lxml')
    vs = soup.select_one('#__VIEWSTATE')['value']
    ev = soup.select_one('#__EVENTVALIDATION')['value']
    data['__VIEWSTATE'] = vs
    data['__EVENTVALIDATION'] = ev
    # 2. POST the year change only (cmdReport is still empty).
    r = s.post(url, data=data, headers=headers)
    soup = bs(r.content, 'lxml')
    vs = soup.select_one('#__VIEWSTATE')['value']
    ev = soup.select_one('#__EVENTVALIDATION')['value']
    # 3. POST the actual query using the state returned by step 2.
    data['ctl00$rilinContent$cbCategory'] = 307
    data['__VIEWSTATE'] = vs
    data['__EVENTVALIDATION'] = ev
    data['ctl00$rilinContent$cmdReport'] = 'Enter'
    r = s.post(url, data=data, headers=headers)
    soup = bs(r.content, 'lxml')
    print(soup)
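The repeated "read __VIEWSTATE and __EVENTVALIDATION, copy them into the payload, POST" step above could probably be factored into a small helper. The following is only a sketch under assumptions: `hidden_fields` and `postback` are hypothetical names that do not appear in the original code, the parser is the standard library's `html.parser` rather than BeautifulSoup (so the example has no third-party dependencies), and it assumes every piece of state the server needs is exposed as an `<input type="hidden">` element.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect the name/value pair of every <input type="hidden"> element."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        is_hidden = (attr.get("type") or "").lower() == "hidden"
        if is_hidden and attr.get("name"):
            # A missing value attribute is stored as an empty string.
            self.fields[attr["name"]] = attr.get("value") or ""


def hidden_fields(html):
    """Return a dict of all hidden form fields found in an HTML string."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


def postback(session, url, form, html, changes):
    """Hypothetical helper: refresh hidden state from `html`, apply `changes`, POST.

    `session` is anything with a requests-like post(url, data=...) method.
    Returns the payload that was sent and the text of the response.
    """
    payload = dict(form)                  # do not mutate the caller's dict
    payload.update(hidden_fields(html))   # fresh __VIEWSTATE, __EVENTVALIDATION, ...
    payload.update(changes)               # the fields this step actually changes
    response = session.post(url, data=payload)
    return payload, response.text


# Possible usage with the names from the answer (not executed here):
#
# with requests.Session() as s:
#     html = s.get(url).text
#     data, html = postback(s, url, data, html,
#                           {'ctl00$rilinContent$cbYear': 2017})
#     data, html = postback(s, url, data, html,
#                           {'ctl00$rilinContent$cbCategory': 307,
#                            'ctl00$rilinContent$cmdReport': 'Enter'})
```

Each call feeds the previous response back in, so the state sent with the query is the state the server issued after the year change. The sketch does not pass the custom headers used above; they could be added as an extra argument if the site turns out to need them.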