POST request with form data only works for the current year

Date: 2019-05-27 09:06:56

Tags: python web-scraping python-requests

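I am trying to send form data to retrieve information on bills for each year. Everything works as expected for 2019, but if I change the form field 'ctl00$rilinContent$cbYear' to a previous year, the site just returns the default search page (which defaults to 2019), so there is nothing to collect.

I tried using '__EVENTTARGET' to change the year, without success. Thanks for any help you can offer.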

Sample code

import requests

default_data = {'__EVENTTARGET': '',
            '__EVENTARGUMENT': '',
            '__LASTFOCUS': '',
            '__VIEWSTATE': 'PZZDS...', #(long string)
            '__VIEWSTATEGENERATOR': 'B3C16737',
            '__EVENTVALIDATION': 'kp03y...', #(long string)
            'ctl00$rilinContent$cbYear': '',
            'ctl00$rilinContent$txtReport': '',
            'ctl00$rilinContent$cbCommittee': '',
            'ctl00$rilinContent$comm': 'cbxIn',
            'ctl00$rilinContent$cbCategory': '',
            'ctl00$rilinContent$cbSponsor': '',
            'ctl00$rilinContent$cbxPrime': '',
            'ctl00$rilinContent$txtBills': '',
            'ctl00$rilinContent$cbxSortNumeric': '',
            'ctl00$rilinContent$txtBillFrom': '',
            'ctl00$rilinContent$txtBillTo': '',
            'ctl00$rilinContent$cbAction': '',
            'ctl00$rilinContent$cbxLastAction': '',
            'ctl00$rilinContent$cmdReport': 'Enter',
            'ctl00$rilinContent$hfQuery': ''}

url = "http://status.rilin.state.ri.us/"
data = default_data

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36"}

data['ctl00$rilinContent$cbYear'] = '2019'
data['ctl00$rilinContent$cbCategory'] = '307'

r = requests.post(url, data=data, headers=headers).text

# simple test: the report heading should appear in the returned HTML
string = 'Legislative Status Report'
print(string in r)

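1 answer:

Answer 0 (score: 2)

I think the page first does an initial update via POST when the year changes, and only then accepts the POST that submits the search. I'm sure the following can be simplified, but it seems to work.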

import requests
from bs4 import BeautifulSoup as bs

default_data = {'__EVENTTARGET': '',
            '__EVENTARGUMENT': '',
            '__LASTFOCUS': '',
            '__VIEWSTATE': '', 
            '__VIEWSTATEGENERATOR': 'B3C16737',
            '__EVENTVALIDATION': '',
            'ctl00$rilinContent$cbYear': '',
            'ctl00$rilinContent$txtReport': '',
            'ctl00$rilinContent$cbCommittee': '',
            'ctl00$rilinContent$comm': 'cbxIn',
            'ctl00$rilinContent$cbCategory': '',
            'ctl00$rilinContent$cbSponsor': '',
            'ctl00$rilinContent$cbxPrime': '',
            'ctl00$rilinContent$txtBills': '',
            'ctl00$rilinContent$cbxSortNumeric': '',
            'ctl00$rilinContent$txtBillFrom': '',
            'ctl00$rilinContent$txtBillTo': '',
            'ctl00$rilinContent$cbAction': '',
            'ctl00$rilinContent$cbxLastAction': '',
            'ctl00$rilinContent$cmdReport': '', #'Enter'
            'ctl00$rilinContent$hfQuery': ''}

url = "http://status.rilin.state.ri.us/"
data = default_data

headers = {
    'User-Agent': 'Mozilla/5.0',
    'Accept' : 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Content-Type' : 'application/x-www-form-urlencoded'
}

data['ctl00$rilinContent$cbYear'] = 2017

with requests.Session() as s:
    # 1. GET the search page to obtain fresh __VIEWSTATE / __EVENTVALIDATION values.
    r = s.get(url)
    soup = bs(r.content, 'lxml')
    data['__VIEWSTATE'] = soup.select_one('#__VIEWSTATE')['value']
    data['__EVENTVALIDATION'] = soup.select_one('#__EVENTVALIDATION')['value']

    # 2. POST the year only (cmdReport is still empty); presumably this is the
    #    initial update that switches the page to the requested year.
    r = s.post(url, data=data, headers=headers)
    soup = bs(r.content, 'lxml')

    # 3. POST again with the values returned by step 2, now adding the
    #    category and the 'Enter' submit value to run the search.
    data['__VIEWSTATE'] = soup.select_one('#__VIEWSTATE')['value']
    data['__EVENTVALIDATION'] = soup.select_one('#__EVENTVALIDATION')['value']
    data['ctl00$rilinContent$cbCategory'] = 307
    data['ctl00$rilinContent$cmdReport'] = 'Enter'
    r = s.post(url, data=data, headers=headers)
    soup = bs(r.content, 'lxml')
    print(soup)
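
The code above copies '__VIEWSTATE' and '__EVENTVALIDATION' by hand after each response. As a rough sketch of the same idea, the hidden fields could be collected generically and merged with the values you want to send. This version uses only the standard library instead of BeautifulSoup; the helper names `hidden_fields` and `next_payload` are hypothetical, and the sample HTML is made up for illustration, not taken from the real page.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        # Attribute values may be None when an attribute has no value.
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def hidden_fields(html):
    """Return a dict of all hidden input fields found in an HTML string."""
    parser = HiddenFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields


def next_payload(html, overrides):
    """Merge the page's current hidden fields with the form values to send."""
    payload = hidden_fields(html)
    payload.update(overrides)
    return payload


# Made-up fragment standing in for a response body (assumption, not the real page).
sample_html = """
<form>
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="xyz789" />
  <input type="text" name="ctl00$rilinContent$txtBills" value="" />
</form>
"""

payload = next_payload(sample_html, {"ctl00$rilinContent$cbYear": "2017"})
print(payload)
```

In the session loop, each `s.post(...)` could then take `data=next_payload(r.text, {...})`, so the hidden values always come from the most recent response. Note that this sketch only carries hidden inputs forward; visible fields such as the text boxes would still need to be supplied through `overrides`.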