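I am trying to write a script that submits a form and returns the results to me. I can pull the form information from the URL, but I cannot update the form's fields or get a response.
Here is what I have so far: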
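import requests
from bs4 import BeautifulSoup as bs

url = 'https://dos.elections.myflorida.com/campaign-finance/contributions/'
response = requests.get(url)
# Name the parser explicitly so BeautifulSoup does not have to guess one
soup = bs(response.text, 'html.parser')
# The action attribute lives on the <form> tag
form_info = soup.find_all('form')
print(form_info[0]['action'])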
This works and returns:
'/cgi-bin/contrib.exe'
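The action above is a relative path, so it has to be resolved against the page URL before anything can be posted to it. A minimal sketch using the standard library's `urljoin`; the variable names `page_url` and `action` are illustrative and not part of the original script:

```python
from urllib.parse import urljoin

# Page that hosts the form, and the relative action scraped from it.
page_url = 'https://dos.elections.myflorida.com/campaign-finance/contributions/'
action = '/cgi-bin/contrib.exe'

# urljoin resolves the action against the page URL, as a browser would.
post_url = urljoin(page_url, action)
print(post_url)
```

Resolving the action this way avoids hard-coding the host when the form's target changes.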
The form should be submittable with its default values, so I tried:
session = requests.Session()
BASE_URL = 'https://dos.elections.myflorida.com'
headers = {
    'User-Agent': 'Mozilla/5.0',
    'referer': '{}/campaign-finance/contributions/'.format(BASE_URL),
}
data = {'Submit': 'Submit'}
res = session.post('{}/cgi-bin/contrib.exe'.format(BASE_URL), data=data, headers=headers)
I get a 502 response. Following this post, I set the referer to the form page, https://dos.elections.myflorida.com/campaign-finance/contributions/
The results redirect me to:
https://dos.elections.myflorida.com/cgi-bin/contrib.exe
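One way to see what such a POST would actually carry, without contacting the server, is to prepare the request and inspect it. This is only a diagnostic sketch: it assumes, without confirmation from the server, that the 502 is related to the body containing nothing but the `Submit` field.

```python
import requests

BASE_URL = 'https://dos.elections.myflorida.com'
headers = {
    'User-Agent': 'Mozilla/5.0',
    'Referer': '{}/campaign-finance/contributions/'.format(BASE_URL),
}
data = {'Submit': 'Submit'}

# Build the request without sending it, so nothing touches the network.
req = requests.Request(
    'POST',
    '{}/cgi-bin/contrib.exe'.format(BASE_URL),
    data=data,
    headers=headers,
)
prepared = req.prepare()

print(prepared.method, prepared.url)
print(prepared.headers['Content-Type'])
# The encoded body holds only the Submit field; none of the other
# form fields that a browser would send are present.
print(prepared.body)
```

Comparing this body with the form data shown in the browser's network tab makes any missing fields easy to spot.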
SIM's solution works, thanks!
Answer 0 (score: 1)
Try the following to fetch the required content using the default search:
import requests
from bs4 import BeautifulSoup
link = 'https://dos.elections.myflorida.com/campaign-finance/contributions/'
post_url = 'https://dos.elections.myflorida.com/cgi-bin/contrib.exe'
with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1; ) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36'
    # Load the form page first so the session keeps any cookies it sets
    r = s.get(link)
    soup = BeautifulSoup(r.text, "lxml")
    # Start from the default value of every named <input>
    payload = {i['name']: i.get('value', '') for i in soup.select('input[name]')}
    # Set the search fields for the default search
    payload['election'] = '20201103-GEN'
    payload['search_on'] = '1'
    payload['CanNameSrch'] = '2'
    payload['office'] = 'All'
    payload['party'] = 'All'
    payload['ComNameSrch'] = '2'
    payload['committee'] = 'All'
    payload['namesearch'] = '2'
    payload['csort1'] = 'NAM'
    payload['csort2'] = 'CAN'
    payload['queryformat'] = '2'
    r = s.post(post_url, data=payload)
    print(r.text)
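The `input[name]` selector above only collects `<input>` elements, so any field rendered as a `<select>` has to be set by hand. A possible way to pick up those defaults as well is sketched below. The helper `form_defaults` is hypothetical, and the sample HTML is made up for illustration (it borrows two field names from the code above but is not the real page):

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a real form page; option values are illustrative.
SAMPLE_HTML = """
<form action="/cgi-bin/contrib.exe" method="post">
  <select name="election">
    <option value="All">All</option>
    <option value="20201103-GEN" selected>2020 General</option>
  </select>
  <select name="office">
    <option value="All">All</option>
    <option value="GOV">Governor</option>
  </select>
  <input type="text" name="example_field" value="500">
  <input type="submit" name="Submit" value="Submit">
</form>
"""


def form_defaults(form):
    """Return {name: default value} for the named inputs and selects of a form tag."""
    payload = {}
    for field in form.select('input[name]'):
        payload[field['name']] = field.get('value', '')
    for select in form.select('select[name]'):
        # Use the option marked selected, else fall back to the first option.
        option = select.select_one('option[selected]')
        if option is None:
            option = select.find('option')
        if option is not None:
            payload[select['name']] = option.get('value', option.get_text(strip=True))
    return payload


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
payload = form_defaults(soup.find('form'))
print(payload)
```

This sketch treats every named input as submitted; unchecked checkboxes and radio buttons, which a browser would leave out, would need extra filtering.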