I wrote the following to scrape the action from forms, to save myself from clicking through them one at a time.
import requests
from bs4 import BeautifulSoup
import re
with requests.Session() as c:
    url = 'https://website.com/login'
    EMAIL = ''
    PASSWORD = ''
    # Fetch the login page first so the session picks up any cookies
    c.get(url)
    login_data = dict(email=EMAIL, password=PASSWORD)
    c.post(url, data=login_data,
           headers={"Referer": "https://website.com/"})
    page = c.get('https://website.com/dashboard')
    parser = BeautifulSoup(page.content, 'html.parser')
    # find() returns only the first matching <form>
    forms = parser.find('form').get('action')
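When I run this, I only get the result from the first form. One solution would be to iterate over this so that I get the results from every form.
I can change the find to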
parser.find_all('form')
which returns all the forms, but instead of usable links I get
<form accept-charset="UTF-8" action="https://website.com/action" method="GET">
<input class="button" type="submit" value="action"/>
</form>
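It stores these in a Python list, so another solution would be to iterate over them and strip out everything except the URL. They always have the same format: the URL length varies slightly, but the content before and after it is always the same.
If I try to use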
parser.find_all_next('form').get('action')
I get the following error
Traceback (most recent call last):
  File "scrape.py", line 16, in <module>
    forms = parser.find_all_next('form').get('action')
  File "/home/username/.local/lib/python2.7/site-packages/bs4/element.py", line 1807, in __getattr__
    "ResultSet object has no attribute '%s'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?" % key
AttributeError: ResultSet object has no attribute 'get'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?
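Answer 0 (score: 2)
You just need to loop over parser.find_all('form'), get the action attribute of each element, and store the values in a list; this can be done with a list comprehension.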
import requests
from bs4 import BeautifulSoup

with requests.Session() as c:
    url = 'https://website.com/login'
    EMAIL = ''
    PASSWORD = ''
    c.get(url)
    login_data = dict(email=EMAIL, password=PASSWORD)
    c.post(url, data=login_data,
           headers={"Referer": "https://website.com/"})
    page = c.get('https://website.com/dashboard')
    parser = BeautifulSoup(page.content, 'html.parser')
    # find_all() returns a list-like ResultSet, so call .get() on each element
    forms = [f.get('action') for f in parser.find_all('form')]
The list of all the URLs is stored in the forms variable.
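To try the same idea without logging in, here is a minimal sketch that runs the comprehension against an inline HTML string. The SAMPLE_HTML markup, the second and third forms in it, and the extract_form_actions helper are made up for illustration and are not from the question. It also assumes you want to skip forms that have no action attribute, which f.get('action') would otherwise report as None.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the dashboard page.
SAMPLE_HTML = """
<form accept-charset="UTF-8" action="https://website.com/action" method="GET">
<input class="button" type="submit" value="action"/>
</form>
<form accept-charset="UTF-8" action="https://website.com/other" method="GET">
<input class="button" type="submit" value="other"/>
</form>
<form method="GET">
<input class="button" type="submit" value="no action"/>
</form>
"""


def extract_form_actions(html):
    """Return the action URL of every <form> that has an action attribute."""
    parser = BeautifulSoup(html, 'html.parser')
    # action=True matches only tags that carry an action attribute,
    # so the third form above is left out instead of yielding None.
    return [f['action'] for f in parser.find_all('form', action=True)]


actions = extract_form_actions(SAMPLE_HTML)
print(actions)
```

In the original script you would pass page.content to the helper in place of SAMPLE_HTML.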