I am trying to scrape a website whose HTML contains the following:
<form id="__AjaxAntiForgeryForm" action="#" method="post">
<input name="__RequestVerificationToken" type="hidden" value="LOUesP09TLS3suKJk4dF5hIxeo-LmDWLxX8xqwIHYnj-JqR29qDcGA_mtHXvyZIej83qG3FfBBs2nuzk1EY6onTuszY1">
</form>
I am trying to extract the value with BeautifulSoup, using:
page = urllib2.urlopen(LOGIN_URL)
# parse the html using beautiful soup and store in variable `soup`
soup = BeautifulSoup(page, "html.parser")
form = soup.find("form", {"id": "__AjaxAntiForgeryForm"})
This correctly returns:
<form action="#" id="__AjaxAntiForgeryForm" method="post"><input name="__RequestVerificationToken" type="hidden" value="zd7XHXyVs7EgqObLzIfm9k4bw1cWfcddhfDZ9Mp8TibBaAJUz-yAp1ZBuKS1iJtEAvmI1WG_EYnbmXBnWzuKWJxfl8U1"/></form>
My problem is extracting only the value from that tag.
Based on this answer, I tried
form = soup.find("form", {"id": "__AjaxAntiForgeryForm"})['value']
but it only raises KeyError: 'value'.
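I could convert it to a string and use a regular expression to extract the value, but that seems clumsy, and there must be a cleaner way with BeautifulSoup.
Any ideas?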
Answer 0 (score: 1)
Use form.input.attrs['value']. Example:
from bs4 import BeautifulSoup
s = """<form id="__AjaxAntiForgeryForm" action="#" method="post">
<input name="__RequestVerificationToken" type="hidden" value="LOUesP09TLS3suKJk4dF5hIxeo-LmDWLxX8xqwIHYnj-JqR29qDcGA_mtHXvyZIej83qG3FfBBs2nuzk1EY6onTuszY1">
</form>"""
soup = BeautifulSoup(s, "html.parser")
form = soup.find("form", {"id": "__AjaxAntiForgeryForm"})
print( form.input.attrs['value'] )
Output:
LOUesP09TLS3suKJk4dF5hIxeo-LmDWLxX8xqwIHYnj-JqR29qDcGA_mtHXvyZIej83qG3FfBBs2nuzk1EY6onTuszY1
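For context on why the original attempt raises KeyError: indexing a tag with ['value'] only looks at that tag's own attributes, and the form tag here carries just id, action and method. The value attribute lives on the nested input tag. The following is a minimal sketch of that behaviour; the short token "abc123" is a stand-in for the real one and is purely an assumption for illustration.

```python
from bs4 import BeautifulSoup

# "abc123" is a shortened, made-up token used only for this illustration.
html = """<form id="__AjaxAntiForgeryForm" action="#" method="post">
<input name="__RequestVerificationToken" type="hidden" value="abc123">
</form>"""

soup = BeautifulSoup(html, "html.parser")
form = soup.find("form", {"id": "__AjaxAntiForgeryForm"})

# The <form> tag only exposes its own attributes, not those of its children.
form_attrs = sorted(form.attrs)

# Indexing the form for 'value' therefore fails with KeyError.
try:
    form["value"]
    form_lookup_failed = False
except KeyError:
    form_lookup_failed = True

# Searching inside the form for the <input> reaches the attribute.
token = form.find("input", {"name": "__RequestVerificationToken"})["value"]

print(form_attrs)
print(form_lookup_failed)
print(token)
```

Looking the input up by name, rather than relying on form.input, should also keep working if the form later gains other input elements ahead of the token.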
Answer 1 (score: 0)
from bs4 import BeautifulSoup
html = '''<form id="__AjaxAntiForgeryForm" action="#" method="post">
<input name="__RequestVerificationToken" type="hidden" value="LOUesP09TLS3suKJk4dF5hIxeo-LmDWLxX8xqwIHYnj-JqR29qDcGA_mtHXvyZIej83qG3FfBBs2nuzk1EY6onTuszY1">
</form>'''
soup = BeautifulSoup(html, "html.parser")
value = soup.find('input', {'name':'__RequestVerificationToken'})['value']
print(value)
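A possible alternative, not taken from the answers here, is to express the same lookup as a CSS attribute selector through select_one. This is only a sketch: the token "abc123" is a made-up placeholder, and the None check is an added precaution in case the page does not contain the field.

```python
from bs4 import BeautifulSoup

# "abc123" is a made-up placeholder token for illustration.
html = '''<form id="__AjaxAntiForgeryForm" action="#" method="post">
<input name="__RequestVerificationToken" type="hidden" value="abc123">
</form>'''

soup = BeautifulSoup(html, "html.parser")

# select_one returns the first element matching the CSS selector, or None.
tag = soup.select_one('input[name="__RequestVerificationToken"]')

# Guard against a missing field before reading the attribute.
value = tag["value"] if tag is not None else None
print(value)
```

Whether find or select_one reads better is largely a matter of taste; both locate the same input element in this markup.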
Answer 2 (score: 0)
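from bs4 import BeautifulSoup
import urllib2

url1 = "LOGINURL"  # placeholder for the login page URL
# Assumption: fetch the page with urllib2 as in the question; BeautifulSoup parses markup, not URLs
page1 = urllib2.urlopen(url1)
soup1 = BeautifulSoup(page1, "html.parser")
form1 = soup1.find('input', {'name': '__RequestVerificationToken'})
print(form1.get('value'))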
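Note that .get('value') returns None when the attribute is absent, but find itself returns None when no matching input exists, in which case calling .get on it would fail. A small helper along the following lines could cover both cases. The function name extract_token, its field_name parameter and the sample markup are hypothetical and only meant as a sketch.

```python
from bs4 import BeautifulSoup


def extract_token(html, field_name="__RequestVerificationToken"):
    """Return the value of the named hidden input, or None if it is missing.

    extract_token and field_name are hypothetical names for this sketch.
    """
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("input", {"name": field_name})
    if tag is None:
        # No such input on the page.
        return None
    # .get returns None instead of raising when 'value' is absent.
    return tag.get("value")


# Made-up sample pages for illustration.
page_with_token = (
    '<form id="__AjaxAntiForgeryForm" action="#" method="post">'
    '<input name="__RequestVerificationToken" type="hidden" value="abc123">'
    '</form>'
)
page_without_token = '<form id="__AjaxAntiForgeryForm"></form>'

print(extract_token(page_with_token))
print(extract_token(page_without_token))
```

The caller can then check for None before submitting the login request instead of catching an exception.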