I'm new to Python and the BeautifulSoup library. I've tried many things, but without any luck.
My HTML code looks something like this:
<form method = "post" id="FORM1" name="FORM1">
<table cellpadding=0 cellspacing=1 border=0 align="center" bgcolor="#cccccc">
<tr>
<td class="producto"><b>Club</b><br>
<input value="CLUB TENIS DE MESA PORTOBAIL" disabled class="txtmascaraform" type="TEXT" name="txtClub" size="60" maxlength="55">
</td>
<tr>
<td colspan="2" class="producto"><b>Nombre Equipo</b><br>
<input value="C.T.M. PORTOBAIL" disabled class="txtmascaraform" type="TEXT" name="txtNomEqu" size="100" maxlength="80">
</td>
</tr>
<tr>
<td class="producto"><b>Telefono fijo</b><br>
<input value="63097005534" disabled class="txtmascaraform" type="TEXT" name="txtTelf" size="15" maxlength="10">
</td
I only need to get the content inside the <b></b> tags, along with the value of each input.
Thanks a lot!!
Answer 0 (score: 0)
First find() your form by id, then find_all() the inputs inside it and read each one's value attribute:
from bs4 import BeautifulSoup
data = """<form method = "post" id="FORM1" name="FORM1">
<table cellpadding=0 cellspacing=1 border=0 align="center" bgcolor="#cccccc">
<tr>
<td class="producto"><b>Club</b><br>
<input value="CLUB TENIS DE MESA PORTOBAIL" disabled class="txtmascaraform" type="TEXT" name="txtClub" size="60" maxlength="55">
</td>
<tr>
<td colspan="2" class="producto"><b>Nombre Equipo</b><br>
<input value="C.T.M. PORTOBAIL" disabled class="txtmascaraform" type="TEXT" name="txtNomEqu" size="100" maxlength="80">
</td>
</tr>
<tr>
<td class="producto"><b>Telefono fijo</b><br>
<input value="63097005534" disabled class="txtmascaraform" type="TEXT" name="txtTelf" size="15" maxlength="10">
</td>
</tr>
</table>
</form>"""
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(data, "html.parser")
form = soup.find("form", {'id': "FORM1"})
print([item.get('value') for item in form.find_all('input')])
# UPDATE for getting table cell values
table = form.find("table")
print([item.text.strip() for item in table.find_all('td')])
Prints:
['CLUB TENIS DE MESA PORTOBAIL', 'C.T.M. PORTOBAIL', '63097005534']
['Club', 'Nombre Equipo', 'Telefono fijo']
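Since the question asks for each <b> label together with its input value, it may be more convenient to pair them cell by cell instead of building two separate lists. Below is a minimal sketch of that idea, assuming every td with class "producto" holds exactly one <b> and one <input>, as in the posted HTML. The helper name label_value_pairs and the shortened html string are made up for illustration and are not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Shortened copy of the form from the question (assumption: same structure,
# presentational attributes dropped for brevity).
html = """<form method="post" id="FORM1" name="FORM1">
<table>
<tr>
<td class="producto"><b>Club</b><br>
<input value="CLUB TENIS DE MESA PORTOBAIL" disabled type="TEXT" name="txtClub">
</td>
</tr>
<tr>
<td class="producto"><b>Nombre Equipo</b><br>
<input value="C.T.M. PORTOBAIL" disabled type="TEXT" name="txtNomEqu">
</td>
</tr>
<tr>
<td class="producto"><b>Telefono fijo</b><br>
<input value="63097005534" disabled type="TEXT" name="txtTelf">
</td>
</tr>
</table>
</form>"""


def label_value_pairs(markup, form_id="FORM1"):
    """Hypothetical helper: map each <b> label to the value of the <input> in the same cell."""
    soup = BeautifulSoup(markup, "html.parser")
    form = soup.find("form", id=form_id)
    if form is None:
        # No form with that id: nothing to extract.
        return {}
    pairs = {}
    for cell in form.find_all("td", class_="producto"):
        label = cell.find("b")
        field = cell.find("input")
        # Skip cells that lack either the label or the input.
        if label is None or field is None:
            continue
        pairs[label.get_text(strip=True)] = field.get("value")
    return pairs


pairs = label_value_pairs(html)
for label, value in pairs.items():
    print(label, "->", value)
```

Looking up both tags inside the same td keeps a label and its value together even if some cell is missing one of them, which two independent find_all() lists would not guarantee.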