I have a large HTML document, and I have extracted the following fragment from it and saved it as output.html:
<td width="513">
<b>abc (4pa11cs031) </b><br/><br/><br/><br/><hr/><table><tbody><tr><td><b>Semester:</b></td><td><b>5</b></td><td></td><td> Â Â Â Â <b> Result:Â Â FIRST CLASS </b></td></tr></tbody></table><hr/><br/><table><tbody><tr><td width="250">Subject</td><td align="center" width="60">External </td><td align="center" width="60">Internal</td><td align="center" width="60">Total</td><td align="center" width="60">Result</td></tr><tr><td width="250"><i>Software Engineering (10IS51)</i></td><td align="center" width="60">58</td><td align="center" width="60">24</td><td align="center" width="60">82</td><td align="center" width="60"><b>P</b></td></tr><tr><td width="250"><i>Systems Software (10CS52)</i></td><td align="center" width="60">70</td><td align="center" width="60">24</td><td align="center" width="60">94</td><td align="center" width="60"><b>P</b></td></tr><tr><td width="250"><i>Operating Systems (10CS53)</i></td><td align="center" width="60">58</td><td align="center" width="60">18</td><td align="center" width="60">76</td><td align="center" width="60"><b>P</b></td></tr><tr><td width="250"><i>Database Management Systems (10CS54)</i></td><td align="center" width="60">42</td><td align="center" width="60">25</td><td align="center" width="60">67</td><td align="center" width="60"><b>P</b></td></tr><tr><td width="250"><i>Computer Networks - I (10CS55)</i></td><td align="center" width="60">62</td><td align="center" width="60">23</td><td align="center" width="60">85</td><td align="center" width="60"><b>P</b></td></tr><tr><td width="250"><i>Formal Languages & Automata Theory (10CS56)</i></td><td align="center" width="60">37</td><td align="center" width="60">24</td><td align="center" width="60">61</td><td align="center" width="60"><b>P</b></td></tr><tr><td width="250"><i>Database Applications Laboratory (10CSL57)</i></td><td align="center" width="60">40</td><td align="center" width="60">25</td><td align="center" width="60">65</td><td align="center" width="60"><b>P</b></td></tr><tr><td 
width="250"><i>Systems Software & Operating Systems Lab. (10CSL58)</i></td><td align="center" width="60">40</td><td align="center" width="60">21</td><td align="center" width="60">61</td><td align="center" width="60"><b>P</b></td></tr></tbody></table><br/><br/><table><tbody><tr><td></td><td></td><td>Total Marks:</td><td> 591 Â Â Â </td></tr></tbody></table> </td>
I am using BeautifulSoup to parse it and retrieve the values in the table.
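from bs4 import BeautifulSoup  # assuming BeautifulSoup 4 (the bs4 package)

def getval():
    # Read and parse the saved fragment
    with open('output.html') as page_html:
        soup = BeautifulSoup(page_html, 'html.parser')
    # Collect every <b> tag in document order
    all_tds = soup.findAll("b")
    # Take the first <b> tag and format it as a string, markup included
    lol = all_tds[0]
    record = '%s' % (lol)
    # Note: this overwrites the input file with the single extracted tag
    with open('output.html', 'w') as fl:
        fl.write(record)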
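Suppose I want only the content between the <b> tags. With the code above, I currently get the content together with the tags themselves: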
[<b>abc (4pa11cs031) </b>, <b>Semester:</b>, <b>5</b>, <b> Result: FIRST CLASS </b>, <b>P</b>, <b>P</b>, <b>P</b>, <b>P</b>, <b>P</b>, <b>P</b>, <b>P</b>, <b>P</b>]
How can I get just the text between the tags?
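One possible approach, sketched under the assumption that BeautifulSoup 4 (the bs4 package) is in use: each Tag object has a get_text() method that returns its text without the surrounding markup. The names SAMPLE_HTML and bold_texts below are hypothetical, and the sample string is a shortened stand-in for output.html, not the full file.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical stand-in for the contents of output.html
SAMPLE_HTML = (
    '<td width="513"><b>abc (4pa11cs031) </b><br/>'
    '<table><tbody><tr><td><b>Semester:</b></td><td><b>5</b></td>'
    '<td><b> Result: FIRST CLASS </b></td></tr></tbody></table></td>'
)


def bold_texts(html):
    """Return the text inside each <b> tag, without the markup."""
    soup = BeautifulSoup(html, "html.parser")
    # get_text() keeps only the text of the tag;
    # strip=True trims leading and trailing whitespace
    return [b.get_text(strip=True) for b in soup.find_all("b")]


if __name__ == "__main__":
    for text in bold_texts(SAMPLE_HTML):
        print(text)
```

If the tag is known to contain a single text node, as the <b> tags here do, the .string attribute of the tag is another way to reach the same text; get_text() is the safer choice when a tag may hold nested markup.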