I have this HTML:
<div style="padding-top: 10px;" id="government_funding">
<h2>Sampling of Recent Funding Actions/Set Asides</h2>
<p style="font-style: italic; font-size: .8em;">In order by amount of set aside monies.</p>
<ul>
<li><span style="color: green;">$14,450</span> - Thursday the 17th of August 2017<br><span style="font-weight: bold; font-size: 1.2em;">National Institutes Of Health</span> <br> NATIONAL INSTITUTES OF HEALTH NICHD<br>AVANTI POLAR LIPIDS:1109394 [17-010744]
<hr>
</li>
<li><span style="color: green;">$5,455</span> - Thursday the 31st of August 2017<br><span style="font-weight: bold; font-size: 1.2em;">National Institutes Of Health</span> <br> NATIONAL INSTITUTES OF HEALTH NICHD<br>AVANTI POLAR LIPIDS:1109394 [17-004567]
<hr>
</li>
<li><span style="color: green;">$5,005</span> - Tuesday the 8th of August 2017<br><span style="font-weight: bold; font-size: 1.2em;">National Institutes Of Health</span> <br> NATIONAL INSTITUTES OF HEALTH NIAID<br>CUSTOM LIPID SYNTHESIS (24:0-10:0 PE) 100 MG PACKAGED IN 10-10MG VIALS POWDER PER QUOTE #DQ-000665
<hr>
</li>
<li><span style="color: green;">$5,005</span> - Thursday the 17th of August 2017<br><span style="font-weight: bold; font-size: 1.2em;">National Institutes Of Health</span> <br> NATIONAL INSTITUTES OF HEALTH NIAID<br>CUSTOM LIPID SYNTHESIS (24:0-10:0 PE) 100 MG PACKAGED IN 10-10MG VIALS POWDER PER QUOTE #DQ-000665
<hr>
</li>
</ul>
</div>
I am currently using this script to retrieve the text inside the span tags:
import re

def all_data(d):
    # The two <span> tags in each <li> hold the amount and the agency name
    a, b = [i.text for i in d.find_all('span')]
    # The date is loose text between the spans, e.g. "Thursday the 17th of August 2017"
    return [a, *re.findall(r'\w+\sthe\s\w+\sof\s\w+\s\d+', d.text), b]

fundresults = [all_data(b) for b in businessesoup.find('div', {'id':'government_funding'}).find_all('li')]
for fundingItem in fundresults:
    fundingPrice = fundingItem[0]
    fundingDate = fundingItem[1]
    fundingAgency = fundingItem[2]
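This works, but I can't find a way to extract the last two lines of text from each li in the HTML. For example, from the first li I want to extract the text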
NATIONAL INSTITUTES OF HEALTH NICHD AVANTI POLAR LIPIDS:1109394 [17-010744]
How can I extract the text that is not inside a span tag?
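To make the goal concrete, here is a minimal sketch of one possible approach, not necessarily the best one. It assumes the wanted text always appears as loose text nodes that are direct children of each li, and that the page is parsed with the built-in "html.parser". The helper name loose_text is hypothetical, and the HTML is trimmed to the first li from above.

```python
from bs4 import BeautifulSoup, NavigableString

# Trimmed copy of the first <li> from the HTML above.
html = """
<ul>
<li><span style="color: green;">$14,450</span> - Thursday the 17th of August 2017<br><span style="font-weight: bold; font-size: 1.2em;">National Institutes Of Health</span> <br> NATIONAL INSTITUTES OF HEALTH NICHD<br>AVANTI POLAR LIPIDS:1109394 [17-010744]
<hr>
</li>
</ul>
"""

def loose_text(li):
    """Hypothetical helper: collect the non-empty text nodes that are
    direct children of `li`, which skips anything inside a <span>."""
    parts = []
    for node in li.children:
        # Direct text children are NavigableString objects; tags are not.
        if isinstance(node, NavigableString):
            text = node.strip()
            if text:
                parts.append(text)
    return parts

soup = BeautifulSoup(html, "html.parser")
li = soup.find("li")
parts = loose_text(li)

# The first loose text node is the date fragment; the last two are the
# lines that follow the agency <span>.
print(parts[-2:])
# ['NATIONAL INSTITUTES OF HEALTH NICHD', 'AVANTI POLAR LIPIDS:1109394 [17-010744]']
```

If the layout holds for every li, the same helper could be called inside all_data and its last two items appended to the returned list. A different parser, or an li with extra loose text, may produce a different number of items, so checking the length before indexing seems prudent.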