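I want to get the text between all the tags inside a specific tr. I have looked at similar questions, but they are specific to one tag type.
If I do this:
for strong_tag in soup.find_all('strong'):
    print(strong_tag.text)
that only covers one specific tag. How do I do the same for the complete tr?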
<tr>
<td style="border:0px solid black;padding: 0px 5.4pt;border-color: currentColor windowtext windowtext;border-style: none solid solid;border-width: medium 0pt 0pt;background: white;" width="39">
<p align="center" style="min-height: 8pt; padding: 0px; text-align: center;"> </p>
</td>
<td colspan="7" style="border:0px solid black;vertical-align: top;text-align: left;padding: 0px 5.4pt;border-color: currentColor windowtext windowtext currentColor;border-style: none solid solid none;border-width: medium 0pt 0pt medium;background: white;" width="683">
<ol style="list-style-type: decimal;">
<li>Process the return per standard procedures. Refer to the <a class="jive-link-wiki-small" data-containerid="2456" data-containertype="14" data-objectid="12425" data-objecttype="102" href="https://iconnect.sprint.com/docs/DOC-12425">Sprint Satisfaction Guarantee Procedure</a> for steps.</li>
<li>RMS will reset the eligibility when doing a <strong>Sprint Monthly Installments Return</strong>. If the original transaction was performed in RMS, the system will display a message and advise that a history transaction can be performed or you can proceed with a No History Return</li>
<li>
To reset Monthly Installments upgrade eligibility and process the return:
<ol>
<li>Return the device.</li>
<li>Re-access the account to see if the line is still <strong>upgrade-eligible for Monthly Installments</strong>.</li>
</ol>
<ul>
<ul>
<li><strong>If so,</strong> proceed with the sale as normal.</li>
<li>
If the customer's line is showing as <strong>not upgrade-eligible</strong> for Monthly Installments:
<ol>
<li>Add a note to the customer's account stating the return transaction number and the need for eligibility reset.</li>
<li>Reset the customer's eligibility by using the MSA tablet or through iCare <em><strong>or</strong></em></li>
<li>Contact <strong>NSS</strong> to request an eligibility reset <strong>only</strong> if the reset was <strong>not successful</strong>.<strong> </strong></li>
</ol>
</li>
</ul>
<ul>
<li><span style="font-family: Arial;">Once eligibility is reset, pull up the customer's account again in RMS and process the sale.</span></li>
</ul>
</ul>
</li>
</ol>
</td>
</tr>
Expected output: the text between all the tags.
Answer 0 (score: 1)
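get_text() collects all the descendant strings of an element and joins them using the given separator. text is a property that calls get_text() with its default arguments (undocumented). To print the text of the first tr: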
print(soup.select('tr')[0].text)
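The separator and strip arguments of get_text() can tidy up the whitespace that .text leaves in place. A minimal sketch, assuming a shortened, made-up fragment in place of the question's markup and the built-in html.parser rather than lxml:

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the question's markup (an assumption, not the real page).
html = """
<table><tr>
  <td><p> </p></td>
  <td><ol>
    <li>Return the device.</li>
    <li>Contact <strong>NSS</strong> to request a reset.</li>
  </ol></td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")
row = soup.find("tr")

# strip=True trims every string and drops the empty ones;
# separator=" " is placed between the strings that remain.
row_text = row.get_text(separator=" ", strip=True)
print(row_text)
```

Without strip=True the result keeps every newline and indentation space found between the tags.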
With indentation that follows the list nesting:
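import bs4

soup = bs4.BeautifulSoup(open('h.html'), 'lxml')

def get_text(i):
    # Collect only the li's own text: bare strings plus <strong>/<span> children.
    # Nested lists and other tags (e.g. <a>, <em>) are skipped.
    r = []
    for t in i.contents:
        if type(t) is bs4.element.NavigableString:
            r.append(t.strip())
        elif t.name in ['strong', 'span']:
            r.append(t.text.strip())
    return ' '.join(r)

s = soup.select('li')
for i in s:
    # Nesting depth = number of enclosing ol/ul ancestors, minus one.
    level = len(i.find_parents('ol') + i.find_parents('ul')) - 1
    print(' ' * level * 5, get_text(i))
    print('-' * 50)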
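The helper above keeps only strong and span children, so link text such as the a tag in the first li is dropped. A possible variant, sketched here under assumptions (the name own_text and the sample markup are hypothetical, and html.parser stands in for lxml), keeps every inline child and skips only nested lists:

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Made-up sample with the same kind of nesting as the question's markup.
html = """
<ol>
  <li>Refer to the <a href="#">procedure</a> for steps.</li>
  <li>To reset:
    <ul>
      <li><em><strong>If so,</strong></em> proceed.</li>
    </ul>
  </li>
</ol>
"""

def own_text(li):
    """Text of one li, excluding any nested ol/ul (hypothetical helper)."""
    parts = []
    for child in li.children:
        if isinstance(child, Tag):
            if child.name in ("ol", "ul"):
                continue  # nested list items are printed on their own lines
            parts.append(child.get_text(" ", strip=True))
        elif type(child) is NavigableString:
            parts.append(child.strip())
    return " ".join(p for p in parts if p)

soup = BeautifulSoup(html, "html.parser")
lines = []
for li in soup.find_all("li"):
    # Depth = number of enclosing ol/ul ancestors, minus one.
    depth = len(li.find_parents(["ol", "ul"])) - 1
    lines.append(" " * 4 * depth + own_text(li))
print("\n".join(lines))
```

Joining the parts with a space can leave a stray space before punctuation that directly follows an inline tag, so the output may need a final cleanup pass.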