I have the following HTML snippet:
<div id="targetdown" class="content">
<div class="alertbox">
<div class="ym-wrapper">
<div class="ym-wbox">
</div>
</div>
</div>
<div class="ym-wrapper">
<div class="ym-wbox">
<p style="text-align: center;">EXCEL Physical Therapy has been keeping our patients moving forward<br />
for nearly 30 years. In the process, we have built an unparalleled<br />
reputation by combining the highest quality of physical therapy<br />
with exceptional customer service to provide a genuinely<br />
“patient first” approach. It is this philosophy that has established<br />
EXCEL as a premier physical therapy provider in Northern New Jersey.</p>
</div>
</div>
</div>
<section class="parallaxone parallax">
<div class="ym-wrapper">
<div class="ym-wbox">
<h2>Helping you navigate the road to recovery</h2>
</div>
</div>
</section>
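I want to get the text from the elements that are present, but without a line break (`<br />`) being treated as the start of a new element.
This is what I am doing: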
In [19]: html = '<div id="targetdown" class="content"><div class="alertbox"><div class="ym-wrapper"><div class="ym-wbox"></div></div></div><div class="ym-wrapper"><div class="ym-wbox"><p style="text-align: center;">EXCEL Physical Therapy has been keeping our patients moving forward<br />for nearly 30 years. In the process, we have built an unparalleled<br /> reputation by combining the highest quality of physical therapy<br /> with exceptional customer service to provide a genuinely<br /> “patient first” approach. It is this philosophy that has established<br /> EXCEL as a premier physical therapy provider in Northern New Jersey.</p></div></div></div><section class="parallaxone parallax"><div class="ym-wrapper"><div class="ym-wbox"><h2>Helping you navigate the road to recovery</h2> </div></div></section>'
...: soup = BeautifulSoup(html)
...: texts = soup.findAll(text=True)
The result is:
In [20]: texts
Out[20]:
['EXCEL Physical Therapy has been keeping our patients moving forward',
'for nearly 30 years. In the process, we have built an unparalleled',
' reputation\xa0by combining the highest quality of physical therapy',
' with exceptional\xa0customer service to provide a genuinely',
' “patient first” approach.\xa0It is this philosophy\xa0that has established',
' EXCEL\xa0as\xa0a premier physical therapy provider in Northern New Jersey.',
'Helping you navigate the road to recovery',
' ']
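How can I avoid splitting at the line breaks, so that the text
EXCEL Physical Therapy has been keeping our patients moving forward
for nearly 30 years. In the process, we have built an unparalleled
reputation by combining the highest quality of physical therapy
with exceptional customer service to provide a genuinely
“patient first” approach. It is this philosophy that has established
EXCEL as a premier physical therapy provider in Northern New Jersey.
is returned as a single element in the list?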
Answer 0 (score: 1)
You can do this:
soup.find_all("div", class_="ym-wbox")[1].find("p").text
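One caveat with `.text`: it concatenates the text nodes with no separator, so wherever the markup has no whitespace around a `<br />` (as in `forward<br />for nearly`), the two words end up fused. A minimal sketch of a more general variant follows, assuming Beautiful Soup 4 is installed and that `<p>` and `<h2>` are the elements whose text should each become one list entry. The shortened `html` sample and the helper name `block_texts` are hypothetical, made up for illustration; they are not part of the question or of the library.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the HTML in the question (hypothetical sample data).
html = (
    '<div class="ym-wbox"><p>first line<br />second&nbsp;line<br /> third line</p></div>'
    '<section><div class="ym-wbox"><h2>A heading</h2> </div></section>'
)

soup = BeautifulSoup(html, "html.parser")


def block_texts(soup, tags=("p", "h2")):
    """Return one whitespace-normalized string per matching element."""
    texts = []
    for element in soup.find_all(list(tags)):
        # get_text(" ") joins the element's text nodes with a space, so the
        # pieces on either side of a <br /> do not run together.
        # split() followed by join collapses every run of whitespace,
        # including non-breaking spaces (\xa0), into a single space.
        text = " ".join(element.get_text(" ").split())
        if text:
            texts.append(text)
    return texts


texts = block_texts(soup)
print(texts)
```

Because the loop iterates over elements rather than over text nodes, the whitespace-only string between `</h2>` and `</div>` never appears in the result. The list of tag names would need adjusting for pages that keep their text in other elements.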