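Here is my situation: I am scraping this HTML with the code below, but I can't figure out how to separate the first section from the second. I only want to scrape the first section, and handle the second section separately. I'm using BeautifulSoup 4.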
HTML:
<div id="first_content" class="header">
<div class="list">
<div class="row">
<a name="03049302"></a>
<div class="col-xs-12 drop-panel-content">
<p>
first section first text. </p>
</div>
<div class="drop-panel drop-panel-one-row-height">
<p class="text-center">Edit</p>
<p class="text-center">Share</p>
</div>
</div>
<div class="row">
<a name="03049303"></a>
<div class="col-xs-12 drop-panel-content">
<p>
first section second text. </p>
</div>
<div class="drop-panel drop-panel-one-row-height">
<p class="text-center">Edit</p>
<p class="text-center">Share</p>
<section id="second_content">
<a name="aname" class="btn-collapse collapsed" data-toggle="collapse" data-target="#aname">
<h3>A Name</h3>
</a>
<div class="collapse flush-width flush-down" id="aname">
<div class="list">
<div class="row">
<a name="03049304"></a>
<div class="col-xs-12 drop-panel-content">
<p>
second section first text. </p>
</div>
<div class="drop-panel drop-panel-one-row-height">
<p class="text-center">Edit</p>
<p class="text-center">Share</p>
</div>
Here is the code:
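def scrape(link):  # wrapper added so that `return` is valid; the name is illustrative
    try:
        # myData(link) is assumed to return the parsed BeautifulSoup page
        all_data = myData(link).find_all("div", {"class": "col-xs-12 drop-panel-content"})
        for data in all_data:
            print(data.text)
    except AttributeError:
        return None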
I want the two sections separated, not in the same output.
Current output:
first section first text.
first section second text.
second section first text.
Wanted output:
first section first text.
first section second text.
and, separately (in another function), this output:
second section first text.
Answer 0 (score: 2)
One option is to use the section tag to tell the two parts apart: the second section is inside a section tag, but the first one is not.
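# soup is the BeautifulSoup object for the page
all_data = soup.find_all("div", {"class": "col-xs-12 drop-panel-content"})
for data in all_data:
    # keep only panels with no <section> ancestor, i.e. the first section
    if data.find_parent("section") is None:
        print(data.get_text(strip=True))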
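Since the question also asks for the second section in a separate function, the same `find_parent` check can be split into two small functions: one keeps panels outside any `<section>`, the other keeps panels inside one. The sketch below is self-contained and runs against a simplified version of the markup. It assumes the real page closes its tags properly, which the excerpt in the question does not show. The function names `first_section_texts` and `second_section_texts` are hypothetical.

```python
from bs4 import BeautifulSoup

# Simplified, hypothetical stand-in for the page in the question:
# only the panels and the enclosing <section> are kept.
HTML = """
<div id="first_content" class="header">
  <div class="list">
    <div class="row">
      <div class="col-xs-12 drop-panel-content"><p>first section first text. </p></div>
    </div>
    <div class="row">
      <div class="col-xs-12 drop-panel-content"><p>first section second text. </p></div>
    </div>
  </div>
</div>
<section id="second_content">
  <div class="row">
    <div class="col-xs-12 drop-panel-content"><p>second section first text. </p></div>
  </div>
</section>
"""

# Exact class string used in the question's find_all call
PANEL = {"class": "col-xs-12 drop-panel-content"}


def first_section_texts(soup):
    """Texts of panels that are NOT inside any <section> tag."""
    return [div.get_text(strip=True)
            for div in soup.find_all("div", PANEL)
            if div.find_parent("section") is None]


def second_section_texts(soup):
    """Texts of panels that ARE inside a <section> tag."""
    return [div.get_text(strip=True)
            for div in soup.find_all("div", PANEL)
            if div.find_parent("section") is not None]


soup = BeautifulSoup(HTML, "html.parser")
for text in first_section_texts(soup):
    print(text)
for text in second_section_texts(soup):
    print(text)
```

Because the test is "has a `<section>` ancestor or not", it does not depend on how many rows each part contains.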
Alternatively, if there are always exactly two first-section texts, just slice the list of panels:
# keep only the first two panels (assumes the first section always has exactly two)
all_data = soup.find_all("div", {"class": "col-xs-12 drop-panel-content"})[:2]
for data in all_data:
    print(data.get_text(strip=True))
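The slicing idea also yields the second part: everything after the first two panels, which can then be handed to a separate function. A minimal sketch, assuming a hypothetical flattened markup and assuming the first section always has exactly two panels (`N_FIRST` is an illustrative constant, not something taken from the page):

```python
from bs4 import BeautifulSoup

# Hypothetical minimal markup: three panels in document order
HTML = """
<div class="col-xs-12 drop-panel-content"><p>first section first text. </p></div>
<div class="col-xs-12 drop-panel-content"><p>first section second text. </p></div>
<section id="second_content">
  <div class="col-xs-12 drop-panel-content"><p>second section first text. </p></div>
</section>
"""

N_FIRST = 2  # assumption: the first section always has exactly two panels

soup = BeautifulSoup(HTML, "html.parser")
# find_all returns matches in document order, so slicing splits at the boundary
texts = [div.get_text(strip=True)
         for div in soup.find_all("div", {"class": "col-xs-12 drop-panel-content"})]

first_part = texts[:N_FIRST]   # the first section
second_part = texts[N_FIRST:]  # the rest, for a separate function

print(first_part)
print(second_part)
```

This silently gives wrong results if the page ever has a different number of first-section panels, which is why the `find_parent` check is the more robust of the two options.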