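Web scraping with BeautifulSoup4

Time: 2020-07-17 09:30:59

Tags: python html python-3.x web-scraping beautifulsoup

I have included some HTML below. I want to extract all of the times from the page and store them in a list variable. How can I do this? Any help is appreciated.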

<div class=panchang-box-secondary-header>
<div class="list-wrapper pl-2">
<div class="list-style-thumbnail list-layout-horizontal">
<div class="list-item-outer py-2">
<div class="d-flex w-100 align-items-center">
<span class="icon-sprite icon-sprite-sunrise"></span>
<div class=flex-grow-1>
<span class="d-block t-sm">सूर्योदय</span>
<span class="d-block b">5:31 AM</span>
</div>
</div>
</div>
<div class="list-item-outer py-2">
<div class="d-flex w-100 align-items-center">
<span class="icon-sprite icon-sprite-sunset"></span>
<div class=flex-grow-1>
<span class="d-block t-sm">सूर्यास्त</span>
<span class="d-block b">7:24 PM</span>
</div>
</div>
</div>
<div class="list-item-outer py-2">
<div class="d-flex w-100 align-items-center">
<span class="icon-sprite icon-sprite-moonrise"></span>
<div class=flex-grow-1>
<span class="d-block t-sm">चन्द्रोदय</span>
<span class="d-block b">10:05 PM</span>
</div>
</div>
</div>
<div class="list-item-outer py-2">
<div class="d-flex w-100 align-items-center">
<span class="icon-sprite icon-sprite-moonset"></span>
<div class=flex-grow-1>
<span class="d-block t-sm">चन्द्रास्त</span>
<span class="d-block b">9:12 AM</span>
</div>
</div>
</div>

3 answers:

Answer 0 (score: 1)

Just extract the elements whose class is "d-block b" and push their text wherever you need it.
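A minimal sketch of that idea, assuming the page source is already available as a string. The shortened `html` value is a stand-in for the markup in the question (with English labels), and `extract_times` is a hypothetical helper name, not something from the original post:

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the markup in the question (labels are placeholders).
html = '''
<div class="flex-grow-1">
<span class="d-block t-sm">Sunrise</span>
<span class="d-block b">5:31 AM</span>
</div>
<div class="flex-grow-1">
<span class="d-block t-sm">Sunset</span>
<span class="d-block b">7:24 PM</span>
</div>
'''


def extract_times(markup):
    """Return the text of every <span> that has both the d-block and b classes."""
    soup = BeautifulSoup(markup, "html.parser")
    return [span.get_text(strip=True) for span in soup.select("span.d-block.b")]


times = extract_times(html)
print(times)
```

The label spans also carry `d-block`, so the selector requires the `b` class as well to pick out only the time values.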

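Answer 1 (score: 1)

time = [tag.text for tag in soup.find_all(class_="d-block b")]

find_all returns a list-like ResultSet, which has no .text attribute of its own, so the text has to be read from each matched element. This builds a list of all the times found in the page source and stores it in the variable time.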
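One caveat worth illustrating: when a string containing several classes is passed to `class_`, Beautiful Soup compares it against the whole attribute value, so the order of the classes matters. The sketch below uses made-up markup (the second span with its classes reversed is an assumption for demonstration, not part of the question's page) to contrast this with a CSS selector:

```python
from bs4 import BeautifulSoup

# Two spans with the same classes written in a different order (made-up markup).
html = (
    '<span class="d-block b">5:31 AM</span>'
    '<span class="b d-block">7:24 PM</span>'
)
soup = BeautifulSoup(html, "html.parser")

# A multi-class string given to class_ must equal the full attribute value.
exact = [tag.get_text() for tag in soup.find_all(class_="d-block b")]

# A CSS selector tests each class separately, so the order does not matter.
by_selector = [tag.get_text() for tag in soup.select(".d-block.b")]

print(exact)
print(by_selector)
```

For the markup in the question both approaches find the same elements; the selector form is simply less sensitive to how the site orders its class names.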

Answer 2 (score: 1)

Try this:

from bs4 import BeautifulSoup
a = '''<div class=panchang-box-secondary-header>
<div class="list-wrapper pl-2">
<div class="list-style-thumbnail list-layout-horizontal">
<div class="list-item-outer py-2">
<div class="d-flex w-100 align-items-center">
<span class="icon-sprite icon-sprite-sunrise"></span>
<div class=flex-grow-1>
<span class="d-block t-sm">सूर्योदय</span>
<span class="d-block b">5:31 AM</span>
</div>
</div>
</div>
<div class="list-item-outer py-2">
<div class="d-flex w-100 align-items-center">
<span class="icon-sprite icon-sprite-sunset"></span>
<div class=flex-grow-1>
<span class="d-block t-sm">सूर्यास्त</span>
<span class="d-block b">7:24 PM</span>
</div>
</div>
</div>
<div class="list-item-outer py-2">
<div class="d-flex w-100 align-items-center">
<span class="icon-sprite icon-sprite-moonrise"></span>
<div class=flex-grow-1>
<span class="d-block t-sm">चन्द्रोदय</span>
<span class="d-block b">10:05 PM</span>
</div>
</div>
</div>
<div class="list-item-outer py-2">
<div class="d-flex w-100 align-items-center">
<span class="icon-sprite icon-sprite-moonset"></span>
<div class=flex-grow-1>
<span class="d-block t-sm">चन्द्रास्त</span>
<span class="d-block b">9:12 AM</span>
</div>
</div>
</div>'''
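# Parse the markup with Python's built-in HTML parser.
soup = BeautifulSoup(a, 'html.parser')
# Select every element that has both the "d-block" and "b" classes.
time_tags = soup.select('.d-block.b')
# Collect the text of each matched element into a list.
times = [tag.text for tag in time_tags]
print(times)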

Output:

['5:31 AM', '7:24 PM', '10:05 PM', '9:12 AM']
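If the label that goes with each time is also needed, one possible extension (not part of the original answer) is to walk the containers and pair the two spans inside each one. This is a sketch under the assumption that every `flex-grow-1` container holds one label span and one time span, as in the question's markup; the English labels and the name `labelled_times` are placeholders:

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the question's markup, with placeholder English labels.
html = '''
<div class="flex-grow-1">
<span class="d-block t-sm">Sunrise</span>
<span class="d-block b">5:31 AM</span>
</div>
<div class="flex-grow-1">
<span class="d-block t-sm">Sunset</span>
<span class="d-block b">7:24 PM</span>
</div>
'''
soup = BeautifulSoup(html, "html.parser")

labelled_times = {}
for box in soup.select("div.flex-grow-1"):
    label = box.select_one("span.d-block.t-sm")  # the caption span
    value = box.select_one("span.d-block.b")     # the time span
    # Skip containers that do not have both parts.
    if label is not None and value is not None:
        labelled_times[label.get_text(strip=True)] = value.get_text(strip=True)

print(labelled_times)
```

The plain list of times from the answer above can still be recovered with `list(labelled_times.values())`.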