我正在学习使用漂亮的汤来解析html中的div容器。但是由于某种原因,当我将div容器的类名传递给我美丽的汤时,什么也没发生。当我尝试解析div时,我没有任何内容。我可能做错了什么。这是我的HTML和解析
<div class="upcoming-date ng-hide no-league" ng-show="nav.upcoming" ng-class="{'no-league': !search.checkShowTitle(nav.sport,nav.todayHighlights,nav.upcoming,nav.orderBy,"FOOTBALL - HIGHLIGHTS")}">
<span class="weekday">Monday</span>
<timecomponent datetime="'2018-07-09T20:00:00+03:00'" show-date="true" show-time="false" class="date ng-isolate-scope"><span class="ng-binding">09/07/18</span></timecomponent>
<div class="clear"></div>
</div>
<div id="g1390856" class="match football FOOTBALL - HIGHLIGHTS" itemscope="" itemtype="https://schema.org/SportsEvent">
<div class="leaguename ng-hide" ng-show="search.checkShowTitle(nav.sport,nav.todayHighlights,nav.upcoming,nav.orderBy,"FOOTBALL - HIGHLIGHTS") && (1 || (nav.upcoming && 0))">
<span class="name">
<span class="flag-icon flag-icon-swe"></span>
Sweden - Allsvenskan
</span>
</div>
<ul class="meta">
<li class="date">
<timecomponent datetime="'2018-07-09T20:00:00+03:00'" show-date="true" show-time="false" class="ng-isolate-scope"><span class="ng-binding">09/07/18</span></timecomponent>
</li>
<li class="time">
<timecomponent datetime="'2018-07-09T20:00:00+03:00'" show-date="false" show-time="true" class="ng-isolate-scope"><span class="ng-binding">20:00</span></timecomponent>
</li>
<li class="game-id"><span class="gameid">GameID:</span> 2087</li>
</ul>
<ul class="teams">
<li>Hammarby</li>
<li>Ostersunds</li>
</ul>
<ul class="bet-selector">
<li class="pick01" id="b499795664">
<a data-id="499795664" ng-click="bets.pick($event, 499795664, 2087, 2.10)" class="betting-button pick-button " title="Hammarby">
<span class="team">Hammarby</span>
<span class="odd">2.10</span>
</a>
</li> <li class="pick0X" id="b499795666">
<a data-id="499795666" ng-click="bets.pick($event, 499795666, 2087, 3.56)" class="betting-button pick-button " title="Draw">
<span class="team">Draw</span>
<span class="odd">3.56</span>
</a>
</li> <li class="pick02" id="b499795668">
<a data-id="499795668" ng-click="bets.pick($event, 499795668, 2087, 3.40)" class="betting-button pick-button " title="Ostersunds">
<span class="team">Ostersunds</span>
<span class="odd">3.40</span>
</a>
</li> </ul>
<ul class="extra-picks">
<li>
<a class="betting-button " href="/games/1390856/markets?league=0&top=0&sid=2087&sportId=1">
<span class="short-desc">+13</span>
<span class="long-desc">View 13 more markets</span>
</a>
</li>
</ul>
<div class="game-stats">
<a href="https://s5.sir.sportradar.com/sportpesa/en/match/13414729" onclick="window.open(this.href, 'newwindow', 'width=1024, height=800, resizable=yes, scrollbars=yes'); return false;"><img class="img-responsive" src="/img/chart-icon.png?v2.2.25.2"></a>
</div>
<div class="clear"></div>
</div>
............................................... ..............
parser.py
import requests
import urllib2
from bs4 import BeautifulSoup as soup
udata = urllib2.urlopen('https://www.sportpesa.co.ke/?sportId=1')
htmlsource = udata.read()
ssoup = soup(htmlsource,'html.parser')
page_div = ssoup.findAll("div",{"class":"match football FOOTBALL - HIGHLIGHTS"})
print page_div
答案 0 :(得分:0)
“比赛足球-亮点”是动态课程,因此您只是空白列表。 这是我在python3中的代码
from bs4 import BeautifulSoup as bs4
import requests
request = requests.get('https://www.sportpesa.co.ke/?sportId=1')
soup = bs4(request.text, 'lxml')
print(soup)
打印汤后,您会发现该类不在您的源代码中……希望对您有帮助
答案 1 :(得分:0)
因此-如注释中所建议-从此站点获取数据的最佳(最快)方法是利用javascript使用的相同端点。
如果您使用的是Chrome,请弹出检查器工具,打开“网络”标签,然后加载页面。您会看到,该站点从URL获取数据,该URL与URL中实际显示的
非常相似。此端点为您提供所需的数据。要使用请求来获取它并获取div,将如下所示:
import requests
from bs4 import BeautifulSoup
r = requests.get("https://sportpesa.co.ke/sportgames?sportId=1")
soup = BeautifulSoup(r.text, "html.parser")
page_divs = soup.select('div.match.football.FOOTBALL.-.HIGHLIGHTS')
print(len(page_divs))
将显示30-这是div的数量。顺便说一句,我在这里使用bs4方法选择,这是bs4推荐的处理方式,当您-像这里一样-具有多个类别(“比赛”,“足球”,“足球” ,“-”和“ HIGHLIGHTS”)。