Parsing some HTML with BeautifulSoup

Time: 2016-03-04 02:23:12

Tags: python html beautifulsoup

I have the following piece of HTML, which is part of a page that displays football match results.

<div class = "schedules-list-matchup"></div>
<!-- unimportant stuff -->
<div class=list-matchup-row-team>
    <span class="team-name away lost">team1</span>
    <span class="team-logo away team-name">...</span>
    <span class="team-score away lost">2</span>
    <span class="team-score home">3</span>
    <span class="team-logo home team-name">...</span>
    <span class="team-name home">team2</span>
</div>
<div class=list-matchup-row-team>
    <span class="team-name away lost">team3</span>
    <span class="team-logo away team-name">...</span>
    <span class="team-score away lost">2</span>
    <span class="team-score home">3</span>
    <span class="team-logo home team-name">...</span>
    <span class="team-name home">team4</span>
</div>
<!-- remainder of code -->

I am trying to read it and create objects of this class:

class Game:
    def __init__(self, homeTeam, homeTeamScore, awayTeam, awayTeamScore):
        self.homeTeam = homeTeam
        self.homeTeamScore = homeTeamScore
        self.awayTeam = awayTeam
        self.awayTeamScore = awayTeamScore

What I think I am doing is iterating over each <div class="list-matchup-row-team">.

My code:

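from urllib.request import urlopen  # Python 3 assumed
from bs4 import BeautifulSoup

# baseUrl is defined elsewhere in my script
html = urlopen(baseUrl + '1')
bsObj = BeautifulSoup(html, 'lxml')
table = bsObj.find("ul",{"class":"schedules-table"})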

for game in table.findAll("li", {"class":"schedules-list-matchup"}):
    for g in game.findAll("div", {"class":"list-matchup-row-team"}):
        for teams in g.findAll("span", {"class" : "home"}):
            print(teams.find("span", {"class" : "team-name"}))
            print(teams.find("span", {"class" : "team-score"}))

    print('==========================')

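This prints a bunch of None objects. How can I iterate over each span element inside the <div class="list-matchup-row-team"> tag and check whether its class contains 'team-name' and 'team-score', each for home and away?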

1 answer:

Answer 0: (score: 0)

I think you can target the team-name class directly.

Try this.

table.findAll("span", {"class" : "team-name"})

Then separate the home and away entries.
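For what it's worth, the None values in the question come from calling teams.find("span", ...) on something that is already a span: find() only searches descendants, and those spans contain no nested spans. Here is a minimal sketch of one way to build the Game objects per row instead. It assumes the markup looks exactly like the sample in the question, uses the built-in "html.parser" so it runs without lxml, and keeps the scores as strings. SAMPLE_HTML, find_span and parse_games are names made up for this illustration. The team-logo spans also carry the team-name class, so they are skipped explicitly.

```python
from bs4 import BeautifulSoup

# Two rows copied from the question's sample markup (assumed representative).
SAMPLE_HTML = """
<div class="list-matchup-row-team">
    <span class="team-name away lost">team1</span>
    <span class="team-logo away team-name">...</span>
    <span class="team-score away lost">2</span>
    <span class="team-score home">3</span>
    <span class="team-logo home team-name">...</span>
    <span class="team-name home">team2</span>
</div>
<div class="list-matchup-row-team">
    <span class="team-name away lost">team3</span>
    <span class="team-logo away team-name">...</span>
    <span class="team-score away lost">2</span>
    <span class="team-score home">3</span>
    <span class="team-logo home team-name">...</span>
    <span class="team-name home">team4</span>
</div>
"""


class Game:
    def __init__(self, homeTeam, homeTeamScore, awayTeam, awayTeamScore):
        self.homeTeam = homeTeam
        self.homeTeamScore = homeTeamScore
        self.awayTeam = awayTeam
        self.awayTeamScore = awayTeamScore


def find_span(row, kind, side):
    """Text of the first span in `row` whose classes include both `kind`
    and `side`, ignoring logo spans; None if there is no such span."""
    def matches(tag):
        classes = tag.get("class") or []
        return (
            tag.name == "span"
            and kind in classes
            and side in classes
            and "team-logo" not in classes
        )

    span = row.find(matches)
    return span.get_text(strip=True) if span else None


def parse_games(html):
    """Build one Game per <div class="list-matchup-row-team"> row."""
    soup = BeautifulSoup(html, "html.parser")
    games = []
    for row in soup.find_all("div", class_="list-matchup-row-team"):
        games.append(
            Game(
                find_span(row, "team-name", "home"),
                find_span(row, "team-score", "home"),
                find_span(row, "team-name", "away"),
                find_span(row, "team-score", "away"),
            )
        )
    return games


for game in parse_games(SAMPLE_HTML):
    print(game.homeTeam, game.homeTeamScore, game.awayTeam, game.awayTeamScore)
# prints:
# team2 3 team1 2
# team4 3 team3 2
```

On the real page you would pass the downloaded HTML to parse_games instead of SAMPLE_HTML. If the live markup differs from the sample (for example, missing score spans for games not yet played), find_span returns None for that field rather than raising.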