BeautifulSoup and list problem

Asked: 2017-05-10 19:13:38

Tags: python beautifulsoup

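Edit 2:

So the problem was that the body of the for loop never ran because of the way I wrote it: when no Stats link is found, the for loop is skipped entirely. I refactored it a bit, as shown below. I'm not sure it's the most efficient approach, but it works. I may post a new question to find out whether there is a better, cleaner way to write this.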

        for span in team.find_all("span"):
            stats = span.find_all("a", href=True, text='Stats')
            if stats:
                for team_stats in stats:
                    team_stats_list.append(team_stats.get('href'))
            else:
                team_stats_list.append("NO STATS")
        print(team_stats_list)

Original post below...

I can't figure out why my append isn't working the way I intended in this code:

        for team_stats in team.find_all("a", href=True, text='Stats'):
            stats_available = team_stats.get('href')
            if stats_available:
                team_stats_list.append(stats_available)
            else:
                team_stats_list.append("NO STATS")
        print(team_stats_list)

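Basically, I'm checking that the href of a link with the text Stats actually ends up in the stats_available variable.

If it does, I simply append the stats_available variable to the list. If the variable is empty, I want to append the text NO STATS to the list instead.

The code scrapes and fetches the href when one is available, so that part is not the problem. The problem is that when there is no link labelled Stats, the NO STATS text is not appended. The list is just empty.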
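To make the symptom concrete, here is a minimal sketch in plain Python, with no scraping involved. The helper name `collect_stats` and the sample hrefs are made up for illustration; the list passed in stands in for whatever `find_all()` returns. When that result is empty, the loop body never executes, so the `else` branch has no chance to run.

```python
def collect_stats(links):
    """Mimic the loop from the question; `links` stands in for the find_all() result."""
    collected = []
    for href in links:
        # This body only runs once per match, so it never runs for zero matches.
        if href:
            collected.append(href)
        else:
            collected.append("NO STATS")
    return collected


print(collect_stats(["/stats/team/1"]))  # one match: the href is appended
print(collect_stats([]))                 # no matches: the result is an empty list
```

The `else` branch would only fire for a matched link whose href is empty, which is not the case being asked about.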

Edit 1 - the whole function so far:

    import urllib.request
    import bs4 as bs

    source = urllib.request.urlopen('http://www.espn.com/college-football/teams').read()
    soup = bs.BeautifulSoup(source, "lxml")
    page_source = soup.find_all("div", {"class": "mod-container mod-open-list mod-teams-list-medium mod-no-footer"})
    for conference in page_source:
        conference_title = conference.div.h4.text
        team_name_list = []
        team_clubhouse_list = []
        team_stats_list = []
        print(conference_title)

        for team in conference.find_all("ul", {"class": "medium-logos"}):
            for team_title in team.find_all('h5'):
                team_name_list.append(team_title.text)
            print(team_name_list)

            for team_clubhouse in team.find_all("a", {"class": "bi"}):
                team_clubhouse_list.append(team_clubhouse.get('href'))
            print(team_clubhouse_list)

            for team_stats in team.find_all("a", href=True, text='Stats'):
                stats_available = team_stats.get('href')
                if stats_available:
                    team_stats_list.append(stats_available)
                else:
                    team_stats_list.append("NO STATS")
            print(team_stats_list)

2 Answers:

Answer 0: (score: 0)

import urllib.request
from bs4 import BeautifulSoup 

source = urllib.request.urlopen('http://www.espn.com/college-football/teams').read()
soup = BeautifulSoup(source, "lxml")
page_source = soup.find_all("div", {"class": "mod-container mod-open-list mod-teams-list-medium mod-no-footer"})
for conference in page_source:
    conference_title = conference.div.h4.text
    team_name_list = []
    team_clubhouse_list = []
    team_stats_list = []
    print(conference_title)
    for team in conference.find_all("ul", {"class": "medium-logos"}):
        for team_title in team.find_all('h5'):
            team_name_list.append(team_title.text)

        print(team_name_list)
        for team_clubhouse in team.find_all("a", {"class": "bi"}):
            team_clubhouse_list.append(team_clubhouse.get('href'))

        print(team_clubhouse_list)
        for team_stats in team.find_all("a", href=True, text='Stats'):
            stats_available = team_stats.get('href')
            team_stats_list.append(stats_available)


        if not team_stats_list:
            team_stats_list.append("NO STATS")

        print(team_stats_list)
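One thing worth noting about checking the list after the loop: the placeholder is only added when nothing at all has been collected so far, which is not the same as adding one placeholder per team that lacks a Stats link. A rough sketch of the difference, using made-up data (each inner list stands for the Stats hrefs found for one team, and both function names are hypothetical):

```python
# Hypothetical data: the Stats hrefs found for three teams; the second team has none.
teams = [["/stats/1"], [], ["/stats/3"]]


def placeholder_after_loop(teams):
    """Add a single placeholder only if nothing was collected at all."""
    found = []
    for hrefs in teams:
        found.extend(hrefs)
    if not found:
        found.append("NO STATS")
    return found


def placeholder_per_team(teams):
    """Add a placeholder for every team that has no Stats href."""
    found = []
    for hrefs in teams:
        found.extend(hrefs if hrefs else ["NO STATS"])
    return found


print(placeholder_after_loop(teams))  # ['/stats/1', '/stats/3']
print(placeholder_per_team(teams))    # ['/stats/1', 'NO STATS', '/stats/3']
```

If the stats list is meant to line up index by index with the team name list, the per-team variant is probably the one you want.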

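Answer 1: (score: 0)

So the problem was that the body of the for loop in the question never ran because of the way I wrote it. When no Stats link is found, the for loop is skipped entirely. I refactored it a bit, as shown below. I'm not sure it's the most efficient approach, but it works. I may post a new question to find out whether there is a better, cleaner way to write this.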

    for span in team.find_all("span"):
        stats = span.find_all("a", href=True, text='Stats')
        if stats:
            for team_stats in stats:
                team_stats_list.append(team_stats.get('href'))
        else:
            team_stats_list.append("NO STATS")
    print(team_stats_list)
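As for a possibly cleaner way to write this, one option is to look up the link once per team with `find()`, which returns `None` when nothing matches, and branch on that. This is only a sketch: the sample markup below is invented (one `<li>` per team, with an optional Stats link) and may not match the real ESPN page, and `stats_links` is a made-up helper name. It also uses `string=` rather than `text=`, which is the newer spelling of the same filter in Beautiful Soup, and `html.parser` so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one <li> per team, where the Stats link is optional.
SAMPLE_HTML = """
<ul class="medium-logos">
  <li><h5><a class="bi" href="/team/1">Team A</a></h5>
      <span><a href="/stats/1">Stats</a> | <a href="/roster/1">Roster</a></span></li>
  <li><h5><a class="bi" href="/team/2">Team B</a></h5>
      <span><a href="/roster/2">Roster</a></span></li>
</ul>
"""


def stats_links(ul):
    """Return one entry per <li>: the Stats href, or "NO STATS" if there is none."""
    results = []
    for item in ul.find_all("li"):
        # find() returns None when no matching <a> exists inside this <li>.
        link = item.find("a", href=True, string="Stats")
        results.append(link["href"] if link else "NO STATS")
    return results


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
team_stats_list = stats_links(soup.find("ul", class_="medium-logos"))
print(team_stats_list)  # ['/stats/1', 'NO STATS']
```

Because exactly one entry is appended per team, the resulting list stays aligned with a list of team names built from the same `<li>` elements.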