How do I fetch the next page of results only for pages that have one?

Posted: 2016-06-21 19:42:58

Tags: python-3.x web-scraping beautifulsoup

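So this code gets all the match outcomes (which team played which, and the match score) for a team such as http://www.gosugamers.net/counterstrike/teams/7395-mousesports-cs/matches. But it only gets the results on the first page, and I'm trying to get the results from every page. The problem is that some teams don't have a next-page button, so when I tried to implement that, the program crashed. How can I write the code so that it fetches the next page and keeps collecting results, and simply moves on when a team's matches page has no next page?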

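import requests
from bs4 import BeautifulSoup


def all_match_outcomes():
    # match_history_url() and rest_server() are the asker's own helpers, defined elsewhere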
    for match_outcomes in match_history_url():
        rest_server(True)
        page = requests.get(match_outcomes).content
        soup = BeautifulSoup(page, 'html.parser')

        team_name_element = soup.select_one('div.teamNameHolder')
        team_name = team_name_element.find('h1').text.replace('- Team Overview', '')

        for match_outcome in soup.select('table.simple.gamelist.profilelist tr'):
            opp1 = match_outcome.find('span', {'class': 'opp1'}).text
            opp2 = match_outcome.find('span', {'class': 'opp2'}).text

            opp1_score = match_outcome.find('span', {'class': 'hscore'}).text
            opp2_score = match_outcome.find('span', {'class': 'ascore'}).text

            if match_outcome(True):  # If teams have past matches
                print(team_name, '%s %s:%s %s' % (opp1, opp1_score, opp2_score, opp2))
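The crash most likely comes from chaining calls on a missing element: BeautifulSoup's `find()` returns `None` when nothing matches, so any method called on that result raises `AttributeError`. A minimal sketch, assuming a team page with no `div.paginator` block (the HTML fragment is made up for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical fragment of a team page that has no pagination block.
html = '<div class="teamNameHolder"><h1>Team - Team Overview</h1></div>'
soup = BeautifulSoup(html, 'html.parser')

paginate = soup.find('div', {'class': 'paginator'})
print(paginate)  # find() returns None when nothing matches

try:
    paginate.find('a', {'class': 'selected'})  # chaining on None
except AttributeError as exc:
    print('crash:', exc)

# Checking for None first lets the scraper skip pagination for this team.
if paginate is None:
    print('single page of results, nothing to follow')
```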

1 Answer:

Answer 0 (score: 0)

After the for loop that pulls the game scores out of the table, you can grab the pagination links.

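With this code you can get the next page by finding the currently selected page and taking the link right after it. If there is no page after the currently selected one, it prints "no page found".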

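paginate = soup.find('div', {'class':'paginator'})
# teams with only one page of results have no paginator div at all
page = paginate.find('a', {'class':'selected'}) if paginate else None

next_page = page.find_next_sibling() if page else None
if next_page:
    print(next_page.get('href'))
else:
    print("no page found")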
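The snippet above can be checked offline. This sketch wraps the same "selected page, then its next sibling" lookup in a helper, `next_page_href` (a hypothetical name), and runs it against three hand-written fragments: a middle page, the last page, and a team with no paginator. The markup is an assumption modeled on the answer's selectors, not copied from gosugamers.net. It passes `'a'` to `find_next_sibling` so that only links are considered.

```python
from bs4 import BeautifulSoup


def next_page_href(soup):
    """Return the href of the page after the selected one, or None."""
    paginate = soup.find('div', {'class': 'paginator'})
    if paginate is None:  # team has a single page of results
        return None
    selected = paginate.find('a', {'class': 'selected'})
    if selected is None:
        return None
    next_link = selected.find_next_sibling('a')  # next <a> at the same level
    return next_link.get('href') if next_link else None


# Hypothetical paginator markup, for illustration only.
MIDDLE = ('<div class="paginator"><a href="?page=1">1</a>'
          '<a class="selected" href="?page=2">2</a><a href="?page=3">3</a></div>')
LAST = ('<div class="paginator"><a href="?page=1">1</a>'
        '<a class="selected" href="?page=2">2</a></div>')
NO_PAGER = '<div class="teamNameHolder"><h1>Team</h1></div>'

for html in (MIDDLE, LAST, NO_PAGER):
    print(next_page_href(BeautifulSoup(html, 'html.parser')))
```

Returning `None` for both "last page" and "no paginator" lets the caller treat the two cases the same way: stop paging and move on to the next team.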

Edit

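In response to the comment: this is how I would use this code. You then add the next URL to whatever you use to track pages to crawl, and the loop can continue.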

def all_match_outcomes():
    for match_outcomes in match_history_url():
        rest_server(True)
        page = requests.get(match_outcomes).content
        soup = BeautifulSoup(page, 'html.parser')

        team_name_element = soup.select_one('div.teamNameHolder')
        team_name = team_name_element.find('h1').text.replace('- Team Overview', '')

        for match_outcome in soup.select('table.simple.gamelist.profilelist tr'):
            opp1 = match_outcome.find('span', {'class': 'opp1'}).text
            opp2 = match_outcome.find('span', {'class': 'opp2'}).text

            opp1_score = match_outcome.find('span', {'class': 'hscore'}).text
            opp2_score = match_outcome.find('span', {'class': 'ascore'}).text

            if match_outcome(True):  # If teams have past matches
                print(team_name, '%s %s:%s %s' % (opp1, opp1_score, opp2_score, opp2))
        # get the next page if there is one; teams with a single page have no paginator
        paginate = soup.find('div', {'class':'paginator'})
        page = paginate.find('a', {'class':'selected'}) if paginate else None
        if page:
            next_page = page.find_next_sibling()
            if next_page:
                print(next_page.get('href'))
                # just append this to a list or add it to whatever you use to
                # track the next url to crawl
                next_url = next_page.get('href')
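To keep crawling instead of only storing `next_url`, one option is a generator that follows the links until there are none left. This is a sketch that assumes the paginator markup the answer's selectors expect; `iter_match_pages` is a hypothetical helper, and `fetch` is passed in so the example runs without the network.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def iter_match_pages(start_url, fetch):
    """Yield one parsed page per results page, following "next" links.

    fetch(url) must return the page's HTML; with requests it could be
    lambda u: requests.get(u).text
    """
    url, seen = start_url, set()
    while url is not None and url not in seen:  # stop when no next page, or on a cycle
        seen.add(url)
        soup = BeautifulSoup(fetch(url), 'html.parser')
        yield soup

        # Same "selected page, then its next sibling" idea as the answer,
        # but every step tolerates a missing element instead of crashing.
        paginate = soup.find('div', {'class': 'paginator'})
        selected = paginate.find('a', {'class': 'selected'}) if paginate else None
        next_link = selected.find_next_sibling('a') if selected else None
        href = next_link.get('href') if next_link else None
        # Pagination hrefs are often relative, so resolve them against the current URL.
        url = urljoin(url, href) if href else None


# Offline usage: hypothetical pages served from a dict instead of the network.
BASE = 'http://example.com/teams/1/matches'
PAGES = {
    BASE: ('<div class="paginator"><a class="selected" href="?page=1">1</a>'
           '<a href="?page=2">2</a></div><span class="opp1">A</span>'),
    BASE + '?page=2': ('<div class="paginator"><a href="?page=1">1</a>'
                       '<a class="selected" href="?page=2">2</a></div>'
                       '<span class="opp1">B</span>'),
}

opponents = [page.find('span', {'class': 'opp1'}).text
             for page in iter_match_pages(BASE, PAGES.__getitem__)]
print(opponents)  # ['A', 'B']
```

In the question's code, this would take the place of the single `requests.get(match_outcomes)` call: loop over `iter_match_pages(match_outcomes, lambda u: requests.get(u).text)` and run the existing row-parsing loop on each soup. A team without a paginator yields exactly one page, so it is handled without a special case. The `seen` set is a small safeguard in case a paginator links back to a page that was already visited.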