Scraping data from a table with unique IDs

Date: 2018-01-18 20:25:49

Tags: python html beautifulsoup web-crawler

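I am trying to scrape this website. My goal is to collect the 10 most recent results (win/loss/draw) for any team; I am only using this particular team as an example. The source of a single row is: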

<tr class="odd      match no-date-repetition" data-timestamp="1515864600" id="page_team_1_block_team_matches_3_match-2463021" data-competition="8">
        <td class="day no-repetition">Sat</td>


        <td class="full-date" nowrap="nowrap">13/01/18</td>
        <td class="competition"><a href="/national/england/premier-league/20172018/regular-season/r41547/" title="Premier League">PRL</a></td>

          <td class="team team-a ">
              <a href="/teams/england/tottenham-hotspur-football-club/675/" title="Tottenham Hotspur">
                Tottenham Hotspur
              </a>
          </td>

        <td class="score-time score">
          <a href="/matches/2018/01/13/england/premier-league/tottenham-hotspur-football-club/everton-football-club/2463021/" class="result-win">

            4 - 0

          </a>
        </td>
          <td class="team team-b ">
            <a href="/teams/england/everton-football-club/674/" title="Everton">
              Everton
            </a>
          </td>
        <td class="events-button button first-occur">
            <a href="/matches/2018/01/13/england/premier-league/tottenham-hotspur-football-club/everton-football-club/2463021/#events" title="View events" class="events-button-button ">View events</a>
        </td>

          <td class="info-button button">

              <a href="/matches/2018/01/13/england/premier-league/tottenham-hotspur-football-club/everton-football-club/2463021/" title="More info">More info</a>



          </td>




      </tr>

You can see that the result is stored in <td class="score-time score">. My knowledge of Python and web scraping is very limited, so my current code is:

import bs4
import requests

# soccerwayURL holds the URL of the team's matches page
res2 = requests.get(soccerwayURL)
soup2 = bs4.BeautifulSoup(res2.text, 'html.parser')
elems2 = soup2.select('#page_team_1_block_team_matches_3_match-2463021 > td.score-time.score')
print(elems2[0].text.strip())

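This prints '4 - 0'. That is fine, but the problem arises when I try to access another row. The 7-digit number (2463021 in the example above) is unique to that row. This means that to get the score from a different row, I would have to find its unique 7-digit number and put it into the CSS selector '#page_team_1_block_team_matches_3_match-******* > td.score-time.score', where the asterisks stand for the unique number.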

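The online course I took only showed how to reference content via CSS selectors, so I am not sure how to retrieve the scores without manually writing a CSS selector for each row.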

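Inside the <td class="score-time score"> cell there is also an <a> element with class="result-win". Ideally I would like to extract "result-win", because I am not looking for the score of the match, only for whether the result was a win, a loss, or a draw.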

I hope this post is clear. My knowledge is limited, so I apologize if my vocabulary does not quite match the proper technical terms.

My objective is: "Retrieve the 10 most recent results (win, loss, draw) for any team on the Soccerway website."

1 Answer:

Answer 0 (score: 0)

from bs4 import BeautifulSoup
import requests
import urllib3

# Had some SSL security issues, so the InsecureRequestWarning is silenced here
# and certificate verification is turned off below (verify=False).
# Be careful: this is insecure.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

r = requests.get('https://us.soccerway.com/teams/england/tottenham-hotspur-football-club/675/matches/', verify=False)
soup = BeautifulSoup(r.text, 'lxml')

matches = soup.find('table',{'class':'matches'}).find('tbody')

# Only look at the first ten rows of the table
for row in matches.find_all('tr')[:10]:
    data = row.find_all('td')
    home_team = data[3].text.strip()
    match_result = data[4].text.strip()  # score text, e.g. '4 - 0'
    # First class on the score link, e.g. 'result-win'
    match_result_class = data[4].find('a').attrs['class'][0]
    away_team = data[5].text.strip()

    output = 'Home team : {0}, Away team : {1}, Match Result Class :{2}'.format(home_team, away_team, match_result_class)
    print(output)

Output

Home team : Newcastle United, Away team : Tottenham Hotspur, Match Result Class :result-win
Home team : Tottenham Hotspur, Away team : Chelsea, Match Result Class :result-loss
Home team : Tottenham Hotspur, Away team : Burnley, Match Result Class :result-draw
Home team : Everton, Away team : Tottenham Hotspur, Match Result Class :result-win
Home team : Tottenham Hotspur, Away team : Borussia Dortmund, Match Result Class :result-win
Home team : Tottenham Hotspur, Away team : Swansea City, Match Result Class :result-draw
Home team : Tottenham Hotspur, Away team : Barnsley, Match Result Class :result-win
Home team : West Ham United, Away team : Tottenham Hotspur, Match Result Class :result-win
Home team : APOEL, Away team : Tottenham Hotspur, Match Result Class :result-win
Home team : Huddersfield Town, Away team : Tottenham Hotspur, Match Result Class :result-win
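
Since the question asks specifically about CSS selectors, the same idea can probably also be expressed with a selector that does not hard-code the 7-digit number. The following is only a minimal sketch under stated assumptions: SAMPLE_HTML is a hypothetical, trimmed-down stand-in for the real page (the second and third rows and their ids are made up), last_results is an illustrative helper name, and the sketch assumes every match row has an id containing "_match-" as in the row quoted in the question. It keeps the first rows in document order; whether those are the most recent matches depends on how the page orders them.

```python
from bs4 import BeautifulSoup

# Hypothetical sample modelled on the row in the question.
# Only the first row's id and teams come from the question; the rest is made up.
SAMPLE_HTML = """
<table class="matches">
  <tbody>
    <tr class="odd match" id="page_team_1_block_team_matches_3_match-2463021">
      <td class="team team-a"><a href="#">Tottenham Hotspur</a></td>
      <td class="score-time score"><a href="#" class="result-win"> 4 - 0 </a></td>
      <td class="team team-b"><a href="#">Everton</a></td>
    </tr>
    <tr class="even match" id="page_team_1_block_team_matches_3_match-1111111">
      <td class="team team-a"><a href="#">Team A</a></td>
      <td class="score-time score"><a href="#" class="result-draw"> 1 - 1 </a></td>
      <td class="team team-b"><a href="#">Team B</a></td>
    </tr>
    <tr class="odd match" id="page_team_1_block_team_matches_3_match-2222222">
      <td class="team team-a"><a href="#">Team C</a></td>
      <td class="score-time score"><a href="#" class="result-loss"> 0 - 2 </a></td>
      <td class="team team-b"><a href="#">Team D</a></td>
    </tr>
  </tbody>
</table>
"""


def last_results(html, limit=10):
    """Return up to `limit` outcomes ('win', 'loss', 'draw') in document order."""
    soup = BeautifulSoup(html, "html.parser")
    # [id*="_match-"] matches any <tr> whose id contains "_match-",
    # so the unique 7-digit number never has to be written out.
    links = soup.select('tr[id*="_match-"] td.score-time.score a')
    results = []
    for link in links:
        # The class attribute is a list; pick the one that starts with "result-".
        for cls in link.get("class", []):
            if cls.startswith("result-"):
                results.append(cls[len("result-"):])
                break
    return results[:limit]


if __name__ == "__main__":
    print(last_results(SAMPLE_HTML))
```

To try this against the live site, the text of the response fetched above (r.text) could be passed in place of SAMPLE_HTML, assuming the page still uses the same markup.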