Scrape basketball results and associate the tournament with each match

Time: 2020-09-16 17:40:16

Tags: python selenium web-scraping

I want to scrape basketball match results from this website.

The result table I want has the following columns: tournament_name match_time team_1 team_2 team_1_score team2_2_score

Here is an example:

tournament_name         match_time    team_1   team_2   team_1_score   team2_2_score
Polska Liga Koszykówki  09-14 17:35   Asseco   Wikana   68             79
Friendly Competition    09-14 02:30   Costa    Mata     72             59

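I wrote the code below and it gets all the data, except the tournament name for matches that share a tournament with the match before them: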

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome(executable_path=r"C:\chromedriver.exe")
driver.get(u)  # u holds the URL of the results page
# Scroll to the bottom so the whole page is loaded
driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
driver.implicitly_wait(60)  # seconds

soup = BeautifulSoup(driver.page_source, 'html.parser')

for t in soup.select('.Leaguestitle'):
    match_time = [i.get_text(strip=True) for i in t.select('td:nth-child(1)')][0]
    tourn_soup= t.parent.parent.find_previous("table", id=lambda value: value and value.startswith("table"))
    tourn = tourn_soup.select_one('tbody > tr > td > span > a')
    row1 = t.find_next(class_='b1')
    team1 = row1.select_one('td:nth-child(2) a span').get_text(strip=True)
    team1_score = row1.select_one('td:nth-child(7)').get_text(strip=True)
    links = row1.select_one('td:nth-child(11) div > a')['href'] if row1.select_one('td:nth-child(11) div > a') else ''
    row2 = row1.find_next(class_='b1')
    team2 = row2.select_one('td:nth-child(1) a span').get_text(strip=True)
    team2_score = row2.select_one('td:nth-child(6)').get_text(strip=True)

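The code returns None as the tournament name for matches that are not directly preceded by a table containing the tournament name (for example, id="table_1").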

1 answer:

Answer 0 (score: 0)

Here you can get tournament_name by class name: look back to the nearest preceding table with class scoretitle and read the name from it.

soup = BeautifulSoup(driver.page_source, 'html.parser')

for t in soup.select('.Leaguestitle'):
    match_time = [i.get_text(strip=True) for i in t.select('td:nth-child(1)')][0]
    tournament_name=t.parent.parent.find_previous("table",class_='scoretitle').find_next('span',class_='l1').find_next('a').find_next('b').find_next('font').text
    tourn_soup = t.parent.parent.find_previous("table", id=lambda value: value and value.startswith("table"))
    tourn = tourn_soup.select_one('tbody > tr > td > span > a')
    row1 = t.find_next(class_='b1')
    team1 = row1.select_one('td:nth-child(2) a span').get_text(strip=True)
    team1_score = row1.select_one('td:nth-child(7)').get_text(strip=True)
    links = row1.select_one('td:nth-child(11) div > a')['href'] if row1.select_one('td:nth-child(11) div > a') else ''
    row2 = row1.find_next(class_='b1')
    team2 = row2.select_one('td:nth-child(1) a span').get_text(strip=True)
    team2_score = row2.select_one('td:nth-child(6)').get_text(strip=True)
    print(tournament_name,match_time,team1,team1_score,team2,team2_score)

Output

Polska Liga Koszykówki 09-14 16:35Finished Asseco Prokom Gdynia 68 Wikana Start SA Lublin 79
Friendly Competition 09-14 01:30Finished Costa Caribe 72 Matagalpa 59
Friendly Competition 09-14 15:00Finished Siauliai 61 U.Juventus 88
Friendly Competition 09-14 15:00Finished Rixiong Lixian Maccabi 49 Hapoel Eilat 68
Friendly Competition 09-14 15:30Finished cabi Electra Tel Aviv 77 Hapoel Holon 74
Friendly Competition 09-14 16:00Finished Daruss Afaka 70 Royal Hali Gaziantep 82
Friendly Competition 09-14 16:59Finished Hapoel Haifa 75 Hapoel Gilboa Galil Elyon 92
Friendly Competition 09-14 16:59Finished Ratiopharm Ulm 22 Monaco 18
Friendly Competition 09-14 17:30Finished Lietkabelis 91 BK Ogre 74
Italy Super Cup 09-14 16:00Finished Enel Brindisi 98 Virtus Roma 62
Italy Super Cup 09-14 19:00Finished Pallacanestro Trieste 2004 62 Benetton Treviso 80
Italy Super Cup 09-14 19:00Finished Umana Reyer Venezia 78 Pallacanestro Trento 2009 75
Italy Super Cup 09-14 20:00Finished Dinamo Sassari 78 Scavolini Spar Pesaro 81
Russian female cup 09-14 09:00Finished Dynamo K Woman's 116 Enisey Krasnoyarsk II Women's 37
Russian female cup 09-14 11:30Finished Kazanochka Kazan Women's 65 Vologda Chevakaa Woman's 39
Russian female cup 09-14 13:00Finished Neftyanik Avangard Women 67 Dynamo Moscow Woman's 77
Russian female cup 09-14 16:00Finished Spartak Moscow Region Woman's 91 Chernie Medved Politehnik (W) 45
Uruguay League 09-14 23:15Finished Tabare 73 Colon 79
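
Note that match_time in the output above still has the status text attached (for example 09-14 16:35Finished), while the table in the question wants only the time. A minimal sketch of one way to separate the two is shown here. It assumes the cell text always starts with "MM-DD HH:MM" and that whatever follows is the status; the helper name split_match_time is hypothetical and not part of the original code.

```python
import re

# Assumed layout: "MM-DD HH:MM" followed directly by an optional status word.
MATCH_TIME_RE = re.compile(r"^(\d{2}-\d{2} \d{2}:\d{2})\s*(.*)$")


def split_match_time(raw):
    """Split the scraped cell text into (match_time, status)."""
    text = raw.strip()
    m = MATCH_TIME_RE.match(text)
    if m is None:
        # Unexpected layout: keep the text unchanged and report no status.
        return text, ""
    return m.group(1), m.group(2)


print(split_match_time("09-14 23:15Finished"))
# → ('09-14 23:15', 'Finished')
```

Inside the loop this could be applied as match_time, status = split_match_time(match_time) before printing. If the site uses a different time format, the pattern would need adjusting.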