How do I get the correct output after scraping a website?

Posted: 2021-05-14 04:06:52

Tags: python pandas dataframe selenium-chromedriver

I have code that scrapes www.oddsportal.com.

I do get output, but it is very strange and not in the correct format at all.

How can I get the desired format?

Code:

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup as bs
from selenium import webdriver

browser = webdriver.Chrome()

urls = [
    "https://www.oddsportal.com/matches/soccer/20210515/",
]


class GameData:

    def __init__(self):
        self.date = []
        self.time = []
        self.game = []
        self.score = []
        self.home_odds = []
        self.draw_odds = []
        self.away_odds = []
        self.country = []
        self.league = []


def parse_data(url):
    browser.get(url)
    html = browser.page_source
    df = pd.read_html(StringIO(html), header=0)[0]
    soup = bs(html, "lxml")
    cont = soup.find('div', {'id': 'wrap'})
    content = cont.find('div', {'id': 'col-content'})
    # Both attributes go in one dict; find()'s third positional argument is `recursive`.
    content = content.find('table', {'class': 'table-main', 'id': 'table-matches'})
    main = content.find('th', {'class': 'first2 tl'})
    if main is None:
        return None
    # Country and league are read once, from the table header only,
    # and then copied onto every row below.
    count = main.find_all('a')
    country = count[0].text
    league = count[1].text
    game_data = GameData()
    game_date = None
    for row in df.itertuples():
        if not isinstance(row[1], str):
            continue
        elif ':' not in row[1]:
            # Any row without ':' in the first column is treated as a date row.
            game_date = row[1].split('-')[0]
            continue
        game_data.date.append(game_date)
        game_data.time.append(row[1])
        game_data.game.append(row[2])
        game_data.score.append(row[3])
        game_data.home_odds.append(row[4])
        game_data.draw_odds.append(row[5])
        game_data.away_odds.append(row[6])
        game_data.country.append(country)
        game_data.league.append(league)
    return game_data


if __name__ == '__main__':

    frames = []

    for url in urls:
        try:
            game_data = parse_data(url)
        except (ValueError, AttributeError):
            # Retry once, as the original separate handlers did.
            game_data = parse_data(url)
        if game_data is None:
            continue
        frames.append(pd.DataFrame(game_data.__dict__))

    # DataFrame.append was deprecated in pandas 1.4 and removed in 2.0.
    results = pd.concat(frames, ignore_index=True) if frames else None
```

results:

|     | date                                  | time   | game                                             | score                                            | home_odds   | draw_odds   | away_odds   | country   | league               |
|-----|---------------------------------------|--------|--------------------------------------------------|--------------------------------------------------|-------------|-------------|-------------|-----------|----------------------|
|   0 |                                       | 00:00  | Tomayapo - Independiente Petroleros              | Tomayapo - Independiente Petroleros              | 1.82        | 3.70        | 3.59        | Bolivia   | Division Profesional |
|   1 | Ecuador»Liga Pro                      | 00:00  | Dep. Cuenca - Mushuc Runa                        | Dep. Cuenca - Mushuc Runa                        | 2.34        | 3.18        | 2.93        | Bolivia   | Division Profesional |
|   2 | USA»USL Championship                  | 00:00  | Sporting Kansas City 2 - Colorado Springs        | Sporting Kansas City 2 - Colorado Springs        | 2.18        | 3.48        | 2.95        | Bolivia   | Division Profesional |
|   3 | Brazil»Campeonato Paulista            | 00:30  | Sao Paulo - Ferroviaria                          | Sao Paulo - Ferroviaria                          | 1.51        | 3.93        | 6.15        | Bolivia   | Division Profesional |
|   4 | Venezuela»Primera Division            | 00:30  | Universidad Central - AC Lala FC                 | Universidad Central - AC Lala FC                 | 2.37        | 3.08        | 2.88        | Bolivia   | Division Profesional |

Desired/correct format:

|     | date        | time   | game                                             |   score | home_odds   | draw_odds   | away_odds   | country                | league                      |
|-----|-------------|--------|--------------------------------------------------|---------|-------------|-------------|-------------|------------------------|-----------------------------|
|   0 | 15 May 2021 | 0:00   | Tomayapo - Independiente Petroleros              |     nan | 1.82        | 3.7         | 3.59        | Bolivia                | Division Profesional        |
|   1 | 15 May 2021 | 0:00   | Dep. Cuenca - Mushuc Runa                        |     nan | 2.34        | 3.18        | 2.93        | Ecuador                | Liga Pro                    |
|   2 | 15 May 2021 | 0:00   | Sporting Kansas City 2 - Colorado Springs        |     nan | 2.18        | 3.48        | 2.95        | USA                    | USL Championship            |
|   3 | 15 May 2021 | 0:30   | Sao Paulo - Ferroviaria                          |     nan | 1.51        | 3.93        | 6.15        | Brazil                 | Campeonato Paulista         |
|   4 | 15 May 2021 | 0:30   | Universidad Central - AC Lala FC                 |     nan | 2.37        | 3.08        | 2.88        | Venezuela              | Primera Division            |
|   5 | 15 May 2021 | 1:00   | Real Monarchs - LA Galaxy 2                      |     nan | 2.83        | 3.78        | 2.15        | USA                    | USL Championship            |
|   6 | 15 May 2021 | 2:05   | Monterrey W - U.A.N.L.- Tigres W                 |     nan | 2.86        | 3.51        | 2.2         | Mexico                 | Liga MX Women               |
|   7 | 15 May 2021 | 3:15   | West Canberra Wanderers - Tigers FC              |     nan | 3.27        | 4.58        | 1.73        | Australia              | NPL ACT                     |
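Comparing the two tables suggests what goes wrong in the loop in `parse_data`: league header rows such as `Ecuador»Liga Pro` have no `:` in their first column, so they end up in `date`, while `country` and `league` stay fixed at the values read once from the table header. Below is a minimal sketch of row handling that updates the current country/league from each `»` header row and takes the date from the URL. It assumes (hypothetically) that each row is shaped like `row[1:]` of `df.itertuples()`, i.e. `(time_or_header, game, score, home_odds, draw_odds, away_odds)`, and that the URL ends in a `YYYYMMDD` segment. `parse_rows` and `date_from_url` are hypothetical helper names; the sample rows are taken from the output shown above.

```python
from datetime import datetime
from urllib.parse import urlparse


def date_from_url(url):
    """Assumes the last path segment is YYYYMMDD, e.g. .../soccer/20210515/."""
    segment = urlparse(url).path.rstrip("/").split("/")[-1]
    return datetime.strptime(segment, "%Y%m%d").strftime("%d %b %Y")


def parse_rows(rows, url, country, league):
    """rows: tuples shaped like row[1:] of df.itertuples() (assumed layout).

    country/league: the values read from the table header, used until
    the first "Country»League" row is seen.
    """
    game_date = date_from_url(url)
    records = []
    for first, game, score, home, draw, away in rows:
        if not isinstance(first, str):
            continue
        if "»" in first:
            # League header row such as "Ecuador»Liga Pro": switch the
            # current country/league instead of treating it as a date.
            country, league = (part.strip() for part in first.split("»", 1))
            continue
        if ":" not in first:
            continue  # any other non-match row
        records.append({
            "date": game_date,
            "time": first,
            "game": game,
            # Assumption: a score cell that repeats the game name means no score yet.
            "score": None if score == game else score,
            "home_odds": home,
            "draw_odds": draw,
            "away_odds": away,
            "country": country,
            "league": league,
        })
    return records


url = "https://www.oddsportal.com/matches/soccer/20210515/"
sample = [
    ("00:00", "Tomayapo - Independiente Petroleros",
     "Tomayapo - Independiente Petroleros", 1.82, 3.70, 3.59),
    ("Ecuador»Liga Pro", None, None, None, None, None),
    ("00:00", "Dep. Cuenca - Mushuc Runa",
     "Dep. Cuenca - Mushuc Runa", 2.34, 3.18, 2.93),
]
records = parse_rows(sample, url, "Bolivia", "Division Profesional")
for record in records:
    print(record)
```

The resulting list of dicts can be passed directly to `pd.DataFrame(records)`. Treating a score cell that repeats the game name as missing is only a guess based on the output above, where `score` mirrors `game` for matches that have not started.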

How can I get this result?

Since SO asks me to add more details:

I know the `GameData` class could be defined better, but I don't know how to do that effectively.
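One way to restructure `GameData` is to store one object per match instead of nine parallel lists, for example with a standard-library `dataclass`. This is a sketch only: the class name `Game` is hypothetical, and the field names mirror the original columns.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Game:
    """One scraped match (hypothetical row-oriented replacement for GameData)."""
    date: str
    time: str
    game: str
    score: Optional[str]
    home_odds: float
    draw_odds: float
    away_odds: float
    country: str
    league: str


games = [
    Game("15 May 2021", "00:00", "Tomayapo - Independiente Petroleros", None,
         1.82, 3.70, 3.59, "Bolivia", "Division Profesional"),
]

# One dict per match; pd.DataFrame(rows) would give one column per field,
# in declaration order.
rows = [asdict(g) for g in games]
print(rows)
```

Because each match is a single object, its fields cannot drift out of step the way nine separate lists can when one `append` is skipped.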

Thanks

0 answers