I have code that scrapes www.oddsportal.com. It does produce output, but the output is very strange and not formatted correctly at all!
How can I get the desired format?
Code:

    from io import StringIO

    import pandas as pd
    from bs4 import BeautifulSoup as bs
    from selenium import webdriver

    browser = webdriver.Chrome()

    # Pages to scrape, one per match day.
    urls = {
        "https://www.oddsportal.com/matches/soccer/20210515/"
    }


    class GameData:
        """Column-oriented container: one list per output column."""

        def __init__(self):
            self.date = []
            self.time = []
            self.game = []
            self.score = []
            self.home_odds = []
            self.draw_odds = []
            self.away_odds = []
            self.country = []
            self.league = []


    def parse_data(url):
        browser.get(url)
        html = browser.page_source
        # First table on the page, with its first row used as the header.
        df = pd.read_html(StringIO(html), header=0)[0]
        soup = bs(html, "lxml")
        cont = soup.find('div', {'id': 'wrap'})
        content = cont.find('div', {'id': 'col-content'})
        content = content.find('table', {'class': 'table-main', 'id': 'table-matches'})
        main = content.find('th', {'class': 'first2 tl'})
        if main is None:
            return None
        # Country and league are read once, from the first header cell only.
        count = main.findAll('a')
        country = count[0].text
        league = count[1].text
        game_data = GameData()
        game_date = None
        for row in df.itertuples():
            # row[0] is the index; row[1] is the first table column.
            if not isinstance(row[1], str):
                continue
            elif ':' not in row[1]:
                # Rows without a time are treated as date headers.
                game_date = row[1].split('-')[0]
                continue
            game_data.date.append(game_date)
            game_data.time.append(row[1])
            game_data.game.append(row[2])
            game_data.score.append(row[3])
            game_data.home_odds.append(row[4])
            game_data.draw_odds.append(row[5])
            game_data.away_odds.append(row[6])
            game_data.country.append(country)
            game_data.league.append(league)
        return game_data


    if __name__ == '__main__':
        results = None
        for url in urls:
            try:
                game_data = parse_data(url)
                if game_data is None:
                    continue
                result = pd.DataFrame(game_data.__dict__)
                if results is None:
                    results = result
                else:
                    results = pd.concat([results, result], ignore_index=True)
            except ValueError:
                # Retry the same URL once if parsing raised ValueError.
                game_data = parse_data(url)
                if game_data is None:
                    continue
                result = pd.DataFrame(game_data.__dict__)
                if results is None:
                    results = result
                else:
                    results = pd.concat([results, result], ignore_index=True)
            except AttributeError:
                # Retry the same URL once if parsing raised AttributeError.
                game_data = parse_data(url)
                if game_data is None:
                    continue
                result = pd.DataFrame(game_data.__dict__)
                if results is None:
                    results = result
                else:
                    results = pd.concat([results, result], ignore_index=True)
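The retry-once pattern in the main loop could also be factored into a small helper, so the handling is written a single time. This is only a sketch: `parse_with_retry` is a hypothetical name, `parse` stands in for `parse_data`, and retrying exactly once on `ValueError` or `AttributeError` is an assumption about the intent.

```python
def parse_with_retry(parse, url, retries=1):
    """Call parse(url), retrying up to `retries` times on ValueError/AttributeError."""
    for attempt in range(retries + 1):
        try:
            return parse(url)
        except (ValueError, AttributeError):
            if attempt == retries:
                raise  # out of attempts: let the caller see the error


# Stand-in parser that fails on its first call, to exercise the retry path.
calls = []


def flaky_parse(url):
    calls.append(url)
    if len(calls) == 1:
        raise ValueError("page not ready")
    return {"url": url}


result = parse_with_retry(
    flaky_parse, "https://www.oddsportal.com/matches/soccer/20210515/"
)
```

Each non-`None` result could then be turned into a DataFrame, appended to a list, and combined once at the end with `pd.concat(frames, ignore_index=True)`.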
Current output:
| | date | time | game | score | home_odds | draw_odds | away_odds | country | league |
|-----|---------------------------------------|--------|--------------------------------------------------|--------------------------------------------------|-------------|-------------|-------------|-----------|----------------------|
| 0 | | 00:00 | Tomayapo - Independiente Petroleros | Tomayapo - Independiente Petroleros | 1.82 | 3.70 | 3.59 | Bolivia | Division Profesional |
| 1 | Ecuador»Liga Pro | 00:00 | Dep. Cuenca - Mushuc Runa | Dep. Cuenca - Mushuc Runa | 2.34 | 3.18 | 2.93 | Bolivia | Division Profesional |
| 2 | USA»USL Championship | 00:00 | Sporting Kansas City 2 - Colorado Springs | Sporting Kansas City 2 - Colorado Springs | 2.18 | 3.48 | 2.95 | Bolivia | Division Profesional |
| 3 | Brazil»Campeonato Paulista | 00:30 | Sao Paulo - Ferroviaria | Sao Paulo - Ferroviaria | 1.51 | 3.93 | 6.15 | Bolivia | Division Profesional |
| 4 | Venezuela»Primera Division | 00:30 | Universidad Central - AC Lala FC | Universidad Central - AC Lala FC | 2.37 | 3.08 | 2.88 | Bolivia | Division Profesional |
Desired/correct format:
| | date | time | game | score | home_odds | draw_odds | away_odds | country | league |
|-----|-------------|--------|--------------------------------------------------|---------|-------------|-------------|-------------|------------------------|-----------------------------|
| 0 | 15 May 2021 | 0:00 | Tomayapo - Independiente Petroleros | nan | 1.82 | 3.7 | 3.59 | Bolivia | Division Profesional |
| 1 | 15 May 2021 | 0:00 | Dep. Cuenca - Mushuc Runa | nan | 2.34 | 3.18 | 2.93 | Ecuador | Liga Pro |
| 2 | 15 May 2021 | 0:00 | Sporting Kansas City 2 - Colorado Springs | nan | 2.18 | 3.48 | 2.95 | USA | USL Championship |
| 3 | 15 May 2021 | 0:30 | Sao Paulo - Ferroviaria | nan | 1.51 | 3.93 | 6.15 | Brazil | Campeonato Paulista |
| 4 | 15 May 2021 | 0:30 | Universidad Central - AC Lala FC | nan | 2.37 | 3.08 | 2.88 | Venezuela | Primera Division |
| 5 | 15 May 2021 | 1:00 | Real Monarchs - LA Galaxy 2 | nan | 2.83 | 3.78 | 2.15 | USA | USL Championship |
| 6 | 15 May 2021 | 2:05 | Monterrey W - U.A.N.L.- Tigres W | nan | 2.86 | 3.51 | 2.2 | Mexico | Liga MX Women |
| 7 | 15 May 2021 | 3:15 | West Canberra Wanderers - Tigers FC | nan | 3.27 | 4.58 | 1.73 | Australia | NPL ACT |
How can I get the output I want?
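One possible direction, sketched under assumptions rather than tested against the live site: the `date` column in the current output holds strings like `Ecuador»Liga Pro`, which suggests that the non-time rows in the parsed table are league headers rather than dates. The sketch below walks plain tuples (standing in for the rows of `df`), updates the current country and league whenever a `»` header row appears, and takes the date from the `YYYYMMDD` segment of the URL. The helper names `date_from_url` and `collect_matches`, the assumed column order, the sample rows, and the rule that a score equal to the game text means "no score yet" are all illustrative assumptions.

```python
from datetime import datetime


def date_from_url(url):
    """Turn the trailing YYYYMMDD URL segment into a string like '15 May 2021'."""
    stamp = url.rstrip("/").rsplit("/", 1)[-1]
    return datetime.strptime(stamp, "%Y%m%d").strftime("%d %b %Y")


def collect_matches(rows, url):
    """Build one dict per match row, tracking the most recent league header.

    Assumed layout of each row: (first_cell, game, score, home, draw, away, ...).
    """
    game_date = date_from_url(url)
    country = league = None
    matches = []
    for row in rows:
        first, game, score, home, draw, away = row[:6]
        if not isinstance(first, str):
            continue  # skip rows whose first cell is not text (e.g. NaN)
        if "»" in first:
            # Header row such as "Ecuador»Liga Pro": remember it for later rows.
            country, league = (part.strip() for part in first.split("»", 1))
            continue
        if ":" not in first:
            continue  # neither a league header nor a kick-off time
        matches.append({
            "date": game_date,
            "time": first,
            "game": game,
            # Assumption: a repeated game name means no score is available.
            "score": None if score == game else score,
            "home_odds": home,
            "draw_odds": draw,
            "away_odds": away,
            "country": country,
            "league": league,
        })
    return matches


# Hypothetical rows shaped after the output shown in the question.
sample_rows = [
    ("Bolivia»Division Profesional", "1", "X", "2", "B's", None),
    ("00:00", "Tomayapo - Independiente Petroleros",
     "Tomayapo - Independiente Petroleros", 1.82, 3.70, 3.59),
    ("Ecuador»Liga Pro", "1", "X", "2", "B's", None),
    ("00:00", "Dep. Cuenca - Mushuc Runa",
     "Dep. Cuenca - Mushuc Runa", 2.34, 3.18, 2.93),
]
matches = collect_matches(
    sample_rows, "https://www.oddsportal.com/matches/soccer/20210515/"
)
```

Passing the resulting list to `pd.DataFrame(matches)` would give one row per match with the columns of the desired table.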
Since SO asks me to add more detail:
I know the GameData class could be defined better, but I don't know how to do that effectively.
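As a rough sketch of one alternative to the list-per-column class, each match could be its own record, using the standard-library `dataclasses` module. The `Match` name and the field types are assumptions for illustration; the sample values are taken from the desired table.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Match:
    """One row of the desired output table."""
    date: str
    time: str
    game: str
    score: Optional[str]
    home_odds: float
    draw_odds: float
    away_odds: float
    country: str
    league: str


matches = [
    Match("15 May 2021", "0:00", "Tomayapo - Independiente Petroleros",
          None, 1.82, 3.7, 3.59, "Bolivia", "Division Profesional"),
    Match("15 May 2021", "0:00", "Dep. Cuenca - Mushuc Runa",
          None, 2.34, 3.18, 2.93, "Ecuador", "Liga Pro"),
]

# A list of plain dicts, one per match, with keys in field order.
records = [asdict(m) for m in matches]
```

`pd.DataFrame(records)` would then build the frame directly, and because every record carries all nine fields, the columns cannot drift out of step with each other.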
Thanks.