I have this code to scrape the oddsportal page:
https://www.oddsportal.com/soccer/england/premier-league/
import pandas as pd
from selenium import webdriver

browser = webdriver.Chrome()
browser.get("https://www.oddsportal.com/soccer/england/premier-league/")
df = pd.read_html(browser.page_source, header=0)[0]

timeList = []
dateList = []
gameList = []
home_odds = []
draw_odds = []
away_odds = []

for row in df.itertuples():
    if not isinstance(row[1], str):
        continue
    elif ':' not in row[1]:
        date = row[1].split('-')[0]
        continue
    time = timeList.append(row[1])
    dateList.append(date)
    gameList.append(row[2])
    home_odds.append(row[4])
    draw_odds.append(row[5])
    away_odds.append(row[6])

result = pd.DataFrame({'date': dateList,
                       'time': time,
                       'game': gameList,
                       'Home': home_odds,
                       'Draw': draw_odds,
                       'Away': away_odds})
The output I get is:
    date           time    game                             Home    Draw    Away
--  -------------  ------  -----------------------------  ------  ------  ------
 0  Today, 08 Mar          Chelsea - Everton                1.62    3.93    6.07
 1  Today, 08 Mar          West Ham - Leeds                 2.25    3.61    3.18
 2  10 Mar 2021            Manchester City - Southampton    1.22    6.94   13.75
 3  12 Mar 2021            Newcastle - Aston Villa           3.8    3.59       2
 4  13 Mar 2021            Leeds - Chelsea                  4.45    3.97    1.77
 5  13 Mar 2021            Crystal Palace - West Brom        2.1    3.34    3.77
 6  13 Mar 2021            Everton - Burnley                1.84    3.61    4.54
 7  13 Mar 2021            Fulham - Manchester City        10.05    5.16    1.34
 8  14 Mar 2021            Southampton - Brighton            2.8    3.11    2.77
 9  14 Mar 2021            Leicester - Sheffield Utd         1.5    4.34    7.06
10  14 Mar 2021            Arsenal - Tottenham              2.48    3.47    2.87
11  14 Mar 2021            Manchester Utd - West Ham        1.86    3.62    4.44
12  15 Mar 2021            Wolves - Liverpool               4.65    3.66     1.8
13  19 Mar 2021            Fulham - Leeds                   2.55    3.53    2.72
14  20 Mar 2021            Brighton - Newcastle             1.76    3.39    5.58
15  21 Mar 2021            West Ham - Arsenal               2.86    3.51    2.44
16  21 Mar 2021            Aston Villa - Tottenham          3.24     3.4    2.27
I am not getting any values in the time column.
Can someone help me understand whether I am missing something? Have I defined time correctly?
Answer 0 (score: 1)
timeList.append(row[1]) modifies the list in place and returns nothing, so time is always None. I suspect you wanted:
time = row[1]
timeList.append(time)
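As a quick check of the behaviour described above, here is a minimal sketch showing that list.append changes the list in place and returns None. The list name and the value are made up for illustration.

```python
time_list = []
returned = time_list.append("20:00")  # append mutates the list in place

print(returned)   # None
print(time_list)  # ['20:00']
```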
Answer 1 (score: 1)
I think this is a simple oversight. You have assigned the return value of the list.append() function, which is None, to the time variable.
So, instead of:
time = timeList.append(row[1])
just call the function, as you did for the other lists that follow it:
timeList.append(row[1])
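For reference, here is a rough sketch of the loop with that change applied. It runs on a small hand-built DataFrame rather than the live page: the column names and kick-off times are invented for illustration, and the row layout (a date row followed by match rows) is only assumed from the indexing used in the question. The 'time' column is built from timeList, not from a time variable.

```python
import pandas as pd

# Hypothetical stand-in for the table parsed from the page (assumed layout).
df = pd.DataFrame(
    [
        ["Today, 08 Mar", None, None, None, None, None],
        ["18:00", "Chelsea - Everton", None, 1.62, 3.93, 6.07],
        ["20:00", "West Ham - Leeds", None, 2.25, 3.61, 3.18],
        ["10 Mar 2021", None, None, None, None, None],
        ["18:00", "Manchester City - Southampton", None, 1.22, 6.94, 13.75],
    ],
    columns=["when", "game", "score", "home", "draw", "away"],
)

timeList, dateList, gameList = [], [], []
home_odds, draw_odds, away_odds = [], [], []
date = None

for row in df.itertuples():
    if not isinstance(row[1], str):
        continue
    elif ':' not in row[1]:
        # A row without a kick-off time is treated as a date header.
        date = row[1].split('-')[0]
        continue
    timeList.append(row[1])  # no assignment: append returns None
    dateList.append(date)
    gameList.append(row[2])
    home_odds.append(row[4])
    draw_odds.append(row[5])
    away_odds.append(row[6])

result = pd.DataFrame({'date': dateList,
                       'time': timeList,  # pass the list, not a scalar
                       'game': gameList,
                       'Home': home_odds,
                       'Draw': draw_odds,
                       'Away': away_odds})
print(result)
```

Passing the list matters here: if a single string such as the last kick-off time were passed for 'time', pandas would repeat that one value in every row.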