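In general, the goal is to maximise the log-likelihood (or, equivalently, minimise the negative log-likelihood) by optimising the parameters until convergence. In this context, the parameters are the attack ratings, the defence ratings, the standard deviations and a common home advantage. The first three are vectors (one entry per team in the competition) and are team-specific, whereas the home advantage is a single scalar.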
import numpy as np
import pandas as pd
import scipy.optimize
import scipy.stats # needed for scipy.stats.norm.pdf used further down
# Reads the game data
game = pd.read_csv('Games.csv')
numGames = len(game) # Number of Games
homeadv = 1.1 # Home Advantage
The first two rows of the raw pandas DataFrame read above look like this:
Game  HomeID  AwayID  HomePts  AwayPts
1     1       2       62       59
2     3       4       81       82
Collecting the team IDs and setting the initial parameter guesses:
id_list = sorted(pd.unique(pd.concat([game['HomeID'], game['AwayID']], axis=0)))
# Attack Parameters, Defence Parameters, Standard Deviation Parameters, and Home Advantage set to an arbitrary value
attackratings = [5 for id in id_list]
defenceratings = [5 for id in id_list]
stdevratings = [2 for id in id_list]
homeadv = 1.1 # Home Advantage for the Team playing at home
# Put into a tuple for the scipy.optimize.minimize
init_params = tuple(attackratings + defenceratings + stdevratings + [homeadv])
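Since scipy.optimize.minimize only ever sees one flat vector, that vector has to be split back into its pieces somewhere. A minimal sketch of how this might be done, assuming the ordering used in init_params above (attack, defence, standard deviation, then home advantage); the helper name unpack_params and the four-team example are hypothetical:

```python
import numpy as np

def unpack_params(params, n_teams):
    """Split a flat parameter vector into (attack, defence, stdev, homeadv)."""
    params = np.asarray(params, dtype=float)
    attack = params[:n_teams]
    defence = params[n_teams:2 * n_teams]
    stdev = params[2 * n_teams:3 * n_teams]
    homeadv = params[3 * n_teams]  # the single trailing scalar
    return attack, defence, stdev, homeadv

# Example with four teams, mirroring the initial guesses above
n_teams = 4
init_params = tuple([5] * n_teams + [5] * n_teams + [2] * n_teams + [1.1])
attack, defence, stdev, homeadv = unpack_params(init_params, n_teams)
```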
A list of each parameter per game, where _h denotes Home and _a denotes Away:
attack_h = []
defence_a = []
st_dev_h = []
st_dev_a = []
attack_a = []
defence_h = []
for i in range(0,len(game)):
    # Home Att and Away Def
    x = attackratings[id_list.index(game.HomeID[i])]
    attack_h.append(x)
    x = defenceratings[id_list.index(game.AwayID[i])]
    defence_a.append(x)
    # Standard deviations of both teams
    x = stdevratings[id_list.index(game.HomeID[i])]
    st_dev_h.append(x)
    x = stdevratings[id_list.index(game.AwayID[i])]
    st_dev_a.append(x)
    # Home Def and Away Att
    x = attackratings[id_list.index(game.AwayID[i])]
    attack_a.append(x)
    x = defenceratings[id_list.index(game.HomeID[i])]
    defence_h.append(x)
game['attack_h'] = attack_h
game['defence_a'] = defence_a
game['attack_a'] = attack_a
game['defence_h'] = defence_h
game['st_dev_h'] = st_dev_h
game['st_dev_a'] = st_dev_a
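The per-row lookup above could probably be done without a Python loop. The following is only a sketch under assumptions: the two-row DataFrame is a stand-in for Games.csv, the ratings are held in NumPy arrays rather than lists, and the names pos, home_idx and away_idx are made up for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for Games.csv, using the two rows shown earlier
game = pd.DataFrame({
    'HomeID': [1, 3], 'AwayID': [2, 4],
    'HomePts': [62, 81], 'AwayPts': [59, 82],
})
id_list = sorted(pd.unique(pd.concat([game['HomeID'], game['AwayID']], axis=0)))

# Ratings as arrays so they can be indexed with integer arrays
attackratings = np.full(len(id_list), 5.0)
defenceratings = np.full(len(id_list), 5.0)
stdevratings = np.full(len(id_list), 2.0)

# Map each team ID to its position in id_list once, instead of calling
# id_list.index(...) for every row
pos = {team_id: k for k, team_id in enumerate(id_list)}
home_idx = game['HomeID'].map(pos).to_numpy()
away_idx = game['AwayID'].map(pos).to_numpy()

game['attack_h'] = attackratings[home_idx]
game['defence_a'] = defenceratings[away_idx]
game['attack_a'] = attackratings[away_idx]
game['defence_h'] = defenceratings[home_idx]
game['st_dev_h'] = stdevratings[home_idx]
game['st_dev_a'] = stdevratings[away_idx]
```

Keeping home_idx and away_idx around also means the same lookup can be reused whenever the ratings change, without rebuilding the lists.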
Computing the probability density of each team scoring the points it did, given the parameters:
game['exp_home'] = scipy.stats.norm.pdf(game.HomePts,game.attack_h*game.defence_a*homeadv,game.st_dev_h*game.st_dev_a)
game['exp_away'] = scipy.stats.norm.pdf(game.AwayPts,game.attack_a*game.defence_h,game.st_dev_h*game.st_dev_a)
The next step is to find the log-likelihood of each match, which is just the log of the product of exp_home and exp_away:
game['loglik'] = np.log(game['exp_home']*game['exp_away'])
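One possible pitfall with taking the log of a product of densities is underflow: when a guess is far from the observed score, norm.pdf can round to zero and the log becomes -inf. A hedged sketch of an alternative using scipy.stats.norm.logpdf, which sums log-densities directly; the arrays below are illustrative values derived from the initial guesses (5 * 5 * 1.1 = 27.5, 5 * 5 = 25, 2 * 2 = 4), not real output:

```python
import numpy as np
from scipy.stats import norm

# Observed scores from the two example rows
home_pts = np.array([62.0, 81.0])
away_pts = np.array([59.0, 82.0])

# Means and scale implied by the initial guesses (illustrative)
mu_home = np.array([27.5, 27.5])
mu_away = np.array([25.0, 25.0])
scale = np.array([4.0, 4.0])

# The approach from the post: log of the product of the two densities
loglik_via_pdf = np.log(norm.pdf(home_pts, mu_home, scale) *
                        norm.pdf(away_pts, mu_away, scale))

# Same quantity as a sum of log-densities
loglik_via_logpdf = (norm.logpdf(home_pts, mu_home, scale) +
                     norm.logpdf(away_pts, mu_away, scale))

# An extreme case where the density underflows to 0 but the log-density does not
with np.errstate(divide='ignore'):
    underflowed = np.log(norm.pdf(62.0, 0.0, 1.0))
stable = norm.logpdf(62.0, 0.0, 1.0)
```

The two versions agree while the densities are representable; only the logpdf form stays finite in the extreme case.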
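So the sum of game['loglik'] is the quantity that needs to be maximised (or, equivalently, its negative minimised), but I don't know how to go about it.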
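Written out, the quantity I am after would be something like the following. This is only my reading of the two norm.pdf calls above; the symbols h(i) and a(i) (home and away team of game i), H_i and A_i (home and away points), s_i and N (number of games) are notation introduced here for illustration:

```latex
\begin{aligned}
\mu^{H}_i &= \text{attack}_{h(i)} \cdot \text{defence}_{a(i)} \cdot \text{homeadv}, \\
\mu^{A}_i &= \text{attack}_{a(i)} \cdot \text{defence}_{h(i)}, \\
s_i &= \text{stdev}_{h(i)} \cdot \text{stdev}_{a(i)}, \\
\ell &= \sum_{i=1}^{N} \left[ \log \phi(H_i;\, \mu^{H}_i, s_i) + \log \phi(A_i;\, \mu^{A}_i, s_i) \right], \\
\log \phi(x;\, \mu, s) &= -\tfrac{1}{2}\log(2\pi) - \log s - \frac{(x - \mu)^2}{2 s^2}.
\end{aligned}
```

Here s_i plays the role of the standard deviation (the scale argument of norm.pdf), and the function handed to the minimiser would be the negative of this sum.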
My attempts so far have failed, but the code below is essentially the loss function I am after:
def logsum(params,game,id_list):
    lt = -np.sum(game.xy)
    return lt
W = scipy.optimize.minimize(logsum, x0=init_params, args=(game, id_list))
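One likely reason the call above goes nowhere is that logsum never uses params, so the optimiser sees a constant function. The objective presumably has to recompute the likelihood from params on every call. A minimal sketch of what that might look like, under several assumptions: the six-game DataFrame is made-up stand-in data, the starting values and bounds are arbitrary choices to keep the scale positive, and L-BFGS-B is just one bounded method that scipy.optimize.minimize supports.

```python
import numpy as np
import pandas as pd
from scipy.optimize import minimize
from scipy.stats import norm

def neg_log_likelihood(params, home_idx, away_idx, home_pts, away_pts, n_teams):
    """Negative log-likelihood, recomputed from the flat parameter vector."""
    attack = params[:n_teams]
    defence = params[n_teams:2 * n_teams]
    stdev = params[2 * n_teams:3 * n_teams]
    homeadv = params[3 * n_teams]

    mu_home = attack[home_idx] * defence[away_idx] * homeadv
    mu_away = attack[away_idx] * defence[home_idx]
    scale = stdev[home_idx] * stdev[away_idx]

    loglik = (norm.logpdf(home_pts, mu_home, scale) +
              norm.logpdf(away_pts, mu_away, scale))
    return -np.sum(loglik)

# Made-up stand-in for Games.csv (hypothetical scores)
game = pd.DataFrame({
    'HomeID': [1, 3, 2, 4, 1, 2],
    'AwayID': [2, 4, 3, 1, 3, 4],
    'HomePts': [62, 81, 70, 75, 68, 73],
    'AwayPts': [59, 82, 66, 71, 74, 69],
})
id_list = sorted(pd.unique(pd.concat([game['HomeID'], game['AwayID']], axis=0)))
n_teams = len(id_list)

# Integer position of each game's home and away team within id_list
pos = {team_id: k for k, team_id in enumerate(id_list)}
home_idx = game['HomeID'].map(pos).to_numpy()
away_idx = game['AwayID'].map(pos).to_numpy()
home_pts = game['HomePts'].to_numpy(dtype=float)
away_pts = game['AwayPts'].to_numpy(dtype=float)

# Arbitrary starting point: 8 * 8 * 1.1 is roughly a typical score
init_params = np.array([8.0] * n_teams + [8.0] * n_teams + [3.0] * n_teams + [1.1])

# Assumed bounds: ratings and standard deviations stay positive
bounds = ([(0.1, None)] * (2 * n_teams) +
          [(0.5, None)] * n_teams +
          [(0.5, 2.0)])

args = (home_idx, away_idx, home_pts, away_pts, n_teams)
result = minimize(neg_log_likelihood, x0=init_params, args=args,
                  method='L-BFGS-B', bounds=bounds)
```

The fitted values would then be in result.x, in the same order as init_params. Note that attack and defence only enter through their product, so their overall scale is not pinned down by this model; some extra constraint might be needed for the ratings to be comparable.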
I am quite new to Python, so any help would be greatly appreciated! And if the explanation above is unclear, please feel free to ask questions.