Optimizing team ratings in Python

Time: 2018-01-30 14:47:23

Tags: python python-3.x pandas optimization

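In general, the goal is to maximize the log-likelihood (or, equivalently, minimize the negative log-likelihood) by optimizing the parameters until they converge. In this context, the parameters are the attack ratings, the defence ratings, the standard deviations, and a common home advantage. The first three are vectors (one entry per team in the competition) and are team-specific, while the home advantage is just a scalar.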

import numpy as np
import pandas as pd
import scipy.optimize
import scipy.stats  # needed for scipy.stats.norm.pdf further down


# Reads the game data
game = pd.read_csv('Games.csv') 
numGames = len(game) # Number of Games
homeadv = 1.1 # Home Advantage 

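The first two rows of the raw pandas DataFrame read above look like this (the code refers to the columns as HomeID, AwayID, HomePts and AwayPts):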

   Game Home ID Away ID Home Points Away Points
    1     1        2        62         59
    2     3        4        81         82

Collecting the team IDs and the initial parameter guesses:

id_list = sorted(pd.unique(pd.concat([game['HomeID'], game['AwayID']], axis=0)))

# Attack Parameters, Defence Parameters, Standard Deviation Parameters, and Home Advantage set to an arbitrary value
attackratings = [5 for id in id_list]
defenceratings = [5 for id in id_list]
stdevratings = [2 for id in id_list]
homeadv = 1.1 # Home Advantage for the Team playing at home

# Put into a tuple for the scipy.optimize.minimize
init_params = tuple(attackratings + defenceratings + stdevratings + [homeadv])
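Because scipy.optimize.minimize only ever passes a single flat vector back to the objective, that vector has to be split into its four parts again inside the loss function. The following is a minimal sketch of that step, assuming the same ordering as init_params above (attack, defence, standard deviation, then home advantage). The helper name unpack_params and the two-team toy values are hypothetical, used only for illustration.

```python
import numpy as np

def unpack_params(params, n_teams):
    """Split a flat vector of length 3 * n_teams + 1 into its four parts.

    Hypothetical helper: assumes the layout used for init_params,
    i.e. attack, defence, standard deviation, then home advantage.
    """
    params = np.asarray(params, dtype=float)
    attack = params[:n_teams]
    defence = params[n_teams:2 * n_teams]
    stdev = params[2 * n_teams:3 * n_teams]
    homeadv = params[3 * n_teams]
    return attack, defence, stdev, homeadv

# Toy example with two teams, laid out like init_params
toy_params = tuple([5, 5] + [5, 5] + [2, 2] + [1.1])
attack, defence, stdev, homeadv = unpack_params(toy_params, n_teams=2)
```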

Lists for each parameter, where _h denotes Home and _a denotes Away:

attack_h = []
defence_a = []
st_dev_h = []
st_dev_a = []
attack_a = []
defence_h = []

for i in range(0,len(game)):
    x = attackratings[id_list.index(game.HomeID[i])]
    attack_h.append(x)
    x = defenceratings[id_list.index(game.AwayID[i])]
    defence_a.append(x)
    x = stdevratings[id_list.index(game.HomeID[i])]
    st_dev_h.append(x)
    x = stdevratings[id_list.index(game.AwayID[i])]
    st_dev_a.append(x)
    # Home Def and Away Att
    x = attackratings[id_list.index(game.AwayID[i])]
    attack_a.append(x)
    x = defenceratings[id_list.index(game.HomeID[i])]
    defence_h.append(x)

game['attack_h'] = attack_h
game['defence_a'] = defence_a
game['attack_a'] = attack_a
game['defence_h'] = defence_h
game['st_dev_h'] = st_dev_h
game['st_dev_a'] = st_dev_a
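The same per-game lookup can probably be done without the Python loop by turning the team IDs into integer positions once and indexing NumPy arrays with them. This is only a sketch under assumptions: the two-row DataFrame is the sample shown earlier, and the names pos, home_idx and away_idx are made up for the example.

```python
import numpy as np
import pandas as pd

# The two sample rows from above (illustrative data only)
game = pd.DataFrame({'HomeID': [1, 3], 'AwayID': [2, 4],
                     'HomePts': [62, 81], 'AwayPts': [59, 82]})
id_list = sorted(pd.unique(pd.concat([game['HomeID'], game['AwayID']], axis=0)))

# Map each team ID to its position in id_list (hypothetical helper names)
pos = {team_id: k for k, team_id in enumerate(id_list)}
home_idx = game['HomeID'].map(pos).to_numpy()
away_idx = game['AwayID'].map(pos).to_numpy()

# Ratings as arrays so they can be indexed with the position arrays
attackratings = np.full(len(id_list), 5.0)
defenceratings = np.full(len(id_list), 5.0)
stdevratings = np.full(len(id_list), 2.0)

game['attack_h'] = attackratings[home_idx]
game['defence_a'] = defenceratings[away_idx]
game['attack_a'] = attackratings[away_idx]
game['defence_h'] = defenceratings[home_idx]
game['st_dev_h'] = stdevratings[home_idx]
game['st_dev_a'] = stdevratings[away_idx]
```

Computing home_idx and away_idx once also means the lookup can be repeated cheaply each time the optimizer proposes new ratings.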

Computing, given the parameters, the probability density of each team scoring the points it actually scored:

game['exp_home'] = scipy.stats.norm.pdf(game.HomePts,game.attack_h*game.defence_a*homeadv,game.st_dev_h*game.st_dev_a)
game['exp_away'] = scipy.stats.norm.pdf(game.AwayPts,game.attack_a*game.defence_h,game.st_dev_h*game.st_dev_a)
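Written out, the two lines above appear to correspond to the model below. The symbols are shorthand introduced here, not notation from the original code: $\alpha_i$, $\delta_i$ and $\sigma_i$ stand for the attack, defence and standard deviation ratings of team $i$, $\gamma$ for the home advantage, and $h(g)$, $a(g)$ for the home and away team of game $g$. Since the third argument of scipy.stats.norm.pdf is the scale (a standard deviation), the product of the two st_dev columns acts as the standard deviation, not the variance.

```latex
\begin{aligned}
\text{HomePts}_g &\sim \mathcal{N}\!\left(\alpha_{h(g)}\,\delta_{a(g)}\,\gamma,\ \bigl(\sigma_{h(g)}\,\sigma_{a(g)}\bigr)^{2}\right) \\
\text{AwayPts}_g &\sim \mathcal{N}\!\left(\alpha_{a(g)}\,\delta_{h(g)},\ \bigl(\sigma_{h(g)}\,\sigma_{a(g)}\bigr)^{2}\right)
\end{aligned}
```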

The next step is to find the log-likelihood of each match, which is just the log of the product of exp_home and exp_away:

game['loglik'] = np.log(game['exp_home']*game['exp_away'])
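One thing that may matter once an optimizer starts moving the parameters around: multiplying two very small densities and then taking the log can underflow to zero, while summing scipy.stats.norm.logpdf values gives the same quantity without that problem. The sketch below uses the first sample game and the initial guesses from above; the value tiny_sd is an arbitrary number chosen only to trigger the underflow.

```python
import numpy as np
from scipy.stats import norm

# First sample game with the initial guesses (attack 5, defence 5, stdev 2, homeadv 1.1)
home_pts, away_pts = 62.0, 59.0
mu_home, mu_away, sd = 5 * 5 * 1.1, 5 * 5, 2 * 2

# log of the product of densities, as in the question
direct = np.log(norm.pdf(home_pts, mu_home, sd) * norm.pdf(away_pts, mu_away, sd))
# the same quantity as a sum of log-densities
stable = norm.logpdf(home_pts, mu_home, sd) + norm.logpdf(away_pts, mu_away, sd)

# With a much smaller standard deviation (arbitrary value for illustration)
# the density underflows to exactly 0.0, but the log-density stays finite
tiny_sd = 0.5
underflowed = norm.pdf(home_pts, mu_home, tiny_sd)
still_finite = norm.logpdf(home_pts, mu_home, tiny_sd)
```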

So the negative of the sum of game['loglik'] is the quantity that needs to be minimized, but I don't know how to go about it.

My attempts so far have failed, but the code below is essentially the loss function I'm after:

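def logsum(params, game, id_list):
    # Negative sum of the per-game log-likelihoods.
    # Note: params is not used yet, so the value does not change as the optimizer varies params.
    lt = -np.sum(game['loglik'])
    return lt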

W = scipy.optimize.minimize(logsum, x0=init_params, args=(game, id_list))
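For the minimizer to have anything to work with, the loss has to recompute the per-game means and standard deviations from params on every call. Below is one possible shape for such a function, as a sketch and not a verified solution. The six games are made-up numbers in the same layout as Games.csv, neg_loglik, home_idx and away_idx are hypothetical names, and the choice of L-BFGS-B with positive lower bounds is an assumption made here to keep the standard deviations away from zero.

```python
import numpy as np
import pandas as pd
from scipy.optimize import minimize
from scipy.stats import norm

# Made-up results in the same layout as Games.csv (illustrative only)
game = pd.DataFrame({
    'HomeID':  [1, 3, 1, 2, 4, 3],
    'AwayID':  [2, 4, 3, 4, 1, 2],
    'HomePts': [62, 81, 70, 65, 77, 80],
    'AwayPts': [59, 82, 75, 60, 66, 71],
})
id_list = sorted(set(game['HomeID']) | set(game['AwayID']))
n = len(id_list)
pos = {team_id: k for k, team_id in enumerate(id_list)}
home_idx = game['HomeID'].map(pos).to_numpy()
away_idx = game['AwayID'].map(pos).to_numpy()

def neg_loglik(params, game, home_idx, away_idx, n):
    """Negative log-likelihood, rebuilt from params on every call."""
    attack, defence, stdev = params[:n], params[n:2 * n], params[2 * n:3 * n]
    homeadv = params[3 * n]
    mu_home = attack[home_idx] * defence[away_idx] * homeadv
    mu_away = attack[away_idx] * defence[home_idx]
    sd = stdev[home_idx] * stdev[away_idx]
    ll = (norm.logpdf(game['HomePts'].to_numpy(), mu_home, sd)
          + norm.logpdf(game['AwayPts'].to_numpy(), mu_away, sd))
    return -np.sum(ll)

# Same starting values as in the question: attack 5, defence 5, stdev 2, homeadv 1.1
x0 = np.array([5.0] * n + [5.0] * n + [2.0] * n + [1.1])
# Assumed bounds: keep every parameter positive, stdev at least 0.1
bounds = [(1e-3, None)] * (2 * n) + [(0.1, None)] * n + [(1e-3, None)]

result = minimize(neg_loglik, x0, args=(game, home_idx, away_idx, n),
                  method='L-BFGS-B', bounds=bounds)
start_value = neg_loglik(x0, game, home_idx, away_idx, n)
```

The fitted ratings are then in result.x, in the same order as x0. Note that in this parameterization only products such as attack times defence enter the likelihood, so the individual values are probably not uniquely determined without some extra constraint.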

I'm quite new to Python, so any help would be greatly appreciated! And if the explanation above is unclear, please ask any questions.

0 answers:

No answers