Minimizing least squares with constraints

Time: 2021-01-11 17:28:42

Tags: python scipy mathematical-optimization scipy-optimize-minimize

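I am trying to rate NFL teams by minimizing the sum of squared errors subject to a constraint. The error is defined as the actual score margin of a game minus the predicted score margin. My data consist of game-by-game scores and look like this: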

# Imports
import pandas as pd
import numpy as np

# Data
dat = {"Home_Team": ["KC Chiefs", "LA Chargers", "Baltimore Ravens"],
       "Away_Team": ["Houston Texans", "Miami Dolphins", "KC Chiefs"],
       "Home_Score": [34, 20, 20],
       "Away_Score": [20, 23, 34],
       "Margin": [14, -3, -34]
      }
df = pd.DataFrame(dat)
df

    Home_Team        Away_Team      Home_Score  Away_Score  Margin
0   KC Chiefs        Houston Texans 34          20          14
1   LA Chargers      Miami Dolphins 20          23          -3
2   Baltimore Ravens KC Chiefs      20          34          -34 

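The margin is defined as Margin = Home_Score - Away_Score. My goal is to come up with a numerical rating for each team such that the ratings of all teams average to zero. So if the Chiefs have a rating of 3.0, they are 3 points better than an average team.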

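Given a set of ratings, we generate predictions as follows: the predicted margin of victory for the home team is Home_Edge + Home_Rating - Away_Rating, where Home_Edge is the home-field advantage (a constant shared by all home teams), Home_Rating is the home team's rating, and Away_Rating is the away team's rating. Besides the team ratings, I also want to find the best value of Home_Edge.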
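To make the prediction rule concrete, here is a minimal sketch. The function name predict_margin, the ratings dictionary, and every numeric value below are made up for illustration; they are not fitted results.

```python
# Hypothetical ratings and home edge, chosen only for illustration.
ratings = {"KC Chiefs": 3.0, "Houston Texans": -1.0}
home_edge = 2.0

def predict_margin(home_team, away_team, ratings, home_edge):
    """Predicted home margin: Home_Edge + Home_Rating - Away_Rating."""
    return home_edge + ratings[home_team] - ratings[away_team]

pred = predict_margin("KC Chiefs", "Houston Texans", ratings, home_edge)
print(pred)  # 2.0 + 3.0 - (-1.0) = 6.0
```

With an actual margin of 14 for that game, the error under these made-up numbers would be 14 - 6 = 8.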

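As stated above, the error of a prediction is the actual score margin minus the predicted margin, and I want to minimize the sum of the squares of these errors. I am trying to do this with scipy.optimize as follows: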

# Our objective function, where x is our array of parameters, 
# x[0] is the home edge, x[1] the home rating, and x[2] the away rating
# Y is the true, observed margin
def obj_fun(x, Y):
    y = x[0] + x[1] - x[2]
    return np.sum((y - Y)**2)

# Define the constraint function. We have that the ratings average to 0
def con(x):
    return np.mean(x[1])

# Constraint dictionary
cons = {'type': 'eq', 'fun': con}

# Minimize sum of squared errors
from scipy import optimize

# Initial guesses (numbers I randomly thought of in my head)
home_edge = 0.892
home_ratings = np.array([1.46, 9.67, -0.82])
away_ratings = np.array([-3.10, -6.57, 1.46])
x_init = [np.repeat(home_edge, 3), home_ratings, away_ratings]

# Minimize
results = optimize.minimize(fun = obj_fun, args = (df["Margin"]), 
x0 = x_init, constraints = cons)

print(results.x)
[-2.9413615   0.          4.72534244  1.46        9.67       -0.82
 -3.1        -6.57        1.46      ]

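I expected my output to contain 6 values, not 9, so I am not sure where I went wrong. There should be one value for the home edge plus five more (one for each team in the data). What is going wrong? Thanks!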
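For comparison, here is a sketch of one way the problem could be laid out so that the parameter vector has exactly 1 + n_teams entries: one home edge followed by one rating per distinct team. The x_init above holds three arrays of length 3, which is 9 numbers in total and matches the 9 values printed. The names teams, team_idx, home_idx and away_idx, the zero starting point, and the choice of method="SLSQP" are assumptions made for this sketch. The margin is recomputed from the scores, so the third game gets -14 rather than the -34 listed in the table.

```python
import numpy as np
import pandas as pd
from scipy import optimize

df = pd.DataFrame({
    "Home_Team": ["KC Chiefs", "LA Chargers", "Baltimore Ravens"],
    "Away_Team": ["Houston Texans", "Miami Dolphins", "KC Chiefs"],
    "Home_Score": [34, 20, 20],
    "Away_Score": [20, 23, 34],
})
# Assumption: the scores are authoritative, so the margin is derived from them.
margin = (df["Home_Score"] - df["Away_Score"]).to_numpy(dtype=float)

# Map each distinct team to a position in the rating part of the vector.
teams = sorted(set(df["Home_Team"]) | set(df["Away_Team"]))
team_idx = {team: i for i, team in enumerate(teams)}
home_idx = df["Home_Team"].map(team_idx).to_numpy()
away_idx = df["Away_Team"].map(team_idx).to_numpy()

def obj_fun(x, home_idx, away_idx, margin):
    # x[0] is the home edge; x[1:] holds one rating per team.
    team_ratings = x[1:]
    pred = x[0] + team_ratings[home_idx] - team_ratings[away_idx]
    return np.sum((margin - pred) ** 2)

def con(x):
    # Equality constraint: the team ratings average to zero.
    return np.mean(x[1:])

x_init = np.zeros(1 + len(teams))  # flat vector: 1 home edge + 5 ratings
results = optimize.minimize(
    obj_fun, x_init, args=(home_idx, away_idx, margin),
    method="SLSQP", constraints={"type": "eq", "fun": con},
)
print(results.x.shape)  # (6,)
print(dict(zip(["Home_Edge"] + teams, results.x)))
```

With only three games and six parameters, many rating vectors fit equally well, so the particular numbers returned are not meaningful; the point of the sketch is the shape of the parameter vector and the constraint acting on all team ratings.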

0 answers:

No answers yet.