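I am trying to rate NFL teams by minimizing a sum of squared errors subject to a constraint. The error for a game is the actual score margin minus the predicted score margin. My data consists of game-by-game scores and looks like this: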
# Imports
import pandas as pd
import numpy as np
# Data
dat = {
    "Home_Team": ["KC Chiefs", "LA Chargers", "Baltimore Ravens"],
    "Away_Team": ["Houston Texans", "Miami Dolphins", "KC Chiefs"],
    "Home_Score": [34, 20, 20],
    "Away_Score": [20, 23, 34],
    "Margin": [14, -3, -34],
}
df = pd.DataFrame(dat)
df
          Home_Team       Away_Team  Home_Score  Away_Score  Margin
0         KC Chiefs  Houston Texans          34          20      14
1       LA Chargers  Miami Dolphins          20          23      -3
2  Baltimore Ravens       KC Chiefs          20          34     -34
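The margin is Margin = Home_Score - Away_Score. My goal is to come up with a numeric rating for each team such that the ratings average to zero across all teams. So if the Chiefs have a rating of 3.0, they are 3 points better than an average team.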
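Given a set of ratings, predictions are generated like this: the home team's predicted margin of victory is Home_Edge + Home_Rating - Away_Rating, where Home_Edge is the home-field advantage (a single constant shared by all home teams), Home_Rating is the home team's rating, and Away_Rating is the away team's rating. In addition to the team ratings, I also want to find the best value of Home_Edge.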
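To make the prediction rule concrete, here is a minimal sketch with made-up numbers. The ratings, the value of home_edge, and the helper name predicted_margin are all hypothetical and chosen only for illustration; they are not fitted values.

```python
# Hypothetical ratings, chosen only for illustration (they average to zero).
ratings = {
    "KC Chiefs": 3.0,
    "Houston Texans": -1.0,
    "LA Chargers": 0.5,
    "Miami Dolphins": -0.5,
    "Baltimore Ravens": -2.0,
}
home_edge = 2.0  # assumed home-field advantage, in points


def predicted_margin(home_team, away_team, ratings, home_edge):
    """Predicted home margin: Home_Edge + Home_Rating - Away_Rating."""
    return home_edge + ratings[home_team] - ratings[away_team]


# First game in the data: KC Chiefs hosted Houston Texans and won by 14.
pred = predicted_margin("KC Chiefs", "Houston Texans", ratings, home_edge)
error = 14 - pred  # actual margin minus predicted margin
print(pred, error)
```

Each game contributes one such error, and every team has a single rating that is used whether the team plays at home or away.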
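As I said above, the error in a prediction is the actual score margin minus the predicted margin, and I want to minimize the sum of the squares of these errors. I am trying to do this with scipy.optimize as follows: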
# Our objective function, where x is our array of parameters,
# x[0] is the home edge, x[1] the home rating, and x[2] the away rating
# Y is the true, observed margin
def obj_fun(x, Y):
    y = x[0] + x[1] - x[2]
    return np.sum((y - Y)**2)
# Define the constraint function. We have that the ratings average to 0
def con(x):
    return np.mean(x[1])
# Constraint dictionary
cons = {'type': 'eq', 'fun': con}
# Minimize sum of squared errors
from scipy import optimize
# Initial guesses (numbers I randomly thought of in my head)
home_edge = 0.892
home_ratings = np.array([1.46, 9.67, -0.82])
away_ratings = np.array([-3.10, -6.57, 1.46])
x_init = [np.repeat(home_edge, 3), home_ratings, away_ratings]
# Minimize
results = optimize.minimize(fun = obj_fun, args = (df["Margin"]),
x0 = x_init, constraints = cons)
print(results.x)
[-2.9413615 0. 4.72534244 1.46 9.67 -0.82
-3.1 -6.57 1.46 ]
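I expected the output to contain 6 values, not 9, so I am not sure where I went wrong. There should be one value for the home edge plus five more (one rating for each team in the data). What is going wrong? Thanks!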
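For reference, here is a rough sketch of the parameter layout I have in mind, offered as one possible setup rather than a verified solution. It assumes x holds the home edge in x[0] followed by one rating per team, and that each game looks up its two teams by integer index. The names teams, index, home_idx, away_idx, and margins are my own, and the Margin column is copied from the data above.

```python
import numpy as np
import pandas as pd
from scipy import optimize

df = pd.DataFrame({
    "Home_Team": ["KC Chiefs", "LA Chargers", "Baltimore Ravens"],
    "Away_Team": ["Houston Texans", "Miami Dolphins", "KC Chiefs"],
    "Margin": [14, -3, -34],  # copied from the data above
})

# Assign one integer index to each distinct team (5 teams here).
teams = sorted(set(df["Home_Team"]) | set(df["Away_Team"]))
index = {team: i for i, team in enumerate(teams)}
home_idx = df["Home_Team"].map(index).to_numpy()
away_idx = df["Away_Team"].map(index).to_numpy()
margins = df["Margin"].to_numpy(dtype=float)


def obj_fun(x, home_idx, away_idx, margins):
    # x[0] is the home edge, x[1:] holds one rating per team.
    home_edge, ratings = x[0], x[1:]
    predicted = home_edge + ratings[home_idx] - ratings[away_idx]
    return np.sum((margins - predicted) ** 2)


def con(x):
    # Equality constraint: team ratings (not the home edge) average to 0.
    return np.mean(x[1:])


x_init = np.zeros(1 + len(teams))  # 6 parameters: 1 edge + 5 ratings
results = optimize.minimize(
    obj_fun,
    x_init,
    args=(home_idx, away_idx, margins),
    constraints={"type": "eq", "fun": con},
    method="SLSQP",
)
print(results.x.shape)
```

With only three games and six parameters the fit is underdetermined, so the fitted values themselves mean little here; the point is only that the parameter vector has one entry for the home edge and one per team.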