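I'm trying to optimize expected goals (xG) in football (soccer) matches by measuring the sum of squared differences (SSD) against individual match timeslots. Assume each match is split into k timeslots, and that the probability of a goal by either team, or of no goal, is constant across those timeslots.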
**Sample SSD for individual match_i with Final score [0-0]**
xG is unique in each match.
Team1 and Team2 have the following xG, multiplied by an arbitrary multiplier M:
Team1 = xG_1*M
Team2 = xG_2*M
prob_1 = [1-(xG_1 + xG_2)/k, xG_1/k, xG_2/k].
where prob_1 is the probability of a Draw (no goal), a Team1 goal, or a Team2 goal in each of the k timeslots of match_i, so sum(prob_1) = 1.

To measure the SSD for match_i:
```python
import numpy as np

x1 = [1, 0, 0]  # observed outcome: no goal scored in the timeslot
x2 = [0, 1, 0]  # observed outcome: Home Team (Team1) scores in the timeslot
x3 = [0, 0, 1]  # observed outcome: Away Team (Team2) scores in the timeslot

# Using xG_Team1 and xG_Team2 from the table below (index 0), with M = 1
xG_1, xG_2 = 1.440539, 1.380095
total_timeslot = 180  # k: number of timeslots per match
k = total_timeslot
y = np.array([1 - (xG_1 + xG_2) / k, xG_1 / k, xG_2 / k])

Home_Goal = []  # timeslots in which the home team scored (none: 0-0)
Away_Goal = []  # timeslots in which the away team scored (none: 0-0)

def sum_squared_diff(x1, x2, x3, y):
    ssd = []
    for slot in range(total_timeslot):
        if slot in Home_Goal:
            ssd.append(sum((x2 - y) ** 2))
        elif slot in Away_Goal:
            ssd.append(sum((x3 - y) ** 2))
        else:
            ssd.append(sum((x1 - y) ** 2))  # no goal in this slot
    return ssd

SSD_Result = sum_squared_diff(x1, x2, x3, y)
sum(SSD_Result)
```
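Since y is the same in every timeslot, the loop can be summed by outcome. Writing $o_t$ for the observed outcome vector in slot $t$ (one of $x_1, x_2, x_3$) and $g_1$, $g_2$ for the number of slots in which Team1 and Team2 scored, and assuming at most one goal per slot (so $g_1$, $g_2$ are simply the final score), the SSD depends only on the goal counts, not on when the goals happened:

$$
\mathrm{SSD}_i=\sum_{t=1}^{k}\lVert o_t-y\rVert^2=(k-g_1-g_2)\,\lVert x_1-y\rVert^2+g_1\,\lVert x_2-y\rVert^2+g_2\,\lVert x_3-y\rVert^2
$$

For a 0-0 match ($g_1=g_2=0$) with $y=\left[1-\tfrac{M(xG_1+xG_2)}{k},\ \tfrac{M\,xG_1}{k},\ \tfrac{M\,xG_2}{k}\right]$ this reduces to

$$
\mathrm{SSD}_i=k\left[\left(\frac{M(xG_1+xG_2)}{k}\right)^2+\left(\frac{M\,xG_1}{k}\right)^2+\left(\frac{M\,xG_2}{k}\right)^2\right]
$$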
For example, take the xGs (xG_Team1 and xG_Team2) at index 0 of the table below, with M = 1. First, for k = 187 timeslots, the xG per timeslot becomes 1.4405394105672238/187 and 1.3800950382265837/187, and these are constant throughout the match.

```python
y_0 = np.array([1-(0.007703419308 + 0.007380187370)/187, 0.007703419308/187, 0.007380187370/187])
```

Using y_0 in the function above, the SSD_Result for the xG at index 0 is 1.8252675137316426e-06.
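A minimal sketch of turning one match's xG into prob_1, following the definition above (each xG scaled by M and divided by k exactly once). `prob_1_from_xg` and `ssd_no_goals` are hypothetical helper names, not part of the original code. For a 0-0 match every timeslot is compared with x1, so the loop collapses to k copies of one term.

```python
import numpy as np

def prob_1_from_xg(xg_1, xg_2, k, M=1.0):
    """Hypothetical helper: [no goal, Team1 goal, Team2 goal] per timeslot."""
    p1 = xg_1 * M / k  # Team1 goal probability per timeslot
    p2 = xg_2 * M / k  # Team2 goal probability per timeslot
    return np.array([1.0 - (p1 + p2), p1, p2])

def ssd_no_goals(y, k):
    """Hypothetical helper: SSD of a 0-0 match, every slot compared with x1."""
    x1 = np.array([1.0, 0.0, 0.0])
    return k * np.sum((x1 - y) ** 2)

k = 187
y = prob_1_from_xg(1.4405394105672238, 1.3800950382265837, k)  # index 0, M = 1
print(y, y.sum())
print(ssd_no_goals(y, k))
```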
The SSD looks promising, as the match ended with no goals and both teams had almost identical xG figures.

Now I want to apply the same process to xG index 1, xG index 2, ... xG index 10000, then take the total SSD and change the arbitrary multiplier M based on that value until the optimal result is reached. How can I convert the xG in each match to a prob_1-like array (i.e. prob_1 ... prob_10000) and pass it into the function above? Here's a sample of the xG:

```
individual_match_xG.tail()
   xG_Team1  xG_Team2
0  1.440539  1.380095
1  2.123673  0.946116
2  1.819697  0.921660
3  1.132676  1.375717
4  1.244837  1.269933
```
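A vectorised sketch of the conversion for every match at once: with the xG in an (n_matches, 2) array, scaling by M, dividing by k and prepending the no-goal column gives one prob_1 row per match. The array below is the five-row sample from the table; with the DataFrame itself, `individual_match_xG[["xG_Team1", "xG_Team2"]].to_numpy()` should give the same kind of array. `prob_1_all` is a hypothetical name.

```python
import numpy as np

# xG sample from the table above: one row per match, columns xG_Team1, xG_Team2
xg = np.array([
    [1.440539, 1.380095],
    [2.123673, 0.946116],
    [1.819697, 0.921660],
    [1.132676, 1.375717],
    [1.244837, 1.269933],
])

def prob_1_all(xg, k, M=1.0):
    """Hypothetical helper: (n_matches, 3) array, row i is prob_1 for match i."""
    goal_p = xg * M / k                                   # (n, 2): xG_1/k, xG_2/k
    no_goal_p = 1.0 - goal_p.sum(axis=1, keepdims=True)   # (n, 1): 1-(xG_1+xG_2)/k
    return np.hstack([no_goal_p, goal_p])

probs = prob_1_all(xg, k=187)
print(probs.shape)  # (5, 3)
```

Each row of `probs` can then be passed to `sum_squared_diff` as its `y` argument.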
**Problem**

* There are 10000 final scores with xG that I want to turn into 10000 prob_1 arrays, then get an SSD for each.
* k is the total number of timeslots per match and is constant, depending on the length of the intervals. For 30-second timeslots, k is 180; adding 7/2 minutes of injury time gives k = 187.
* Home_Goal, Away_Goal and No_Goal represent the probability of a single goal being scored in a timeslot by the respective team, or of no goal being scored.
* Only one goal can be scored per timeslot.
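Because prob_1 is constant within a match and at most one goal can occur per timeslot, a match's SSD depends only on how many slots ended in each outcome, which is the final score, not on when the goals were scored. A vectorised sketch under those assumptions; `ssd_from_scores` is a hypothetical name and `scores` is assumed to be an (n, 2) array of [Team1 goals, Team2 goals].

```python
import numpy as np

def ssd_from_scores(probs, scores, k):
    """Hypothetical helper: total SSD per match, shape (n,).

    probs:  (n, 3) prob_1 rows [no goal, Team1 goal, Team2 goal]
    scores: (n, 2) final scores [Team1 goals, Team2 goals]
    Assumes at most one goal per timeslot, so the remaining
    k - Team1 goals - Team2 goals slots had no goal.
    """
    outcomes = np.eye(3)  # rows are x1, x2, x3
    # dist[i, j] = squared distance between prob_1 of match i and outcome j
    dist = ((outcomes[None, :, :] - probs[:, None, :]) ** 2).sum(axis=2)
    g1, g2 = scores[:, 0], scores[:, 1]
    return (k - g1 - g2) * dist[:, 0] + g1 * dist[:, 1] + g2 * dist[:, 2]

# Index 0 from the table, which ended 0-0, with k = 187 and M = 1
k = 187
xg = np.array([[1.440539, 1.380095]])
p = xg / k
probs = np.hstack([1.0 - p.sum(axis=1, keepdims=True), p])
print(ssd_from_scores(probs, np.array([[0, 0]]), k))
```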
**Answer**
```python
import numpy as np

# constants
M = 1.0
k = 180  # number of timeslots
x1 = [1, 0, 0]  # outcome: no goal scored in the timeslot
x2 = [0, 1, 0]  # outcome: home team scores in the timeslot
x3 = [0, 0, 1]  # outcome: away team scores in the timeslot

# seven example final scores, used here in place of xG
final_scores = [[2,1],[3,3],[1,2],[1,1],[2,1],[4,0],[2,3]]

# timeslots with goals (the same slots are used for every score)
Home_Goal = [2, 3]
Away_Goal = [4]

# numpy arrays of the data
final_scores = np.array(final_scores)  # team_1 is [:,0], team_2 is [:,1]
home_goal = np.array(Home_Goal)
away_goal = np.array(Away_Goal)

# apply the arbitrary multiplier M
adj_scores = final_scores * M  # shape --> (# of scores, 2)

# calculate prob_1
slot_goal_probability = adj_scores / k  # xG_n / k
slot_draw_probability = 1 - slot_goal_probability.sum(axis=1)  # 1-(xG_1+xG_2)/k

# y for all scores, shape (# of scores, 3)
y = np.concatenate((slot_draw_probability[:, None], slot_goal_probability), axis=1)

# per-timeslot ssd for each outcome: x2 (home goal), x3 (away goal), x1 (no goal)
home_ssd = np.sum(np.square(x2 - y), axis=1)
away_ssd = np.sum(np.square(x3 - y), axis=1)
draw_ssd = np.sum(np.square(x1 - y), axis=1)

ssd = np.zeros((y.shape[0], k))
ssd += draw_ssd[:, None]               # start with every timeslot as "no goal"
ssd[:, home_goal] = home_ssd[:, None]  # timeslots with a home goal
ssd[:, away_goal] = away_ssd[:, None]  # timeslots with an away goal
```
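The code above uses the same Home_Goal and Away_Goal slots for every score. If each match has its own goal timeslots, a one-hot outcome array of shape (n_matches, k, 3) lets broadcasting do the rest. A sketch, assuming the slots come as one list per match; `per_slot_ssd`, `home_slots` and `away_slots` are hypothetical names.

```python
import numpy as np

def per_slot_ssd(y, home_slots, away_slots, k):
    """Hypothetical helper: squared difference per timeslot, shape (n, k).

    y:          (n, 3) prob_1 rows [no goal, home goal, away goal]
    home_slots: n lists of timeslot indices where the home team scored
    away_slots: n lists of timeslot indices where the away team scored
    """
    n = y.shape[0]
    # outcome code per (match, slot): 0 = no goal, 1 = home goal, 2 = away goal
    outcome = np.zeros((n, k), dtype=int)
    for i in range(n):
        outcome[i, np.asarray(home_slots[i], dtype=int)] = 1
        outcome[i, np.asarray(away_slots[i], dtype=int)] = 2
    onehot = np.eye(3)[outcome]  # (n, k, 3): x1, x2 or x3 for every slot
    return ((onehot - y[:, None, :]) ** 2).sum(axis=2)

# Two matches, k = 180: index 0 from the question's table (0-0), and a 2-1
# with home goals in slots 2 and 3 and an away goal in slot 4
y = np.array([[1 - (1.440539 + 1.380095) / 180, 1.440539 / 180, 1.380095 / 180],
              [1 - 3 / 180, 2 / 180, 1 / 180]])
ssd = per_slot_ssd(y, [[], [2, 3]], [[], [4]], 180)
print(ssd.shape, ssd.sum(axis=1))
```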
The per-timeslot probabilities for each score (prob_1 in your example) sum to 1:
```
>>> y.sum(axis=1)
array([1., 1., 1., 1., 1., 1., 1.])
```
ssd has shape (# of scores, 180): it holds the squared difference for every timeslot of every score.
```
>>> ssd.sum(axis=1)
array([5.92222222, 6.        , 5.93333333, 5.93333333, 5.92222222,
       5.95555556, 5.96666667])
>>> for thing in ssd.sum(axis=1):
...     print(thing)
...
5.922222222222222
6.000000000000001
5.933333333333332
5.933333333333337
5.922222222222222
5.955555555555557
5.966666666666663
>>>
```
Testing y with your function (using the same Home_Goal and Away_Goal as above, and total_timeslot = 180):
```
>>> y
array([[0.98333333, 0.01111111, 0.00555556],
       [0.96666667, 0.01666667, 0.01666667],
       [0.98333333, 0.00555556, 0.01111111],
       [0.98888889, 0.00555556, 0.00555556],
       [0.98333333, 0.01111111, 0.00555556],
       [0.97777778, 0.02222222, 0.        ],
       [0.97222222, 0.01111111, 0.01666667]])
>>> for prob in y:
...     print(sum(sum_squared_diff(x1, x2, x3, prob)))
...
5.922222222222252
6.000000000000045
5.933333333333363
5.933333333333391
5.922222222222252
5.955555555555599
5.966666666666613
>>>
```
There are some tiny differences, in the 1e-14 range; I put them down to floating-point rounding error.
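Because of such rounding, results from the two approaches are best compared with a tolerance rather than with `==`. A minimal check using `np.allclose` on the two sets of totals printed above (values copied from that output):

```python
import numpy as np

# Totals from ssd.sum(axis=1) (vectorised) and from the loop-based function
vectorised = np.array([5.922222222222222, 6.000000000000001, 5.933333333333332,
                       5.933333333333337, 5.922222222222222, 5.955555555555557,
                       5.966666666666663])
looped = np.array([5.922222222222252, 6.000000000000045, 5.933333333333363,
                   5.933333333333391, 5.922222222222252, 5.955555555555599,
                   5.966666666666613])

print(np.abs(vectorised - looped).max())   # largest absolute difference
print(np.array_equal(vectorised, looped))  # False: exact equality fails
print(np.allclose(vectorised, looped))     # True: equal within tolerance
```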
Maybe someone will see this and optimize it further in their own answer; once I had it working, I didn't improve it any further.
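The question's last step, changing M until the total SSD is optimal, is not covered above. A minimal grid-search sketch, assuming each match's xG and final score are available as (n, 2) arrays and at most one goal per timeslot (so only goal counts matter); `total_ssd` and `best_multiplier` are hypothetical names, and grid search is just one simple way to do the search.

```python
import numpy as np

def total_ssd(M, xg, scores, k):
    """Total SSD over all matches for multiplier M.

    xg: (n, 2) xG per match; scores: (n, 2) final scores.
    Assumes at most one goal per timeslot, so only goal counts matter.
    """
    p = xg * M / k                                           # (n, 2)
    y = np.hstack([1.0 - p.sum(axis=1, keepdims=True), p])  # (n, 3) prob_1 rows
    dist = ((np.eye(3)[None, :, :] - y[:, None, :]) ** 2).sum(axis=2)  # (n, 3)
    g1, g2 = scores[:, 0], scores[:, 1]
    return np.sum((k - g1 - g2) * dist[:, 0] + g1 * dist[:, 1] + g2 * dist[:, 2])

def best_multiplier(xg, scores, k, grid):
    """Grid search: the M in grid with the smallest total SSD, and that SSD."""
    values = np.array([total_ssd(M, xg, scores, k) for M in grid])
    i = int(np.argmin(values))
    return grid[i], values[i]

# Toy check: the answer's seven final scores used as both xG and outcome
scores = np.array([[2, 1], [3, 3], [1, 2], [1, 1], [2, 1], [4, 0], [2, 3]])
grid = np.linspace(0.5, 1.5, 101)
best_M, best_value = best_multiplier(scores.astype(float), scores, 180, grid)
print(best_M, best_value)
```

For this toy data (xG equal to the actual goals) the minimum falls at M = 1. Because prob_1 is linear in M, the total SSD is a quadratic in M, so its minimum could also be found in closed form; the grid simply mirrors "change M until optimal". Keep the grid small enough that M·(xG_1 + xG_2) ≤ k, otherwise the no-goal probability goes negative.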
NumPy basics:

* Indexing
* Broadcasting
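The two NumPy features the answer relies on, in miniature (toy numbers, not match data): subtracting a length-3 outcome vector from an (n, 3) array broadcasts over the rows, and assigning an (n, 1) column into `ssd[:, cols]` broadcasts it across the chosen timeslot columns.

```python
import numpy as np

# Broadcasting: a (3,) vector minus an (n, 3) array subtracts it from every row
y = np.array([[0.9, 0.05, 0.05],
              [0.8, 0.10, 0.10]])
x2 = np.array([0, 1, 0])
home = np.sum((x2 - y) ** 2, axis=1)  # shape (2,): one value per row of y

# Integer-array indexing: write into chosen columns (timeslots) of every row
ssd = np.zeros((2, 5))
cols = np.array([1, 3])
ssd[:, cols] = home[:, None]  # (2, 1) broadcasts over the 2 chosen columns
print(ssd)
```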