Optimizing the sum of squared differences (SSD) in numpy

Date: 2018-06-10 14:00:36

Tags: python arrays numpy optimization

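I am trying to optimize the expected goals (xG) of football (soccer) matches by measuring the sum of squared differences over the individual match timeslots. Assume each match is divided into k timeslots, and that in each timeslot there is a constant probability of either team scoring, or of no goal being scored.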

**Sample SSD for individual match_i with Final score [0-0]**
xG is unique in each match. 
Team1 and Team2 have the following xG, multiplied by an arbitrary multiplier M.

Team1 = xG_1*M
Team2 = xG_2*M
prob_1 = [1-(xG_1 + xG_2)/k, xG_1/k, xG_2/k].

Here prob_1 holds the probabilities of a Draw (no goal), a Team1 goal and a Team2 goal in each timeslot (k) of match_i, and sum(prob_1) = 1.
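The sketch below shows how prob_1 could be built for one match. It assumes that M scales both teams' xG before the division by k, as the Team1/Team2 lines suggest. `make_prob` is a hypothetical helper name and is not part of the question's code.

```python
import numpy as np


def make_prob(xg_1, xg_2, k, M=1.0):
    """Per-timeslot probabilities [no goal, Team1 goal, Team2 goal].

    Sketch only: assumes Team1 = xG_1*M and Team2 = xG_2*M are what get divided by k.
    """
    team1 = xg_1 * M
    team2 = xg_2 * M
    return np.array([1 - (team1 + team2) / k, team1 / k, team2 / k])


prob_1 = make_prob(1.440539, 1.380095, k=187)   # index-0 xG from the table below
print(prob_1, prob_1.sum())
```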

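The SSD for match_i is measured as:

import numpy as np

total_timeslot = 180    # k, number of timeslots per match
k = total_timeslot
xG_1, xG_2 = 1.440539, 1.380095    # xG_Team1 and xG_Team2 from the table below (index 0, M = 1)

x1 = np.array([1,0,0])    # prob. of No goal scored per timeslot.
x2 = np.array([0,1,0])    # prob. of Home Team scoring per timeslot.
x3 = np.array([0,0,1])    # prob. of Away Team scoring per timeslot.
y = np.array([1-(xG_1 + xG_2)/k, xG_1/k, xG_2/k])    # prob_1 for this match

Home_Goal = []    # timeslots with a home goal (none: No goal scored)
Away_Goal = []    # timeslots with an away goal (none: No goal scored)

def sum_squared_diff(x1, x2, x3, y):
    ssd = []
    for slot in range(total_timeslot):    # loop variable renamed so it does not shadow k
        if slot in Home_Goal:
            ssd.append(sum((x2 - y)**2))
        elif slot in Away_Goal:
            ssd.append(sum((x3 - y)**2))
        else:
            ssd.append(sum((x1 - y)**2))
    return ssd

SSD_Result = sum_squared_diff(x1, x2, x3, y)
sum(SSD_Result)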
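y is the same in every timeslot, so each slot contributes one of only three values, and the loop can be replaced by goal counts. The sketch below compares the two approaches. `ssd_loop` and `ssd_counts` are hypothetical names. The goal slots [2, 3] and [4] are taken from the answer below. The sketch assumes no timeslot appears in both goal lists.

```python
import numpy as np

x1 = np.array([1, 0, 0])  # no goal in the timeslot
x2 = np.array([0, 1, 0])  # home goal in the timeslot
x3 = np.array([0, 0, 1])  # away goal in the timeslot


def ssd_loop(y, home_goal, away_goal, total_timeslot):
    """Loop version, equivalent to sum(sum_squared_diff(...)) in the question."""
    total = 0.0
    for slot in range(total_timeslot):
        if slot in home_goal:
            total += np.sum((x2 - y) ** 2)
        elif slot in away_goal:
            total += np.sum((x3 - y) ** 2)
        else:
            total += np.sum((x1 - y) ** 2)
    return total


def ssd_counts(y, n_home, n_away, total_timeslot):
    """Same total from goal counts only, valid because y is constant over the match."""
    n_none = total_timeslot - n_home - n_away
    return (n_none * np.sum((x1 - y) ** 2)
            + n_home * np.sum((x2 - y) ** 2)
            + n_away * np.sum((x3 - y) ** 2))


y = np.array([1 - 3 / 180, 2 / 180, 1 / 180])   # score [2, 1] with M = 1, k = 180
print(ssd_loop(y, [2, 3], [4], 180), ssd_counts(y, 2, 1, 180))
```

The count version does three small sums per match instead of one per timeslot. This is what makes it cheap to repeat for 10000 matches.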


For example, take xG_Team1 and xG_Team2 at index 0 of the table below, with M = 1. First, for k = 187 timeslots, the xG per timeslot becomes 1.4405394105672238/187 and 1.3800950382265837/187, which stay constant throughout the match.

y_0 = np.array([1-(0.007703419308 + 0.007380187370)/187, 0.007703419308/187, 0.007380187370/187])

Using y_0 in the function above, SSD_Result for xG at index 0 is 1.8252675137316426e-06.
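The index-0 figure can be checked directly. Note that 0.007703419308 is already 1.4405394105672238/187, so y_0 as written divides the xG by 187 twice. The sketch below reproduces the quoted value from y_0 as written, using 187 timeslots. For comparison, it also computes the SSD with a single division by k. The variable names are illustrative.

```python
import numpy as np

k = 187  # 30 s timeslots plus injury time, as in the question
no_goal = np.array([1, 0, 0])

# y_0 exactly as written: these xG values were already divided by 187 once.
y_0 = np.array([1 - (0.007703419308 + 0.007380187370) / k,
                0.007703419308 / k,
                0.007380187370 / k])

# The match ended 0-0, so every timeslot is compared against "no goal".
ssd_0 = k * np.sum((no_goal - y_0) ** 2)
print(ssd_0)  # about 1.825e-06, matching the figure quoted above

# The same 0-0 match with the raw xG divided by k only once.
y_single = np.array([1 - (1.4405394105672238 + 1.3800950382265837) / k,
                     1.4405394105672238 / k,
                     1.3800950382265837 / k])
ssd_single = k * np.sum((no_goal - y_single) ** 2)
print(ssd_single)
```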


The SSD figure looks promising, since the match ended with no goals and the two teams had almost identical xG. Now I want to apply the same procedure to xG index 1, xG index 2 .... xG index 10000. I would then take the total SSD and change the arbitrary multiplier M based on that value, until the best result is achieved.

How can I convert the xG in each match to a prob_1-like array and call it into the function above? i.e. prob_1 ... prob_10000.

Here's a sample of the xG:

individual_match_xG.tail()
   xG_Team1  xG_Team2
0  1.440539  1.380095
1  2.123673  0.946116
2  1.819697  0.921660
3  1.132676  1.375717
4  1.244837  1.269933
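One way to do the conversion without a Python loop is to build every prob array at once as an (n_matches, 3) array. Candidate values of M can then be scored with a simple grid search. This is a sketch under several assumptions:

* `prob_matrix` and `total_ssd` are hypothetical helpers.
* The per-match goal counts are made-up placeholders; the question only gives the 0-0 result for index 0.
* The grid range for M is arbitrary.

```python
import numpy as np

# The five sample rows shown above: columns are xG_Team1, xG_Team2.
# With pandas, individual_match_xG[["xG_Team1", "xG_Team2"]].to_numpy() gives this layout.
xg = np.array([[1.440539, 1.380095],
               [2.123673, 0.946116],
               [1.819697, 0.921660],
               [1.132676, 1.375717],
               [1.244837, 1.269933]])

X = np.eye(3)  # rows: no goal [1,0,0], Team1 goal [0,1,0], Team2 goal [0,0,1]


def prob_matrix(xg, k, M=1.0):
    """Row i is prob_i = [1-(xG_1+xG_2)*M/k, xG_1*M/k, xG_2*M/k]; shape (n_matches, 3)."""
    goal = xg * M / k                             # shape (n, 2)
    none = 1 - goal.sum(axis=1, keepdims=True)    # shape (n, 1)
    return np.hstack([none, goal])


def total_ssd(xg, home_goals, away_goals, k, M=1.0):
    """Total SSD over all matches from per-match goal counts (y is constant within a match)."""
    y = prob_matrix(xg, k, M)
    per_outcome = np.sum((X[None, :, :] - y[:, None, :]) ** 2, axis=2)  # shape (n, 3)
    counts = np.stack([k - home_goals - away_goals, home_goals, away_goals], axis=1)
    return np.sum(counts * per_outcome)


# Hypothetical goal counts per match; only row 0 (0-0) comes from the question.
home_goals = np.array([0, 2, 1, 1, 0])
away_goals = np.array([0, 1, 0, 1, 2])

k = 187
Ms = np.linspace(0.5, 2.0, 151)   # arbitrary search grid for M
scores = [total_ssd(xg, home_goals, away_goals, k, M) for M in Ms]
best_M = Ms[int(np.argmin(scores))]
print(best_M, min(scores))
```

A grid search is used here only because it is easy to read. A scalar minimizer could replace it once the objective is settled.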

**Problem**

* There are 10000 final scores with xG that I want to turn into 10000 prob_1 arrays, then get an SSD for each.
* k is the total number of timeslots per match and is constant, depending on the length of the intervals. For 30 s timeslots, k is 180; adding 7/2 minutes of injury time gives k = 187.
* Home_Goal, Away_Goal and No_Goal represent the probability of a single goal being scored per timeslot by the respective team, or of no goal being scored.
* Only one goal can be scored per timeslot.
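The k arithmetic and the one-goal-per-timeslot rule from the list above can be sketched as follows. `n_timeslots` and `slot_outcomes` are hypothetical helpers. The 3.5 minutes of injury time is the question's example, and the goal slots are taken from the answer below.

```python
import numpy as np


def n_timeslots(slot_seconds=30, match_minutes=90, injury_minutes=0.0):
    """k = playing time / slot length (assumes the slot length divides it evenly)."""
    return int(round((match_minutes + injury_minutes) * 60 / slot_seconds))


def slot_outcomes(k, home_goal, away_goal):
    """Outcome code per timeslot: 0 = no goal, 1 = home goal, 2 = away goal.

    Rejects overlapping slots, since only one goal can be scored per timeslot.
    """
    if set(home_goal) & set(away_goal):
        raise ValueError("a timeslot cannot hold both a home and an away goal")
    outcome = np.zeros(k, dtype=int)
    outcome[np.asarray(home_goal, dtype=int)] = 1
    outcome[np.asarray(away_goal, dtype=int)] = 2
    return outcome


k = n_timeslots(30, 90, 3.5)                  # 187, as in the question
outcome = slot_outcomes(180, [2, 3], [4])      # goal slots from the answer below
counts = np.bincount(outcome, minlength=3)     # [no-goal, home, away] slot counts
print(k, counts)
```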


1 Answer:

Answer 0 (score: 1)

import numpy as np
# constants
M = 1.0
k = 180    # number of timeslots
x1 = [1,0,0] # prob. of No goal scored per timeslot.
x2 = [0,1,0] # prob. of Home Team scoring per timeslot.
x3 = [0,0,1] # prob. of Away Team scoring per timeslot.    

# seven final scores, used below in place of xG values
final_scores = [[2,1],[3,3],[1,2],[1,1],[2,1],[4,0],[2,3]]

# time slots with goals
Home_Goal = [2, 3]
Away_Goal = [4]

# numpy arrays of the data
final_scores = np.array(final_scores)    # team_1 is [:,0], team_2 is [:,1]
home_goal = np.array(Home_Goal)
away_goal = np.array(Away_Goal)

# arbitrary multiplier M (fudge factor)
adj_scores = final_scores * M    # shape --> (# of scores, 2)

# calculate prob_1
slot_goal_probability = adj_scores / k    # xG_n / k
slot_draw_probability = 1 - slot_goal_probability.sum(axis = 1)    #1-(xG_1+xG_2)/k

# y for all scores
y = np.concatenate((slot_draw_probability[:,None], slot_goal_probability), axis=1)


# ssd for x2, x3, x1
home_ssd = np.sum(np.square(x2 - y), axis=1)
away_ssd = np.sum(np.square(x3 - y), axis=1)
draw_ssd = np.sum(np.square(x1 - y), axis=1)

ssd = np.zeros((y.shape[0],k))
ssd += draw_ssd[:,None]    # start with every timeslot as "no goal"
ssd[:,home_goal] = home_ssd[:,None]    # overwrite the timeslots with a home goal
ssd[:,away_goal] = away_ssd[:,None]    # overwrite the timeslots with an away goal

The probabilities for each score (prob_1 in your example) sum to 1:

>>> y.sum(axis=1)
array([1., 1., 1., 1., 1., 1., 1.])

ssd has shape (# of scores, 180): it holds the per-timeslot squared differences for every score.

>>> ssd.sum(axis=1)
array([5.92222222, 6.        , 5.93333333, 5.93333333, 5.92222222,
       5.95555556, 5.96666667])
>>> for thing in ssd.sum(axis=1):
    print(thing)

5.922222222222222
6.000000000000001
5.933333333333332
5.933333333333337
5.922222222222222
5.955555555555557
5.966666666666663
>>>
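The answer above applies the same Home_Goal and Away_Goal slots to every score. If each match has its own goal slots, one option (not from the answer) is to store an outcome code per match and timeslot. `np.take_along_axis` can then pick the matching squared difference for each cell. The xG values and goal slots below are hypothetical. Match 0 reuses the answer's [2, 1] score and goal slots, so its total should match the first value printed above.

```python
import numpy as np

k = 180
X = np.eye(3)  # rows: no goal [1,0,0], home goal [0,1,0], away goal [0,0,1]

# Hypothetical xG for two matches (M = 1); match 0 reuses the answer's [2, 1].
xg = np.array([[2.0, 1.0],
               [1.0, 1.0]])
goal = xg / k
y = np.hstack([1 - goal.sum(axis=1, keepdims=True), goal])   # shape (2, 3)

# Squared difference of each match's y against each outcome row: shape (2, 3).
per_outcome = np.sum((X[None, :, :] - y[:, None, :]) ** 2, axis=2)

# Hypothetical outcome codes per timeslot: 0 = no goal, 1 = home goal, 2 = away goal.
outcomes = np.zeros((2, k), dtype=int)
outcomes[0, [2, 3]] = 1     # same goal slots as the answer
outcomes[0, 4] = 2
outcomes[1, 10] = 2         # match 1: a single away goal

# Pick the matching squared difference for every (match, timeslot): shape (2, k).
ssd = np.take_along_axis(per_outcome, outcomes, axis=1)
print(ssd.shape, ssd.sum(axis=1))
```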

Testing y with your function:

>>> y
array([[0.98333333, 0.01111111, 0.00555556],
       [0.96666667, 0.01666667, 0.01666667],
       [0.98333333, 0.00555556, 0.01111111],
       [0.98888889, 0.00555556, 0.00555556],
       [0.98333333, 0.01111111, 0.00555556],
       [0.97777778, 0.02222222, 0.        ],
       [0.97222222, 0.01111111, 0.01666667]])
>>> for prob in y:
    print(sum(sum_squared_diff(x1, x2, x3, prob)))

5.922222222222252
6.000000000000045
5.933333333333363
5.933333333333391
5.922222222222252
5.955555555555599
5.966666666666613
>>>

There are some (hopefully) tiny differences. I put them down to floating-point rounding errors in the 1e-14 range.
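Differences of this size arise when the same floating-point numbers are added in a different order. The sketch below uses the answer's [2, 1] score to show this:

* The builtin `sum` adds left to right.
* `np.sum` may add in a different (pairwise) order.
* `math.fsum` gives a correctly rounded reference.

Comparing the results with `np.isclose` rather than `==` keeps such differences from looking like bugs.

```python
import math

import numpy as np

y = np.array([1 - 3 / 180, 2 / 180, 1 / 180])   # score [2, 1] from the answer
draw = np.sum((np.array([1, 0, 0]) - y) ** 2)
home = np.sum((np.array([0, 1, 0]) - y) ** 2)
away = np.sum((np.array([0, 0, 1]) - y) ** 2)

# Per-slot values in timeslot order: home goals in slots 2 and 3, away goal in slot 4.
values = [draw] * 180
values[2] = values[3] = home
values[4] = away

builtin_total = sum(values)        # left-to-right, as in sum(sum_squared_diff(...))
numpy_total = np.sum(values)       # NumPy may add in a different order
exact_total = math.fsum(values)    # correctly rounded reference sum

# Compare with a tolerance rather than ==.
print(np.isclose(builtin_total, numpy_total), abs(builtin_total - exact_total))
```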

Maybe someone will see this and optimize it further in their own answer. Once I got it working, I didn't improve it any further.

NumPy basics:
* Indexing
* Broadcasting
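The answer relies on two NumPy ideas, shown here in isolation:

* An (n, 1) column broadcasts across every column of an (n, k) array.
* An integer list used as a column index selects several timeslots at once.

The array sizes and values below are arbitrary.

```python
import numpy as np

n_scores, k = 3, 6
col = np.array([10.0, 20.0, 30.0])   # one value per score, shape (3,)

grid = np.zeros((n_scores, k))
grid += col[:, None]                 # (3, 1) broadcasts across all k columns
grid[:, [2, 3]] = -col[:, None]      # list index picks columns 2 and 3; (3, 1) broadcasts to (3, 2)
print(grid)
```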