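I have a large dataframe with two columns and a function that takes the values from each row and iterates over the dataframe. Below is the head of the dataframe.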
xG_Team1 xG_Team2
0 1.440539 1.380095
1 2.123673 0.946116
2 1.819697 0.921660
3 1.132676 1.375717
4 1.244837 1.269933
x1, x2, x3 are constants.
x1 = [1,0,0]
x2 = [0,1,0]
x3 = [0,0,1]
For index 0,
y = np.array([1-(xG_Team1[0] + xG_Team2[0])/k, xG_Team1[0]/k, xG_Team2[0]/k])
i.e. y = np.array([1-(1.440539 + 1.380095)/k, 1.440539/k, 1.380095/k])
For index 1,
y = np.array([1-(xG_Team1[1] + xG_Team2[1])/k, xG_Team1[1]/k, xG_Team2[1]/k])
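Since each y depends only on the two xG values in its own row, all of the y vectors can be built at once as an n × 3 NumPy array instead of row by row. This is a minimal sketch that assumes the five rows shown above and k = total_timeslot = 180, as stated below. The names t1, t2 and Y are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

# The five rows shown in the head of the dataframe above
df = pd.DataFrame({
    "xG_Team1": [1.440539, 2.123673, 1.819697, 1.132676, 1.244837],
    "xG_Team2": [1.380095, 0.946116, 0.921660, 1.375717, 1.269933],
})
k = 180  # total_timeslot, treated as a constant

t1 = df["xG_Team1"].to_numpy() / k
t2 = df["xG_Team2"].to_numpy() / k
# Row i of Y is [1 - (xG_Team1[i] + xG_Team2[i])/k, xG_Team1[i]/k, xG_Team2[i]/k]
Y = np.column_stack([1 - (t1 + t2), t1, t2])
print(Y.shape)  # (5, 3)
```

Row i of Y is the y used for index i. By construction, each row sums to 1.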
k is total_timeslot and is a constant.
import numpy as np

total_timeslot = 180
Home_Goal = []  # No Goal
Away_Goal = []  # No Goal

def sum_squared_diff(x1, x2, x3, y):
    ssd = []
    for k in range(total_timeslot):
        if k in Home_Goal:
            ssd.append(sum((x2 - y) ** 2))
        elif k in Away_Goal:
            ssd.append(sum((x3 - y) ** 2))
        else:
            ssd.append(sum((x1 - y) ** 2))
    return ssd

y_0 = sum_squared_diff(x1, x2, x3, y)
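y is passed in once and does not change inside the loop, so every slot that is in neither Home_Goal nor Away_Goal gets exactly the same value. The sketch below does the same per-slot computation with NumPy instead of a Python loop. It assumes Home_Goal and Away_Goal hold integer slot indices in range(total_timeslot). ssd_per_slot is a hypothetical helper name and is not part of the question's code.

```python
import numpy as np

x1 = np.array([1, 0, 0])
x2 = np.array([0, 1, 0])
x3 = np.array([0, 0, 1])
total_timeslot = 180

def ssd_per_slot(y, home_goal, away_goal, n_slots=total_timeslot):
    """Vectorized equivalent of sum_squared_diff (hypothetical helper)."""
    # Use x1 (no goal) as the target for every slot to start with
    targets = np.tile(x1, (n_slots, 1))
    # Assign away goals first and home goals second, so a slot listed in both
    # ends up with x2, which matches the if/elif order of the loop version
    targets[np.asarray(away_goal, dtype=int)] = x3
    targets[np.asarray(home_goal, dtype=int)] = x2
    # One squared distance per slot, shape (n_slots,)
    return ((targets - y) ** 2).sum(axis=1)
```

With empty goal lists the result is 180 identical values. Its sum should agree with sum(y_0) below.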
The plan is to add up the output of sum_squared_diff for every y, i.e. compute sum(y_i) for every i.
So for i = 0, with y built from row 0:
y_0 = sum_squared_diff(x1, x2, x3, y)
len(y_0) = 180
sum(y_0) = 0.0663099498972334
Then I will have n values of sum(y_i), one for each of the n rows of xG values.
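Written out, the quantity being computed is the following. The symbols S, e_t, a_i = xG_Team1[i] and b_i = xG_Team2[i] are notation introduced here, and T = total_timeslot. Because y_i does not depend on the slot t, the case with no goals reduces to a closed form:

```latex
S = \sum_{i=0}^{n-1} \sum_{t=0}^{T-1} \lVert e_t - y_i \rVert^2,
\qquad
e_t = \begin{cases}
  x_2 & t \in \texttt{Home\_Goal} \\
  x_3 & t \in \texttt{Away\_Goal} \\
  x_1 & \text{otherwise}
\end{cases}

\text{No goals: }\quad
\lVert x_1 - y_i \rVert^2 = \frac{(a_i + b_i)^2 + a_i^2 + b_i^2}{k^2},
\qquad
S = \frac{T}{k^2} \sum_{i=0}^{n-1} \left[ (a_i + b_i)^2 + a_i^2 + b_i^2 \right]
```

For row 0, with T = k = 180, this gives (2.820634² + 1.440539² + 1.380095²)/180 ≈ 11.93579/180 ≈ 0.06631, which matches sum(y_0) below.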
Using @Dillon's code on the dataframe above (n = 5):
sum(results.sum()) = 0.31885730707076826
Answer 0 (score: 2)
import numpy as np
import pandas as pd

data = {'xG_Team1': {0: 1.440539, 1: 2.123673, 2: 1.819697, 3: 1.132676, 4: 1.244837},
        'xG_Team2': {0: 1.380095, 1: 0.946116, 2: 0.92166, 3: 1.375717, 4: 1.269933}}
df = pd.DataFrame(data)
x1 = [1, 0, 0]
x2 = [0, 1, 0]
x3 = [0, 0, 1]
# Constants
total_timeslot = 180
k = 180
# Measures
Home_Goal = []  # No Goal
Away_Goal = []  # No Goal

def sum_squared_diff(x1, x2, x3, y):
    ssd = []
    for k in range(total_timeslot):  # local loop variable; the global k used by my_function stays 180
        if k in Home_Goal:
            ssd.append(sum((x2 - y) ** 2))
        elif k in Away_Goal:
            ssd.append(sum((x3 - y) ** 2))
        else:
            ssd.append(sum((x1 - y) ** 2))
    return ssd

def my_function(row):
    # Build the y vector for one row of the dataframe
    xG_Team1 = row.xG_Team1
    xG_Team2 = row.xG_Team2
    return np.array([1 - (xG_Team1 + xG_Team2) / k, xG_Team1 / k, xG_Team2 / k])

# You can use the apply function
results = df.apply(lambda row: sum_squared_diff(x1, x2, x3, my_function(row)), axis=1)
# Each item in results is a 180 item list
results
Out[]:
0 [0.0003683886105401867, 0.0003683886105401867,...
1 [0.0004576767592872215, 0.0004576767592872215,...
2 [0.00036036396694006056, 0.0003603639669400605...
3 [0.00029220949467635905, 0.0002922094946763590...
4 [0.00029279065228265494, 0.0002927906522826549...
# For each list, calculate the sum
results.map(lambda x: sum(x))
Out[]:
0 0.066310
1 0.082382
2 0.064866
3 0.052598
4 0.052702
# Get the sum of all these values
results.map(lambda x: sum(x)).sum()
Out[]:
0.3188573070707662
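apply calls a Python function and builds a Python list for every row. For a large dataframe, a broadcasting version that returns the same per-row sums and the same total may be faster. This is a sketch under the same assumptions as the answer (empty Home_Goal and Away_Goal, k = total_timeslot = 180). The intermediate names Y, targets and ssd are illustrative only, and the code has not been benchmarked against apply.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "xG_Team1": [1.440539, 2.123673, 1.819697, 1.132676, 1.244837],
    "xG_Team2": [1.380095, 0.946116, 0.921660, 1.375717, 1.269933],
})
x1 = np.array([1, 0, 0])
x2 = np.array([0, 1, 0])
x3 = np.array([0, 0, 1])
total_timeslot = 180
k = 180
Home_Goal = []  # No Goal
Away_Goal = []  # No Goal

# (n, 3): one y vector per row
t1 = df["xG_Team1"].to_numpy() / k
t2 = df["xG_Team2"].to_numpy() / k
Y = np.column_stack([1 - (t1 + t2), t1, t2])

# (T, 3): target vector for each time slot, with home goals taking precedence as in the loop
targets = np.tile(x1, (total_timeslot, 1))
targets[np.asarray(Away_Goal, dtype=int)] = x3
targets[np.asarray(Home_Goal, dtype=int)] = x2

# Broadcast (1, T, 3) against (n, 1, 3) -> (n, T, 3), then sum over the 3 components
ssd = ((targets[np.newaxis, :, :] - Y[:, np.newaxis, :]) ** 2).sum(axis=2)

row_sums = ssd.sum(axis=1)  # corresponds to results.map(lambda x: sum(x))
total = row_sums.sum()      # corresponds to results.map(lambda x: sum(x)).sum()
```

The (n, T, 3) intermediate holds n × 180 × 3 floats. For a very large dataframe it may be better to process df in chunks of rows and add up the partial totals.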