   MatchId  ExpectedGoals_Team1  ExpectedGoals_Team2  Timestamp            Stages  Home           Away
0  698085   0.8585339288573895   1.4819072820614578   2016-08-13 11:30:00  0       [92, 112]      [94]
1  698086   1.097064295289673    1.0923520385902274   2016-09-12 14:00:00  0       []             [164]
2  698087   1.2752442136224664   0.8687263006179976   2016-11-25 14:00:00  1       [90]           [147]
3  698088   1.0571269856980154   1.4323522262211752   2016-02-16 14:00:00  2       [10, 66, 101]  [50, 118]
4  698089   1.2680212913301165   0.918961072480616    2016-05-10 14:00:00  2       [21]           [134, 167]
This is the function whose result needs to be updated according to the categorical column "Stages".
import numpy as np

x1 = np.array([1, 0, 0])
x2 = np.array([0, 1, 0])
x3 = np.array([0, 0, 1])
total_timeslot = 196
m = 1

def squared_diff(row):
    ssd = []
    Home = row.Home
    Away = row.Away
    y = np.array([1 - (row.ExpectedGoals_Team1*m + row.ExpectedGoals_Team2*m), row.ExpectedGoals_Team1*m, row.ExpectedGoals_Team2*m])
    # Compare y against x2, x3 or x1 depending on whether slot k is in Home, in Away, or in neither
    for k in range(total_timeslot):
        if k in Home:
            ssd.append(sum((x2 - y) ** 2))
        elif k in Away:
            ssd.append(sum((x3 - y) ** 2))
        else:
            ssd.append(sum((x1 - y) ** 2))
    return sum(ssd)

sum(df.apply(squared_diff, axis=1))
For m=1, Out[400]: 7636.305551658377
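The loop in squared_diff visits all 196 time slots, but each slot contributes one of only three possible values, so the row total can also be obtained by counting slots. The following is a minimal sketch under the assumption that Home and Away hold integer slot indices; the helper name squared_diff_counted is hypothetical and does not appear in the original post.

```python
import numpy as np

x1 = np.array([1, 0, 0])
x2 = np.array([0, 1, 0])
x3 = np.array([0, 0, 1])
total_timeslot = 196

def squared_diff_counted(xg1, xg2, home, away, m=1):
    """Row total of the squared differences, computed from slot counts (hypothetical helper)."""
    y = np.array([1 - (xg1 * m + xg2 * m), xg1 * m, xg2 * m])
    # Only slots inside range(total_timeslot) are ever visited by the original loop.
    home_slots = {k for k in home if 0 <= k < total_timeslot}
    # The original if/elif tests Home first, so a slot listed in both counts as Home.
    away_slots = {k for k in away if 0 <= k < total_timeslot} - home_slots
    n_home, n_away = len(home_slots), len(away_slots)
    n_none = total_timeslot - n_home - n_away
    return (n_home * np.sum((x2 - y) ** 2)
            + n_away * np.sum((x3 - y) ** 2)
            + n_none * np.sum((x1 - y) ** 2))

# First sample row from the table in the question
print(squared_diff_counted(0.8585339288573895, 1.4819072820614578, [92, 112], [94]))
```

Up to floating-point rounding this should agree with the per-slot loop, and it makes the role of m easier to see because y is the only quantity that depends on it.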
I want to test a cost function by assigning an arbitrary value of m to each category in Stages. Let m1 = 2, m2 = 3.
Here is my attempt.
def stages(row):
    Stages = row.Stages
    if Stages == 0:
        return np.array([1 - (row.ExpectedGoals_Team1*m + row.ExpectedGoals_Team2*m), row.ExpectedGoals_Team1*m, row.ExpectedGoals_Team2*m])
    elif Stages == 1:
        return np.array([1 - (row.ExpectedGoals_Team1*m1 + row.ExpectedGoals_Team2*m1), row.ExpectedGoals_Team1*m1, row.ExpectedGoals_Team2*m1])
    else:
        return np.array([1 - (row.ExpectedGoals_Team1*m2 + row.ExpectedGoals_Team2*m2), row.ExpectedGoals_Team1*m2, row.ExpectedGoals_Team2*m2])

df.apply(squared_diff, Stages, axis=1)
TypeError: apply() got multiple values for argument 'axis'
Answer 0 (score: 2)
df.apply(squared_diff, Stages, axis=1)
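raises an error because the second positional parameter of apply is axis, so the call is interpreted as axis=Stages, and then the keyword argument axis=1 supplies axis a second time.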
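The collision can be reproduced on a toy frame. The sketch below is only an illustration: the DataFrame, the column name amount, and the function scaled are made up for the example. It also shows that DataFrame.apply accepts extra positional arguments for the row function through its args parameter, which is an alternative to the approach described in this answer.

```python
import pandas as pd

# Toy data, not taken from the post
df = pd.DataFrame({"Stages": [0, 1, 2], "amount": [1.0, 2.0, 3.0]})

def scaled(row, factor):
    # Hypothetical row function that takes one extra argument
    return row["amount"] * factor

# The second positional parameter of DataFrame.apply is `axis`,
# so passing a value there and also passing axis=1 collides.
try:
    df.apply(scaled, 10, axis=1)
    message = None
except TypeError as exc:
    message = str(exc)
print(message)

# Extra arguments for the row function go through `args`.
result = df.apply(scaled, axis=1, args=(10,))
print(result.tolist())
```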
To fix this, you can first store the required m for each row in a separate column:
df['m'] = df.Stages.apply(lambda x: 1 if x == 0 else 2 if x == 1 else 3)
Then, in your squared_diff function, replace this line
y = np.array([1 - (row.ExpectedGoals_Team1*m + row.ExpectedGoals_Team2*m), row.ExpectedGoals_Team1*m, row.ExpectedGoals_Team2*m])
with
y = np.array([1 - (row.ExpectedGoals_Team1*row.m + row.ExpectedGoals_Team2*row.m), row.ExpectedGoals_Team1*row.m, row.ExpectedGoals_Team2*row.m])
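Putting the two steps together, a self-contained sketch might look like the following. It uses two rows copied from the question's table and the multipliers from the post (m = 1, m1 = 2, m2 = 3). The dictionary m_by_stage and the use of Series.map are assumptions made for this example; unlike the lambda above, map yields NaN for a stage that is missing from the dictionary instead of falling back to 3.

```python
import numpy as np
import pandas as pd

x1 = np.array([1, 0, 0])
x2 = np.array([0, 1, 0])
x3 = np.array([0, 0, 1])
total_timeslot = 196

# Two sample rows copied from the question's table
df = pd.DataFrame({
    "ExpectedGoals_Team1": [0.8585339288573895, 1.2752442136224664],
    "ExpectedGoals_Team2": [1.4819072820614578, 0.8687263006179976],
    "Stages": [0, 1],
    "Home": [[92, 112], [90]],
    "Away": [[94], [147]],
})

# Stage -> multiplier (hypothetical name; values taken from the post)
m_by_stage = {0: 1, 1: 2, 2: 3}
df["m"] = df["Stages"].map(m_by_stage)

def squared_diff(row):
    m = row["m"]  # per-row multiplier instead of the global m
    y = np.array([1 - (row.ExpectedGoals_Team1 * m + row.ExpectedGoals_Team2 * m),
                  row.ExpectedGoals_Team1 * m,
                  row.ExpectedGoals_Team2 * m])
    total = 0.0
    for k in range(total_timeslot):
        if k in row.Home:
            total += np.sum((x2 - y) ** 2)
        elif k in row.Away:
            total += np.sum((x3 - y) ** 2)
        else:
            total += np.sum((x1 - y) ** 2)
    return total

cost = df.apply(squared_diff, axis=1).sum()
print(cost)
```

Changing the values in m_by_stage and re-running the last lines is then enough to evaluate the cost function for a different choice of multipliers.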