Updating a function based on a categorical value in Python

Time: 2018-07-20 10:17:51

Tags: python pandas dataframe categories

    MatchId  ExpectedGoals_Team1  ExpectedGoals_Team2  Timestamp            Stages  Home           Away
0   698085   0.8585339288573895   1.4819072820614578   2016-08-13 11:30:00  0       [92, 112]      [94]
1   698086   1.097064295289673    1.0923520385902274   2016-09-12 14:00:00  0       []             [164]
2   698087   1.2752442136224664   0.8687263006179976   2016-11-25 14:00:00  1       [90]           [147]
3   698088   1.0571269856980154   1.4323522262211752   2016-02-16 14:00:00  2       [10, 66, 101]  [50, 118]
4   698089   1.2680212913301165   0.918961072480616    2016-05-10 14:00:00  2       [21]           [134, 167]

This is the function whose result needs to be updated based on the categorical column "Stages".

import numpy as np

# One-hot targets: no goal, home goal, away goal
x1 = np.array([1, 0, 0])
x2 = np.array([0, 1, 0])
x3 = np.array([0, 0, 1])
total_timeslot = 196
m = 1

def squared_diff(row):
    ssd = []
    Home = row.Home
    Away = row.Away
    y = np.array([1 - (row.ExpectedGoals_Team1*m + row.ExpectedGoals_Team2*m), row.ExpectedGoals_Team1*m, row.ExpectedGoals_Team2*m])
    # Compare y with the target of every timeslot and collect the squared differences
    for k in range(total_timeslot):
        if k in Home:
            ssd.append(sum((x2 - y) ** 2))
        elif k in Away:
            ssd.append(sum((x3 - y) ** 2))
        else:
            ssd.append(sum((x1 - y) ** 2))
    return sum(ssd)

sum(df.apply(squared_diff, axis=1)) 
For m=1, Out[400]: 7636.305551658377

I want to test a cost function by assigning an arbitrary value of m to each category in Stages. Let m1 = 2, m2 = 3.

Here is my attempt.

def stages(row):
    Stages = row.Stages
    if Stages == 0:
        return np.array([1 - (row.ExpectedGoals_Team1*m + row.ExpectedGoals_Team2*m), row.ExpectedGoals_Team1*m, row.ExpectedGoals_Team2*m])
    elif Stages == 1:
        return np.array([1 - (row.ExpectedGoals_Team1*m1 + row.ExpectedGoals_Team2*m1), row.ExpectedGoals_Team1*m1, row.ExpectedGoals_Team2*m1])
    else:
        return np.array([1 - (row.ExpectedGoals_Team1*m2 + row.ExpectedGoals_Team2*m2), row.ExpectedGoals_Team1*m2, row.ExpectedGoals_Team2*m2])

df.apply(squared_diff, Stages, axis=1)
  

TypeError: apply() got multiple values for argument 'axis'

1 Answer:

Answer 0: (score: 2)

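df.apply(squared_diff, Stages, axis=1) raises an error because the second positional parameter of apply is axis, so Stages is interpreted as axis=Stages, and the keyword argument axis=1 then supplies a second value for the same parameter.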
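The argument collision can be reproduced on a tiny frame. The following is a minimal sketch, not the asker's code: the frame, `row_sum`, `row_sum_scaled`, and `factor` are hypothetical names made up for illustration. It also shows that extra positional arguments for the applied function go through the `args=` parameter of `DataFrame.apply`.

```python
import pandas as pd

# Hypothetical two-row frame, used only to reproduce the error
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

def row_sum(row):
    return row["a"] + row["b"]

message = None
try:
    # "extra" binds positionally to `axis`; axis=1 then binds to it again
    df.apply(row_sum, "extra", axis=1)
except TypeError as exc:
    message = str(exc)

print(message)

def row_sum_scaled(row, factor):
    # `factor` is a hypothetical extra argument
    return (row["a"] + row["b"]) * factor

# Extra arguments for the function are passed through `args=`
result = df.apply(row_sum_scaled, axis=1, args=(2,))
print(result.tolist())
```

Passing a second function this way would still not change `y` inside squared_diff, which is why the fix that follows stores m on each row instead.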

To fix the problem, you can first store the required m in a separate column

df['m'] = df.Stages.apply(lambda x: 1 if x == 0 else 2 if x == 1 else 3)
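As an alternative sketch, the same column can be built with `Series.map` and a dictionary, which keeps the stage-to-multiplier table in one place. The name `stage_to_m` is an assumption for illustration; the values 1, 2, 3 are m, m1, m2 from the question. One behavioral difference to keep in mind: the lambda above returns 3 for any stage other than 0 or 1, while `map` yields NaN for a stage missing from the dictionary.

```python
import pandas as pd

# Stage values taken from the sample rows in the question
df = pd.DataFrame({"Stages": [0, 0, 1, 2, 2]})

# Hypothetical lookup table: stage -> multiplier (m, m1, m2)
stage_to_m = {0: 1, 1: 2, 2: 3}

df["m"] = df["Stages"].map(stage_to_m)
print(df)
```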

Then, in your squared_diff function, replace this line

y = np.array([1 - (row.ExpectedGoals_Team1*m + row.ExpectedGoals_Team2*m), row.ExpectedGoals_Team1*m, row.ExpectedGoals_Team2*m])

with

y = np.array([1 - (row.ExpectedGoals_Team1*row.m + row.ExpectedGoals_Team2*row.m), row.ExpectedGoals_Team1*row.m, row.ExpectedGoals_Team2*row.m])
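Putting the pieces together, a self-contained sketch of the approach might look like the following. It uses three of the sample rows from the question (one per stage) and assumes m = 1, m1 = 2, m2 = 3. The helper names `g1` and `g2` and the running `total` are illustrative choices rather than part of the original code.

```python
import numpy as np
import pandas as pd

# One-hot targets: no goal, home goal, away goal
x1 = np.array([1, 0, 0])
x2 = np.array([0, 1, 0])
x3 = np.array([0, 0, 1])
total_timeslot = 196

# Three sample rows from the question, one per stage
df = pd.DataFrame({
    "ExpectedGoals_Team1": [0.8585339288573895, 1.2752442136224664, 1.0571269856980154],
    "ExpectedGoals_Team2": [1.4819072820614578, 0.8687263006179976, 1.4323522262211752],
    "Stages": [0, 1, 2],
    "Home": [[92, 112], [90], [10, 66, 101]],
    "Away": [[94], [147], [50, 118]],
})

# Per-row multiplier chosen by stage (assumed values m=1, m1=2, m2=3)
df["m"] = df["Stages"].map({0: 1, 1: 2, 2: 3})

def squared_diff(row):
    g1 = row.ExpectedGoals_Team1 * row.m
    g2 = row.ExpectedGoals_Team2 * row.m
    y = np.array([1 - (g1 + g2), g1, g2])
    total = 0.0
    for k in range(total_timeslot):
        if k in row.Home:
            total += np.sum((x2 - y) ** 2)
        elif k in row.Away:
            total += np.sum((x3 - y) ** 2)
        else:
            total += np.sum((x1 - y) ** 2)
    return total

cost = df.apply(squared_diff, axis=1).sum()
print(cost)
```

Because the multiplier now travels with each row, squared_diff needs no global m and the call keeps the plain `df.apply(squared_diff, axis=1)` form.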