Question

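I have two dataframes, 'matches_df' and 'ratings_df'. The matches dataframe stores the players, the date, and the winner of each two-player game. The ratings dataframe stores each player's current rating, starting from an arbitrary value. I want to update this dataframe as matches are processed and then reset it afterwards.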

matches_df

date | player_1 | player_2 | winner
1/11 | 'A'      | 'B'      | 'A'
2/11 | 'C'      | 'B'      | 'C'
3/11 | 'A'      | 'D'      | 'A'
4/11 | 'A'      | 'C'      | 'C'

ratings_df

player | rating
'A'    | 1000
'B'    | 1000
'C'    | 1000
'D'    | 1000

I have an algorithm that updates the ratings by doing the following (pseudocode).

def update_ratings(match,parameter):
    #(1) use current ratings to predict the likelihood of either player winning the match 
    #(2) using the outcome of the match to update player ratings 
    #(3) update the two players current ratings in the global dataframe based on the result of the match. 
    #(4) Return the square of the forecast's prediction error.
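
For illustration, the four steps above could be sketched roughly as follows. The question does not say which rating model is used, so the Elo-style formula (with `parameter` acting as the K-factor and a 400-point logistic scale), the global `ratings` dict, and the dict-shaped `match` are all assumptions rather than the asker's actual code.

```python
# Hypothetical global rating store; the question keeps this in a DataFrame.
ratings = {"A": 1000.0, "B": 1000.0, "C": 1000.0, "D": 1000.0}

def update_ratings(match, parameter):
    """Elo-style update (an assumption); returns the squared forecast error."""
    p1, p2 = match["player_1"], match["player_2"]
    # (1) predicted probability that player_1 wins, from the current ratings
    expected_p1 = 1.0 / (1.0 + 10.0 ** ((ratings[p2] - ratings[p1]) / 400.0))
    # (2) observed outcome from player_1's point of view
    outcome_p1 = 1.0 if match["winner"] == p1 else 0.0
    # (3) move both global ratings by the same amount in opposite directions
    delta = parameter * (outcome_p1 - expected_p1)
    ratings[p1] += delta
    ratings[p2] -= delta
    # (4) squared prediction error for this match
    return (outcome_p1 - expected_p1) ** 2

error = update_ratings({"player_1": "A", "player_2": "B", "winner": "A"}, 32)
```

Because the function mutates global state, every call changes what the next call sees, which is why the ratings have to be reset between evaluation runs.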

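I want to compare how different parameter values affect the model's predictive accuracy. However, I am struggling to either copy the ratings dataframe or reset it between function calls. I use the following code to calculate the performance for a given parameter value: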

def calc_brier(parameter,matches_df):
    #reset dataframe to initial values (1000 for all players)
    start_ratings = np.repeat(1000.0,len(unique_players))
    ratings_df = pd.DataFrame(data=[start_ratings],columns=unique_players)
    brier = 0
    for index, row in matches_df.iterrows():
        brier += update_ratings(row,parameter)
    return brier

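However, this does not give the correct results. The global ratings dataframe is not reset when 'calc_brier' is called, so calc_brier returns inconsistent results when called several times with the same parameter. How can I correctly reset the global ratings dataframe before/after calling 'calc_brier', or what alternative structure could I use to achieve my ultimate goal of comparing performance across different parameter values?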

Answer 1

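It works if I use a dictionary instead of a dataframe to store the ratings. Here is the working version (the ratings df is now a dictionary with player names as keys and ratings as values, starting at 1000). I am not sure what was wrong with the original code.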

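def calc_brier(parameter):
    # reset the global dict in place (1000 for all players)
    for player in unique_players:
        ratings_dict[player]=1000.0
    brier = 0
    for index, row in matches_df.iterrows():
        # pass the parameter being evaluated (was the undefined name k_factor)
        brier += update_ratings(row,parameter)
    return brier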
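
A likely explanation for the difference: in the original function, `ratings_df = pd.DataFrame(...)` is a plain assignment inside the function, which binds a new local name and leaves the global object untouched, whereas `ratings_dict[player] = 1000.0` mutates the existing global object in place. The sketch below shows this with plain dicts, then an alternative structure that avoids global state by passing the ratings in explicitly. The Elo-style update and the names `reset_by_rebinding`, `reset_in_place`, `matches`, and `players` are assumptions for illustration only.

```python
ratings = {"A": 1000.0, "B": 1000.0}  # module-level (global) state

def reset_by_rebinding():
    # Plain assignment creates a new *local* name; the global dict is untouched.
    ratings = {"A": 1000.0, "B": 1000.0}

def reset_in_place():
    # Item assignment mutates the existing global object.
    for player in ratings:
        ratings[player] = 1000.0

ratings["A"] = 1016.0            # simulate an update left over from an earlier run
reset_by_rebinding()
after_rebinding = ratings["A"]   # the stale value is still there
reset_in_place()
after_in_place = ratings["A"]    # now actually reset

def update_ratings(current, match, k):
    """Placeholder Elo-style update (an assumption, not the asker's model)."""
    p1, p2 = match["player_1"], match["player_2"]
    expected = 1.0 / (1.0 + 10.0 ** ((current[p2] - current[p1]) / 400.0))
    outcome = 1.0 if match["winner"] == p1 else 0.0
    current[p1] += k * (outcome - expected)
    current[p2] -= k * (outcome - expected)
    return (outcome - expected) ** 2

def calc_brier(parameter, matches, players):
    # Fresh ratings on every call, so nothing leaks between calls.
    fresh = {player: 1000.0 for player in players}
    return sum(update_ratings(fresh, m, parameter) for m in matches)

matches = [
    {"player_1": "A", "player_2": "B", "winner": "A"},
    {"player_1": "C", "player_2": "B", "winner": "C"},
    {"player_1": "A", "player_2": "D", "winner": "A"},
    {"player_1": "A", "player_2": "C", "winner": "C"},
]
players = ["A", "B", "C", "D"]
first = calc_brier(32, matches, players)
second = calc_brier(32, matches, players)  # same inputs, same result
```

With the dataframe from the question, the rows yielded by `matches_df.iterrows()` could be passed in place of the dicts in `matches`, since they support the same `row["player_1"]` lookups. Keeping the global dataframe would also work if `calc_brier` declared `global ratings_df` before assigning to it.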

Python: resetting a mutable global dataframe between function calls