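I'm trying to iterate through a DataFrame of snooker matches one row at a time, so that after each match I can update the ratings of the players involved.
I've written some code that does this, but it is slow (about 10 minutes of run time for 36,000 matches/rows). I suspect this is down to my use of np.vectorize, which I used because I couldn't find another way to make a function that accepts multiple arguments work on a pandas df (my get_match_rating function, see below, takes three arguments).
matches_fil3 is the name of the DataFrame. All the functions I'm using are very simple (basic maths, only a line or two each), which is why I think this is an np.vectorize problem. Is there a faster / more Pythonic way to achieve this?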
for i in range(0, len(matches_fil3)):
    matches_fil3.loc[i, 'P1Est'] = np.vectorize(get_estimate)(matches_fil3['Player One'].iloc[i], i)
    matches_fil3.loc[i, 'P1Err'] = np.vectorize(get_error)(matches_fil3['Player One'].iloc[i], i)
    matches_fil3.loc[i, 'P2Est'] = np.vectorize(get_estimate)(matches_fil3['Player Two'].iloc[i], i)
    matches_fil3.loc[i, 'P2Err'] = np.vectorize(get_error)(matches_fil3['Player Two'].iloc[i], i)
    matches_fil3.loc[i, 'P1Rat'] = np.vectorize(get_match_rating)(matches_fil3['P1 Frames Won'].iloc[i], matches_fil3['Total'].iloc[i], matches_fil3['P1Exp'].iloc[i])
    matches_fil3.loc[i, 'P2Rat'] = np.vectorize(get_match_rating)(matches_fil3['P2 Frames Won'].iloc[i], matches_fil3['Total'].iloc[i], matches_fil3['P2Exp'].iloc[i])
Answer 0 (score: 0)
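I think a better and faster approach is to use the pandas apply methods. They run the function over a whole column (or over every row) in a single call, which avoids assigning cells one at a time with .loc inside a Python loop.
You could rewrite the code along these lines, assuming each column can be computed without depending on values written for earlier rows: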
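# get_estimate and get_error take the row index as a second argument in the question;
# row.name supplies it, assuming the DataFrame has the default 0..n-1 index
matches_fil3['P1Est'] = matches_fil3.apply(lambda row: get_estimate(row['Player One'], row.name), axis=1)
matches_fil3['P1Err'] = matches_fil3.apply(lambda row: get_error(row['Player One'], row.name), axis=1)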
.
.
matches_fil3['P2Rat'] = matches_fil3.apply(lambda row: get_match_rating(row['P2 Frames Won'], row['Total'], row['P2Exp']), axis=1)
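If the functions really are just a line or two of arithmetic, it may be possible to skip apply altogether and pass whole columns to the function, since pandas Series support element-wise arithmetic. The following is only a minimal sketch under assumptions: the body of get_match_rating is not shown in the question, so the formula here is a hypothetical stand-in, and the small matches frame is made-up sample data that merely reuses the question's column names.

```python
import numpy as np
import pandas as pd


def get_match_rating(frames_won, total, expected):
    # Hypothetical stand-in: the real function body is not shown in the question.
    # Share of frames won minus the expected share.
    return frames_won / total - expected


# Made-up sample data using the question's column names
matches = pd.DataFrame({
    "P1 Frames Won": [5, 3, 4],
    "P2 Frames Won": [2, 5, 4],
    "Total": [7, 8, 8],
    "P1Exp": [0.6, 0.5, 0.5],
    "P2Exp": [0.4, 0.5, 0.5],
})

# Whole-column call: the arithmetic runs element-wise over each Series,
# so there is no Python-level loop and no per-cell .loc assignment
matches["P1Rat"] = get_match_rating(
    matches["P1 Frames Won"], matches["Total"], matches["P1Exp"]
)
matches["P2Rat"] = get_match_rating(
    matches["P2 Frames Won"], matches["Total"], matches["P2Exp"]
)

# Row-wise apply, shown only to check that both routes give the same values
p2_apply = matches.apply(
    lambda row: get_match_rating(row["P2 Frames Won"], row["Total"], row["P2Exp"]),
    axis=1,
)
```

This only works when the function uses operations that act element-wise on a Series. A function that branches with if on its inputs, or that looks up ratings updated by earlier matches, would still need to be called once per row.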