Question

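I am trying to iterate over a dataframe of snooker matches one row at a time, so that after each match I can update the ratings of the players involved.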

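I have written some code that does this, but it is slow (about 10 minutes of run time for 36,000 matches/rows). I suspect this is down to my use of np.vectorize, which I used because I couldn't find another way to make a function that takes multiple arguments work on a pandas df (my get_match_rating function, see below, takes three arguments).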

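matches_fil3 is the name of the dataframe. All the functions I'm using are very simple (basic math, only a line or two each), which is why I think this is an np.vectorize problem. Is there a faster/more Pythonic way to achieve this?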

for i in range(0, len(matches_fil3)):
    matches_fil3.loc[i, 'P1Est'] = np.vectorize(get_estimate)(matches_fil3['Player One'].iloc[i], i)
    matches_fil3.loc[i, 'P1Err'] = np.vectorize(get_error)(matches_fil3['Player One'].iloc[i], i)
    matches_fil3.loc[i, 'P2Est'] = np.vectorize(get_estimate)(matches_fil3['Player Two'].iloc[i], i)
    matches_fil3.loc[i, 'P2Err'] = np.vectorize(get_error)(matches_fil3['Player Two'].iloc[i], i)
    matches_fil3.loc[i, 'P1Rat'] = np.vectorize(get_match_rating)(matches_fil3['P1 Frames Won'].iloc[i], matches_fil3['Total'].iloc[i], matches_fil3['P1Exp'].iloc[i])
    matches_fil3.loc[i, 'P2Rat'] = np.vectorize(get_match_rating)(matches_fil3['P2 Frames Won'].iloc[i], matches_fil3['Total'].iloc[i], matches_fil3['P2Exp'].iloc[i])

Answer 1

I think a better and faster approach is to use the apply functions. They are optimized internally for performance.

You can rewrite your code like this:

matches_fil3['P1Est'] = matches_fil3['Player One'].apply(get_estimate)
matches_fil3['P1Err'] = matches_fil3['Player One'].apply(get_error)
.
.
matches_fil3['P2Rat'] = matches_fil3.apply(lambda row: get_match_rating(row['P2 Frames Won'], row['Total'], row['P2Exp']), axis=1)
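As a rough, self-contained sketch of the pattern above: the question does not show the bodies of get_estimate or get_match_rating, so the versions below are hypothetical stand-ins, and the small matches frame and the ratings dict are made up for illustration. Note that the question passes the row number i as a second argument to get_estimate; with a row-wise apply on a frame that has the default integer index, that value is available as row.name.

```python
import pandas as pd

# Made-up starting ratings, only so the stand-in below has something to return.
ratings = {"A": 1500.0, "B": 1480.0, "C": 1520.0}

def get_estimate(player, i):
    # Hypothetical stand-in: the real function is not shown in the question.
    # Adding i is purely illustrative, to show the row number reaching the function.
    return ratings[player] + i

def get_match_rating(frames_won, total, expected):
    # Hypothetical stand-in assuming "basic math" as the question describes.
    return (frames_won - expected) / total

# Tiny made-up frame using the column names from the question.
matches = pd.DataFrame({
    "Player One": ["A", "B"],
    "Player Two": ["B", "C"],
    "P1 Frames Won": [4, 2],
    "P2 Frames Won": [1, 4],
    "Total": [5, 6],
    "P1Exp": [3.0, 3.0],
    "P2Exp": [2.0, 3.0],
})

# Row-wise apply: row.name is the index label, which equals the row number
# when the frame has the default RangeIndex.
matches["P1Est"] = matches.apply(
    lambda row: get_estimate(row["Player One"], row.name), axis=1
)

# Row-wise apply with three arguments taken from three columns.
matches["P1Rat"] = matches.apply(
    lambda row: get_match_rating(row["P1 Frames Won"], row["Total"], row["P1Exp"]),
    axis=1,
)

# If the function really is plain arithmetic, whole columns can be passed
# directly, which avoids the per-row Python call altogether.
matches["P2Rat"] = get_match_rating(
    matches["P2 Frames Won"], matches["Total"], matches["P2Exp"]
)
```

The whole-column form only works when the function body consists of operations that pandas Series support elementwise. If each row's estimate depends on ratings updated by earlier rows, as the question's description suggests it might, the calculation is inherently sequential and neither apply nor column arithmetic will reproduce it without restructuring.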

Pandas: np.vectorize slow to update values in dataframe
