Improving the speed of iterating over logic in Python

Time: 2019-03-23 17:14:37

Tags: python performance for-loop optimization logic

I am building a ranking algorithm in Python to rank college basketball teams. The algorithm works as follows:

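  1. I have a data frame (5464 rows) with one row per college basketball game (call it data_for_model).
  2. I have another data frame (353 rows) with one row per team and its current rank (call it model_df).
  3. The program iterates over every row of data_for_model (i.e., every game). On each iteration it identifies the winning and losing teams in data_for_model and looks up both teams' ranks in model_df. If the winning team has a larger (i.e., worse) rank than the losing team, the winning team takes the rank of the team it just beat (i.e., the winner's rank improves). In addition, the team it just beat and every team ranked below that team (i.e., with a larger, worse rank) has its rank increased by 1 (i.e., their ranks get worse).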
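
To make the rule in step 3 concrete, here is a minimal sketch of a single update on a plain dictionary. The function name `apply_game` and the three-team ranking are illustrative assumptions, not part of the actual program.

```python
def apply_game(ranks, winner, loser):
    """Apply one game result to `ranks` (team -> rank, 1 is best) in place."""
    winner_rank = ranks[winner]
    loser_rank = ranks[loser]
    # Only an upset (winner currently ranked worse than the loser) changes anything.
    if winner_rank > loser_rank:
        for team, rank in ranks.items():
            # The loser and every team ranked at or below it drop one place.
            if rank >= loser_rank:
                ranks[team] = rank + 1
        # The winner takes over the loser's old rank.
        ranks[winner] = loser_rank
    return ranks


# Hypothetical toy ranking: buffalo is ranked last and beats the top team.
ranks = {'buffalo': 3, 'indiana': 1, 'siena': 2}
apply_game(ranks, winner='buffalo', loser='indiana')
print(ranks)  # {'buffalo': 1, 'indiana': 2, 'siena': 3}
```

Because the rule as stated also shifts teams ranked below the winner's old position, ranks are not guaranteed to stay contiguous after an update.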

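Currently, the code runs to completion in just under 2 minutes. I would like to see whether the loop can be sped up because, ideally, I want to run it thousands of times, each time with random initial ranks (1-353), and take each team's average final rank across those runs. I am new to profiling and optimizing code, so any help is much appreciated.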
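
The repeated-runs idea described above could be sketched roughly as follows. This is only an outline under assumptions: `average_final_ranks` and `run_once` are hypothetical names, `run_once` stands in for one full pass of the ranking loop, and the initial ranks are assumed to be a random permutation of 1..n_teams.

```python
import numpy as np


def average_final_ranks(run_once, n_teams, n_runs, seed=0):
    """Average the final ranks returned by `run_once` over random starting ranks."""
    rng = np.random.default_rng(seed)
    totals = np.zeros(n_teams)
    for _ in range(n_runs):
        # Assumption: the initial ranking is a random permutation of 1..n_teams.
        initial = rng.permutation(n_teams) + 1
        totals += run_once(initial)
    return totals / n_runs


# Placeholder model that returns the initial ranks unchanged, only to exercise the loop.
# The run count of 1000 is an arbitrary illustrative value.
avg = average_final_ranks(lambda ranks: ranks, n_teams=353, n_runs=1000)
print(avg.shape)
```

With a real `run_once`, the total runtime is roughly the per-run time multiplied by `n_runs`, which is why the speed of a single pass matters here.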

Below is some made-up data for a working example:

# import dependencies
import pandas as pd

# create winning team list
Winning_Team = ['buffalo','st-johns-ny','seton-hall','providence','indiana']
# create losing team list
Losing_Team = ['saint-francis-pa','loyola-md','wagner','siena','chicago-state']
# put winning team and losing team into columns in data_for_model
data_for_model = pd.DataFrame({'Winning_Team': Winning_Team,
                              'Losing_Team': Losing_Team})

# create team list
Team = ['buffalo','st-johns-ny','seton-hall','providence','indiana',
        'saint-francis-pa','loyola-md','wagner','siena','chicago-state']
# create rank list (i.e., 1-10)
Rank = [10,2,3,4,5,6,7,8,9,1]
# put Team and Rank into columns in model_df
model_df = pd.DataFrame({'Team': Team,
                         'Rank': Rank})

Below is the code for the ranking algorithm:

# `models` is not defined in this snippet; it is assumed to be the index of the
# current random-rank run, with a matching 'Random_Rank_<models>' column in model_df
rank_col = 'Random_Rank_{}'.format(models)

for i in range(data_for_model.shape[0]):
    # set up logic for model
    winning_team = data_for_model['Winning_Team'].loc[i]
    losing_team = data_for_model['Losing_Team'].loc[i]

    # get index and rank for winning team in model_df
    winning_team_index = model_df.loc[model_df['Team'] == winning_team, rank_col].index[0]
    winning_team_rank = model_df.loc[winning_team_index, rank_col]

    # get index and rank for losing team in model_df
    losing_team_index = model_df.loc[model_df['Team'] == losing_team, rank_col].index[0]
    losing_team_rank = model_df.loc[losing_team_index, rank_col]

    # if the winning team has a worse ranking
    if winning_team_rank > losing_team_rank:
        # increase the ranking by 1 for all random_rank >= losing_team_rank
        model_df[rank_col] = model_df.apply(lambda x: x[rank_col] + 1 if x[rank_col] >= losing_team_rank else x[rank_col], axis=1)
        # then make the winning_team_rank equal the rank of the losing team
        # (single .loc call avoids chained assignment, which may not write back to model_df)
        model_df.loc[winning_team_index, rank_col] = losing_team_rank
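
One direction that may be worth trying, offered as an unbenchmarked sketch rather than a definitive fix: keep the ranks in a NumPy array and resolve team names to array positions once, so each game needs only a boolean-mask increment instead of a row-wise `apply` and repeated `.loc` lookups. The function name `run_ranking` and its parameters are hypothetical; the data is the made-up sample from the question, using its 'Rank' column as the starting ranks.

```python
import numpy as np
import pandas as pd


def run_ranking(games, teams, initial_ranks):
    """Replay all games on a NumPy rank array and return the final ranks."""
    # Map each team name to a fixed array position once, outside the loop.
    position = {team: i for i, team in enumerate(teams)}
    winners = [position[t] for t in games['Winning_Team']]
    losers = [position[t] for t in games['Losing_Team']]

    ranks = np.array(initial_ranks, dtype=np.int64)  # copy; the input is left untouched
    for w, l in zip(winners, losers):
        loser_rank = ranks[l]
        if ranks[w] > loser_rank:
            # Same rule as the loop above: push down every rank >= loser_rank ...
            ranks[ranks >= loser_rank] += 1
            # ... then give the winner the loser's old rank.
            ranks[w] = loser_rank
    return ranks


# Made-up sample data from the question.
data_for_model = pd.DataFrame({
    'Winning_Team': ['buffalo', 'st-johns-ny', 'seton-hall', 'providence', 'indiana'],
    'Losing_Team': ['saint-francis-pa', 'loyola-md', 'wagner', 'siena', 'chicago-state'],
})
model_df = pd.DataFrame({
    'Team': ['buffalo', 'st-johns-ny', 'seton-hall', 'providence', 'indiana',
             'saint-francis-pa', 'loyola-md', 'wagner', 'siena', 'chicago-state'],
    'Rank': [10, 2, 3, 4, 5, 6, 7, 8, 9, 1],
})

final = run_ranking(data_for_model, model_df['Team'], model_df['Rank'])
model_df['Final_Rank'] = final
print(model_df)
```

The games still have to be processed in order because each update depends on the previous one, so the outer loop remains; only the per-game work is vectorized.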

0 answers:

No answers