Python, pandas: opportunities for vectorization + avoiding nested loops?

Posted: 2018-07-16 15:50:45

Tags: python pandas

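Below is my code. It currently uses two nested loops: the outer loop runs for a set number of iterations, and in each of them the inner loop processes the input df by comparing its values against a sequence of random numbers generated inside the inner loop.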

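While the current approach gives me the correct output, I suspect it could be done in a better way, especially since the number of outer-loop iterations will exceed a few million and the df will have close to a hundred rows.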

I'd like to know whether there are one or two tricks I could try to implement.

```python
import numpy as np
import pandas as pd

# Input df - index is the same length as the number of inner-loop iterations defined below
# The 'cumulative' column value is compared against a random number inside the inner loop
# 'units_A' is the useful data captured in each inner-loop iteration and aggregated after exiting the inner loop
df_reference = pd.DataFrame(index=np.arange(1,11,1),data={'cumulative':np.arange(0.1,1.1,0.1),'units_A':np.arange(10,101,10)})

# Variable that determines num rows in output df
num_iterations_outer = 20
# Variable that determines number of iterations for inner loop operation
num_iterations_inner = 10
# Create an empty output df (it is replaced by the concatenated result at the end)
df_out = pd.DataFrame(columns=['cumulative','units_A'])

# Using np array for comparison inside loop instead of comparing against column which takes much longer
compare_against_arr = df_reference['cumulative'].values
# Create a list to store df's that will become rows of output df. This is done to store to list and concat once vs. concat each df at a time within loop
output_df_rows_list = []

for outer_iteration_num in np.arange(num_iterations_outer):
    #current_cumulative_val = 1
    # Rotation num is reset to 1 at the start of every outer iteration
    current_rotation_num = 1
    # Create an empty list to store all rotation_num that are generated from inner loop iteration
    rotations_list = []
    for inner_iteration_num in np.arange(1,num_iterations_inner+1):
        # Get a random number in [0.0, 1.0)
        comparator = np.random.random()
        # Add the current rotation num to the list created before entering inner loop. Use the rotations list to get corresponding units_A after exiting inner loop
        rotations_list.append(current_rotation_num)
        # Compare random num 'comparator' to compare_against_arr[current_rotation_num]
        # (a 0-based position, i.e. the 'cumulative' value of the row labelled current_rotation_num + 1)
        if comparator < compare_against_arr[current_rotation_num]:
            # Reset rotation_num back to 1
            current_rotation_num = 1
        else:
            # Increment rotation_num
            current_rotation_num += 1
    df_units_A_by_rotation = df_reference.reindex(rotations_list)
    df_units_A_agg_outer_iter = pd.DataFrame(data=df_units_A_by_rotation.sum()).transpose()
    output_df_rows_list.append(df_units_A_agg_outer_iter)

#  Output df is created by concatenating all df stored in list that was updated in outer loop above
df_out = pd.concat(output_df_rows_list)
# Reset index so that it matches num_iterations_outer
df_out.index = np.arange(num_iterations_outer)
```
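
One low-effort trick that keeps both loops: a large part of the per-iteration cost is likely the `reindex` and `DataFrame` construction done in every outer iteration, so the two sums can be accumulated directly in NumPy arrays and a single DataFrame built at the end. This is a sketch under stated assumptions, not an answer from the thread: `simulate_loop_numpy` is a hypothetical name, it assumes `df_reference` is labelled 1..N in order (so the row labelled r sits at position r - 1), and it draws from a `numpy.random.Generator` rather than `np.random.random`, so rows match the original in distribution, not number for number.

```python
import numpy as np
import pandas as pd


def simulate_loop_numpy(df_reference, num_iterations_outer, num_iterations_inner, rng):
    """Hypothetical helper: the question's loop logic without per-iteration DataFrames."""
    # Plain NumPy copies of the columns; assumes the index is 1..N in order,
    # so the row labelled r sits at position r - 1
    compare_against_arr = df_reference['cumulative'].to_numpy()
    units_arr = df_reference['units_A'].to_numpy()
    # One slot per output row, filled in place instead of building a DataFrame each time
    cum_out = np.zeros(num_iterations_outer)
    units_out = np.zeros(num_iterations_outer, dtype=units_arr.dtype)
    for i in range(num_iterations_outer):
        # Draw all random numbers for this outer iteration in one call
        comparators = rng.random(num_iterations_inner)
        rotation = 1
        for comparator in comparators:
            # Equivalent to appending to rotations_list and later summing reindex(rotations_list)
            cum_out[i] += compare_against_arr[rotation - 1]
            units_out[i] += units_arr[rotation - 1]
            # Same comparison as the question's code (0-based position `rotation`)
            rotation = 1 if comparator < compare_against_arr[rotation] else rotation + 1
    return pd.DataFrame({'cumulative': cum_out, 'units_A': units_out})
```

Usage would look like `df_out = simulate_loop_numpy(df_reference, num_iterations_outer, num_iterations_inner, np.random.default_rng())`.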

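A further trick, sketched under similar assumptions (hypothetical name `simulate_vectorized`, rows labelled 1..N in order, a `numpy.random.Generator` as the random source): the inner loop is inherently sequential because each step depends on the previous rotation number, but the outer iterations are independent of one another. Swapping the loop order lets NumPy advance all outer iterations at once, so Python only loops `num_iterations_inner` times over arrays of length `num_iterations_outer`, instead of performing outer × inner scalar steps. It also assumes the last `cumulative` value is 1.0, as in the example, so every iteration resets before `rotation` can index past the end of the array.

```python
import numpy as np
import pandas as pd


def simulate_vectorized(df_reference, num_iterations_outer, num_iterations_inner, rng):
    """Hypothetical helper: loops over inner steps only, vectorized across outer iterations."""
    # Assumes the index is 1..N in order, so the row labelled r sits at position r - 1
    cumulative = df_reference['cumulative'].to_numpy()
    units = df_reference['units_A'].to_numpy()
    # Current rotation number of every outer iteration, all starting at 1
    rotation = np.ones(num_iterations_outer, dtype=np.intp)
    cum_out = np.zeros(num_iterations_outer)
    units_out = np.zeros(num_iterations_outer, dtype=units.dtype)
    for _ in range(num_iterations_inner):
        # Record the current rotation of every outer iteration at once
        cum_out += cumulative[rotation - 1]
        units_out += units[rotation - 1]
        # One random number per outer iteration for this inner step
        comparator = rng.random(num_iterations_outer)
        # Reset to 1 where comparator < threshold, otherwise increment
        # (same 0-based indexing as the question's compare_against_arr[current_rotation_num];
        # relies on the last 'cumulative' value being 1.0 to stay in bounds)
        rotation = np.where(comparator < cumulative[rotation], 1, rotation + 1)
    return pd.DataFrame({'cumulative': cum_out, 'units_A': units_out})
```

Memory stays at a handful of arrays of length `num_iterations_outer`; if that is too large for several million iterations, the outer range could be processed in chunks and the resulting frames concatenated once.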
Thanks for your time and attention!

0 answers