Question

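Nested loops waste a lot of time. I have some ideas for improving efficiency, but I would like to know whether anyone can share a better option.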

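I am trying to build a dataframe in Python that pulls values from several other dataframes. For a small number of variables/columns I can do a simple assignment. In the example below, I want to compare each cell of one column across two dataframes and assign values where they are equal. Where they are not equal, I need to keep iterating through the second dataframe until every cell has been evaluated before any assignment is made.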

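"""Iterate over every row of the first dataframe, then over every row of the second dataframe. This ensures that the values in the comparison columns are matched correctly."""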

for i in range(len(df10)):
    for j in range(len(df6)):                 # this is not an efficient way to perform this action.
        if df10.iloc[i,0] == df6.iloc[j,1]:
            df10.iloc[i,23] = df6.iloc[j,6]
            df10.iloc[i,24] = df6.iloc[j,1]
df10.sample(n=5)

Answer 1

Here is how you can do it; see the comments for a description. If anything is unclear, please leave a comment.

import numpy as np
import pandas as pd

np.random.seed(10)
df10 = pd.DataFrame(np.random.choice(5, (5,5)))

df6 = pd.DataFrame(np.random.choice(5, (4,6)))

display(df10)
display(df6)

## Compare every row of column 0 of df10 with every row of column 1 of df6
## using NumPy broadcasting. This returns a boolean matrix that is True at
## element (i, j) where the two values are equal.
cond = df10.iloc[:,0].values[:,np.newaxis] == df6.iloc[:,1].values

## get the indices of the matches in the flattened matrix
indx = np.arange(cond.size)[cond.ravel()]

## convert each flattened index to a row and column index (i, j),
## where i corresponds to the row index in df10 and j corresponds to the
## row index in df6
i,j = indx//len(df6), indx%len(df6)

## set value using fancy indexing
df10.iloc[i,3] = df6.iloc[j,4].values
df10
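
The broadcast comparison above still builds a boolean matrix of size len(df10) x len(df6). If that becomes too large, a lookup keyed on the comparison column might be an alternative. The following is only a minimal sketch, not part of the original answer: the function names `fill_by_loop` and `fill_by_lookup` and their positional-column parameters are hypothetical, and it assumes that when a key appears several times in the second frame the last match should win, as it does in the question's nested loop.

```python
import numpy as np
import pandas as pd


def fill_by_loop(left, right, left_key, right_key, right_val, target):
    """Reference version: the nested loop from the question (last match wins)."""
    out = left.copy()
    for i in range(len(out)):
        for j in range(len(right)):
            if out.iloc[i, left_key] == right.iloc[j, right_key]:
                out.iloc[i, target] = right.iloc[j, right_val]
    return out


def fill_by_lookup(left, right, left_key, right_key, right_val, target):
    """Hypothetical helper: same assignment via an index lookup."""
    out = left.copy()
    keys = right.iloc[:, right_key]
    # Keep only the last occurrence of each key, because the loop lets
    # later matches overwrite earlier ones.
    last = ~keys.duplicated(keep="last").to_numpy()
    unique_keys = pd.Index(keys.to_numpy()[last])
    values = right.iloc[:, right_val].to_numpy()[last]
    # Position of each left key within unique_keys, or -1 where there is no match.
    pos = unique_keys.get_indexer(out.iloc[:, left_key].to_numpy())
    hit = pos >= 0
    # Rows without a match keep their original value.
    out.iloc[hit, target] = values[pos[hit]]
    return out


# Small made-up frames: key 2 occurs twice in `right`, so 30 overwrites 10.
left = pd.DataFrame({"k": [1, 2, 3, 2], "x": [0, 0, 0, 0]})
right = pd.DataFrame({"v": [10, 20, 30], "k": [2, 9, 2]})

result = fill_by_lookup(left, right, left_key=0, right_key=1, right_val=0, target=1)
print(result)
```

With the frames from the answer, the equivalent call would be `fill_by_lookup(df10, df6, 0, 1, 4, 3)`. Comparing its output against `fill_by_loop` on a small sample is a cheap way to check that the two agree before dropping the loop.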
