Question

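I implemented a simple function meant to output a list of the names of Pandas DataFrame columns to drop in order to remove highly correlated columns. The function takes the DataFrame's correlation matrix as its one and only argument; that matrix is computed beforehand with the DataFrame's .corr() method and is itself a DataFrame. My problem is some strange behavior of Python inside the loop.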

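After several attempts, I found the problem: the variable 'col', which should iterate over the Index object holding the columns of 'corr' (that is, the column names of the original DataFrame, which are strings), actually takes integer values. This causes an error, because .loc retrieves elements by label, i.e. by the actual column and index names (here strings), not by integer position. I really don't understand this behavior, because a list comprehension with the same syntax produces the actual names, not numbers.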
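A minimal sketch of the distinction described above, using a hypothetical three-column frame (the names `p`, `q`, `r` are made up for illustration): iterating over a slice of `corr.columns` yields the labels themselves, `.loc` looks cells up by label, and `.iloc` looks them up by integer position.

```python
import pandas as pd

# Hypothetical toy data with string column labels
df = pd.DataFrame({"p": [1.0, 2.0, 3.0, 4.0],
                   "q": [2.0, 1.0, 4.0, 3.0],
                   "r": [4.0, 3.0, 2.0, 1.0]})
corr = df.corr()

# Slicing an Index returns another Index; iterating over it yields the labels
labels = [col for col in corr.columns[1:]]
print(labels)  # ['q', 'r']

# .loc takes labels, .iloc takes positions; both address the same cell here
print(corr.loc["p", "r"] == corr.iloc[0, 2])  # True

# Passing an integer position to .loc when the labels are strings raises KeyError
try:
    corr.loc["p", 0]
except KeyError as exc:
    print("KeyError:", exc)
```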

The function is as follows:

(Note that the try-except statement is there only for debugging purposes, to show the values that 'col', 'ind' and 'i' take when the error occurs.)

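def f(corr):
    # var_to_drop (a list) and threshold (a number) are globals defined elsewhere;
    # corr is the only argument
    for (i, ind) in enumerate(corr.index[:-1]):
        if ind in var_to_drop:
            continue
        # Only look at the columns to the right of the diagonal
        for col in corr.columns[i+1:]:
            try:
                if abs(corr.loc[ind, col]) > abs(threshold) and col not in var_to_drop:
                    var_to_drop.append(col)
            except Exception:
                # Debug only: return the loop state at the moment of the error
                return [(i), (ind, col)]
    return var_to_drop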

This is the output when the function is called:

[0, ('lowlevel.average_loudness', 0)]

So the variable 'col' clearly takes an integer value, which is not what I expected.
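One way to test the assumption that every label in `corr.columns` is a string is to list any label whose type is not `str`, together with its position. A minimal sketch follows; the frame in it is hypothetical and deliberately includes a column labeled with the integer `0` so that the check has something to report. Whether the real data contains such a label is not known from the output above.

```python
import pandas as pd

# Hypothetical frame: one column is labeled with the integer 0 instead of a string
df = pd.DataFrame({"x": [1.0, 2.0, 3.0],
                   0: [2.0, 4.0, 7.0],
                   "y": [3.0, 1.0, 2.0]})
corr = df.corr()

# Collect (position, label, type name) for every column label that is not a str
non_string = [
    (pos, label, type(label).__name__)
    for pos, label in enumerate(corr.columns)
    if not isinstance(label, str)
]
print(non_string)
```

On the real correlation matrix, an empty list would mean every column label is a string.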

On the other hand, a list comprehension produces the expected result:

Input:

[col for col in corr.columns[2340:]]   # 2340 is an arbitrary choice

Output:

['rhythm.bpm_histogram_first_peak_weight.mean_exp',
'rhythm.bpm_histogram_first_peak_weight.median_exp',
'rhythm.bpm_histogram_first_peak_weight.min_exp',
'rhythm.danceability_sin',
'tonal.hpcp_entropy.mean_exp',
'tonal.hpcp_entropy.mean_cub']
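For comparison, a self-contained sketch of the same loop run on a toy correlation matrix. Unlike the function above, it takes `threshold` as a parameter, builds `var_to_drop` locally instead of relying on globals, and re-raises whatever exception actually occurs with the loop state attached, rather than hiding it behind a catch-all except. The name `columns_to_drop`, the toy columns `a`, `b`, `c`, and the threshold 0.9 are all hypothetical.

```python
import numpy as np
import pandas as pd


def columns_to_drop(corr, threshold):
    """Return labels of columns whose |correlation| with an earlier kept column exceeds threshold."""
    var_to_drop = []  # built locally instead of using a global list
    for i, ind in enumerate(corr.index[:-1]):
        if ind in var_to_drop:
            continue
        # Iterating over a sliced Index yields labels, not positions
        for col in corr.columns[i + 1:]:
            try:
                if abs(corr.loc[ind, col]) > abs(threshold) and col not in var_to_drop:
                    var_to_drop.append(col)
            except Exception as exc:
                # Surface the real error together with the loop state
                raise RuntimeError(f"i={i!r}, ind={ind!r}, col={col!r}") from exc
    return var_to_drop


# Toy data: "b" is an exact multiple of "a", "c" is independent noise
rng = np.random.default_rng(0)
a = rng.normal(size=100)
df = pd.DataFrame({"a": a, "b": 2 * a, "c": rng.normal(size=100)})
print(columns_to_drop(df.corr(), threshold=0.9))  # ['b']
```

On this toy frame the loop variable `col` only ever takes the string labels `'b'` and `'c'`, and only `'b'` is returned because it is perfectly correlated with `'a'`.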

Incomprehensible Python behavior in a for-loop iteration
