Question

Yes, this question has been asked many times! And no, I still can't figure out how to run this boolean filter without triggering the pandas SettingWithCopyWarning.

for x in range(len(df_A)):
    df_C = df_A.loc[(df_A['age'] >= df_B['age_limits'].iloc[x][0]) &
                    (df_A['age'] <= df_B['age_limits'].iloc[x][1])]

    df_D['count'].iloc[x] = len(df_C) # triggers warning

I have tried:

Copying df_A and df_B in every possible place
Using masks
Using query

I know I could suppress the warning, but I don't want to do that.

What am I missing? I know it's probably something obvious.

Many thanks!

Answer 1

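For more details on why you get the SettingWithCopyWarning, I suggest reading this answer. It mainly happens because selecting the column df_D['count'] and then indexing it with iloc[x] is a "chained assignment", which is what pandas flags.

To prevent this, you can look up the position of the column you want in df_D, then use iloc with both the row and the column inside the for loop: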

pos_col_D = df_D.columns.get_loc('count')
for x in range(len(df_A)):
    df_C = df_A.loc[(df_A['age'] >= df_B['age_limits'].iloc[x][0]) &
                    (df_A['age'] <= df_B['age_limits'].iloc[x][1])]

    df_D.iloc[x, pos_col_D] = len(df_C)  # no more warning
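
To see the two forms side by side outside the loop, here is a minimal sketch. The question does not show df_D, so the three-row frame below is a hypothetical stand-in; only the indexing pattern matters.

```python
import pandas as pd

# Hypothetical stand-in for df_D (the real frame is not shown in the question).
df_D = pd.DataFrame({'count': [0, 0, 0]})

# Chained form (the one that gets flagged): df_D['count'] is evaluated first,
# and the assignment then goes through a second indexing step on that result.
# df_D['count'].iloc[0] = 5

# Single-step, position-based form: row and column in one iloc call.
pos_col_D = df_D.columns.get_loc('count')
df_D.iloc[0, pos_col_D] = 5

# Single-step, label-based form: same idea with loc and the column name.
# With the default RangeIndex, the label 1 is also the second row.
df_D.loc[1, 'count'] = 7

print(df_D['count'].tolist())
```

Both single-step forms write straight into df_D, so pandas has no intermediate object to warn about.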

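Also, since you compare every value of df_A.age against the bounds in df_B.age_limits, I think you can speed up your code with numpy.ufunc.outer, using the ufuncs greater_equal and less_equal respectively, and then sum over axis=0.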

#Setup
import numpy as np
import pandas as pd
df_A = pd.DataFrame({'age': [12,25,32]})
df_B = pd.DataFrame({'age_limits':[[3,99], [20,45], [15,30]]})

#your result
for x in range(len(df_A)):
    df_C = df_A.loc[(df_A['age'] >= df_B['age_limits'].iloc[x][0]) &
                    (df_A['age'] <= df_B['age_limits'].iloc[x][1])]
    print (len(df_C))
3
2
1

#with numpy
print ( ( np.greater_equal.outer(df_A.age, df_B.age_limits.str[0])
         & np.less_equal.outer(df_A.age, df_B.age_limits.str[1]))
        .sum(0) )
array([3, 2, 1])

So you can assign the result of that last expression directly to df_D['count'], with no for loop needed. Hope this works for you.
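
As a minimal sketch of that loop-free assignment: the version below converts everything to NumPy arrays before calling outer, which is my assumption rather than part of the answer above, because some recent pandas versions may refuse ufunc.outer on a Series directly. df_D is again a hypothetical frame with one row per age range.

```python
import numpy as np
import pandas as pd

df_A = pd.DataFrame({'age': [12, 25, 32]})
df_B = pd.DataFrame({'age_limits': [[3, 99], [20, 45], [15, 30]]})

# Hypothetical df_D: one row per age range in df_B.
df_D = pd.DataFrame(index=df_B.index)

# Plain NumPy arrays: ages has shape (n_ages,), limits has shape (n_ranges, 2).
ages = df_A['age'].to_numpy()
limits = np.array(df_B['age_limits'].tolist())
low, high = limits[:, 0], limits[:, 1]

# in_range[i, j] is True when ages[i] lies inside range j (bounds inclusive).
in_range = np.greater_equal.outer(ages, low) & np.less_equal.outer(ages, high)

# Summing over axis 0 counts the matching ages for each range.
df_D['count'] = in_range.sum(axis=0)

print(df_D['count'].tolist())
```

Because the whole column is assigned in one statement, no chained indexing is involved and the warning should not appear.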

Another Pandas SettingWithCopyWarning question
