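Given a DataFrame df with three columns (for example 'Country', 'Car', and 'Price'), how can I flag outliers whose price lies more than 3 standard deviations from the mean, computed separately for each country and car combination? The following code works, but it is inefficient.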
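import numpy as np
import pandas as pd

sd = pd.DataFrame()
for country in df['Country'].unique():
    for car in df['Car'].unique():
        # Rows for this (country, car) pair; copy so the new column is not written to a view of df
        chunk = df[(df['Country'] == country) & (df['Car'] == car)].copy()
        # Flag prices more than 3 standard deviations away from the group mean
        chunk['outlier'] = np.abs(chunk['Price'] - chunk['Price'].mean()) > 3 * chunk['Price'].std()
        sd = pd.concat([sd, chunk])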
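One possible way to avoid the nested loop is to let pandas compute the per-group mean and standard deviation with groupby and transform, which returns values aligned to the original rows. The sketch below assumes the column names from the question; the function name flag_outliers and the parameter n_std are hypothetical, introduced only for illustration. Unlike the loop, it keeps the rows in their original order instead of regrouping them.

```python
import pandas as pd


def flag_outliers(df, n_std=3.0):
    """Return a copy of df with a boolean 'outlier' column.

    flag_outliers and n_std are illustrative names, not part of pandas.
    """
    # Group once by the (Country, Car) pair and select the Price column
    grouped = df.groupby(['Country', 'Car'])['Price']
    # transform broadcasts each group's statistic back to its rows
    mean = grouped.transform('mean')
    std = grouped.transform('std')  # sample standard deviation (ddof=1), as in Series.std()
    out = df.copy()
    # Groups with a single row have a NaN std, so the comparison is False there
    out['outlier'] = (out['Price'] - mean).abs() > n_std * std
    return out
```

Calling sd = flag_outliers(df) should produce the same flags as the loop, while only scanning the data a fixed number of times rather than once per country and car combination.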