Outlier analysis methods and box plots

Time: 2020-10-16 18:57:10

Tags: python data-analysis boxplot outliers

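# Z-score: distance of each MonthlyRevenue value from the mean, in standard deviations
z_score = (train_1['MonthlyRevenue'] - train_1['MonthlyRevenue'].mean()) / train_1['MonthlyRevenue'].std()

# Count of values above +3 and below -3 standard deviations
(z_score>3).sum(), (z_score<-3).sum()

# Largest value with z <= 3 and smallest value with z >= -3, used as replacement values
upper = train_1[z_score<=3]["MonthlyRevenue"].max()

lower = train_1[z_score>=-3]["MonthlyRevenue"].min()

upper,lower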

train_1_zscore_replaced = train_1.copy()

# .loc assigns into the copy directly; chained indexing may write to a temporary instead
train_1_zscore_replaced.loc[z_score>3, "MonthlyRevenue"] = upper

train_1_zscore_replaced.loc[z_score<-3, "MonthlyRevenue"] = lower

train_1_zscore_replaced.head(30)
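
For comparison with the z-score replacement above, here is a minimal sketch of the same idea using the IQR rule (Tukey fences). The helper name `iqr_bounds`, the multiplier `k = 1.5`, and the synthetic `demo` frame are assumptions for illustration, not part of the original code or the Kaggle data. Note one difference: `clip` caps values at the fences themselves, whereas the code above replaces outliers with the most extreme non-outlier value.

```python
import numpy as np
import pandas as pd


def iqr_bounds(series, k=1.5):
    """Return the (lower, upper) Tukey fences for a numeric series."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Synthetic stand-in for train_1["MonthlyRevenue"] (not the real dataset).
rng = np.random.default_rng(0)
values = np.append(rng.normal(60, 10, 500), [400.0, -150.0])
demo = pd.DataFrame({"MonthlyRevenue": values})

lower, upper = iqr_bounds(demo["MonthlyRevenue"])

# Count the values that fall outside the fences.
outside = (demo["MonthlyRevenue"] < lower) | (demo["MonthlyRevenue"] > upper)
n_outliers = int(outside.sum())

# Cap the values at the fences on a copy, leaving the original frame untouched.
demo_iqr_capped = demo.copy()
demo_iqr_capped["MonthlyRevenue"] = demo["MonthlyRevenue"].clip(lower=lower, upper=upper)
```

The IQR fences are built from quartiles, so a few extreme values do not shift them the way they shift the mean and standard deviation used by the z-score.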

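I have 58 variable columns.

1 - How can I show all of the columns in box plots for outlier analysis?

2 - How do I know which method (z-score, hard edges, IQR, ...) to choose for outlier analysis?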
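
To make the first question concrete, one possible approach is a grid of subplots with one box plot per numeric column. This is only a sketch under assumptions: the helper name `boxplot_grid`, the grid width `ncols=6`, and the synthetic 58-column `demo` frame are hypothetical and stand in for `train_1`.

```python
import math

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def boxplot_grid(df, ncols=6):
    """Draw one box plot per numeric column on a grid of subplots."""
    numeric_cols = df.select_dtypes(include="number").columns
    nrows = max(1, math.ceil(len(numeric_cols) / ncols))
    fig, axes = plt.subplots(nrows, ncols, figsize=(3 * ncols, 3 * nrows), squeeze=False)
    flat_axes = axes.ravel()
    for ax, col in zip(flat_axes, numeric_cols):
        # Drop missing values; each column gets its own y-axis scale.
        ax.boxplot(df[col].dropna().to_numpy())
        ax.set_title(col, fontsize=8)
        ax.set_xticks([])
    # Hide the unused cells at the end of the grid.
    for ax in flat_axes[len(numeric_cols):]:
        ax.set_visible(False)
    fig.tight_layout()
    return fig


# Synthetic stand-in with 58 numeric columns (hypothetical names, not the real data).
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(200, 58)), columns=[f"col_{i}" for i in range(58)])
fig = boxplot_grid(demo)
```

Giving each column its own axes keeps a single large-scale column from flattening all the other boxes, which tends to happen when every column shares one plot.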

Dataset: [1]: https://www.kaggle.com/jpacse/datasets-for-churn-telecom

0 answers:

No answers