Seaborn: Labeling outliers on a box plot

Date: 2018-01-23 14:21:31

Tags: python matplotlib seaborn boxplot

I have a Python snippet that creates a box plot as follows (it works fine):

merged = group.merge(t, left_on=t['user_lower'], right_on=group['user'], how="left")
g = sns.boxplot(x="Company", y="Total_Activities", data=merged, orient="v")
g.set_xticklabels(g.get_xticklabels(), rotation=90)
plt.show()

I read in other posts that labeling the outliers involves iterating over them. Does anyone have an example of this for a merged dataset using Seaborn?

1 Answer:

Answer 0: (score: 0)

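I use this workaround to get the x coordinates of the outliers in the box plot's axes, so they can be labeled as needed. The matching DataFrame index values are found by selecting the outliers the same way the sns box plot does (the 1.5 * IQR rule):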

import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")
ax = sns.boxplot(x="day", y="total_bill", hue="smoker",
                 data=tips, palette="Set3")

# Collect the plotted (x, y) position of every outlier marker.
# This relies on seaborn drawing outliers with the 'd' marker; adjust the
# check if your seaborn version uses a different marker.
plt_outliers_xy = []
for line in ax.get_lines():
    x_data, y_data = line.get_data()
    if line.get_marker() != 'd' or len(y_data) == 0:
        continue
    for x_val, y_val in zip(x_data, y_data):
        plt_outliers_xy.append((x_val, y_val))

# Group the data the same way the box plot does (x first, then hue).
grp = tips.groupby(['day', 'smoker'])

for name, df in grp:
    print(name)
    y_vals = df["total_bill"]
    Q1 = y_vals.quantile(0.25)
    Q3 = y_vals.quantile(0.75)
    IQR = Q3 - Q1  # interquartile range
    iqr_filter = (y_vals >= Q1 - 1.5 * IQR) & (y_vals <= Q3 + 1.5 * IQR)
    dropped = y_vals.loc[~iqr_filter]
    for index, y_i in dropped.items():
        # Assumes the plotted outliers arrive in the same order as `dropped`;
        # the printed difference should be 0 when the two match.
        x_plt, y_plt = plt_outliers_xy.pop(0)
        print(f"{index} : {y_i:.4f} - {y_plt:.4f} = {y_i-y_plt:.4f}")
#        ax.plot(x_plt, y_plt,'ro')
        ax.annotate(f"{index}", (x_plt, y_plt), (10, 10), textcoords='offset pixels')
    print()

plt.show()
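For the case in the question (one box per company, no hue), a simpler variant may be enough. The sketch below does not read marker positions back from the axes; it assumes a vertical box plot drawn with an explicit `order`, so that category i sits at x position i. The `label_outliers` helper, the `user` column, and the small `merged` frame are hypothetical stand-ins, not the asker's real data.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def label_outliers(ax, data, x, y, label, order, whis=1.5):
    """Annotate points outside the whis * IQR fences with data[label].

    Assumes a vertical box plot without hue, so that category i of
    `order` is centred at x position i (an assumption of this sketch).
    """
    labelled = []
    for pos, category in enumerate(order):
        rows = data[data[x] == category]
        q1, q3 = rows[y].quantile([0.25, 0.75])
        iqr = q3 - q1
        is_outlier = (rows[y] < q1 - whis * iqr) | (rows[y] > q3 + whis * iqr)
        for _, row in rows[is_outlier].iterrows():
            # Place the text a few points up and to the right of the marker.
            ax.annotate(str(row[label]), (pos, row[y]),
                        xytext=(6, 6), textcoords="offset points")
            labelled.append(row[label])
    return labelled


# Hypothetical stand-in for the merged frame from the question.
merged = pd.DataFrame({
    "Company": ["A"] * 6 + ["B"] * 6,
    "user": [f"user{i}" for i in range(12)],
    "Total_Activities": [10, 12, 11, 13, 12, 60, 20, 22, 21, 23, 22, 1],
})
order = ["A", "B"]
ax = sns.boxplot(x="Company", y="Total_Activities", data=merged, order=order)
labelled = label_outliers(ax, merged, "Company", "Total_Activities",
                          "user", order)
```

Calling `plt.show()` afterwards displays the figure. With a `hue` variable the boxes are offset from the integer positions, so this shortcut no longer applies and the marker-based approach above is the safer choice.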

The outliers of each group can be obtained with the approach described here: https://datascience.stackexchange.com/questions/54808/how-to-remove-outliers-using-box-plot

Or: Extract outliers from Seaborn Boxplot

Or: https://nextjournal.com/schmudde/how-to-remove-outliers-in-data
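As a rough illustration of the per-group outlier selection these links describe, the rows outside the 1.5 * IQR fences can also be picked out in one pass with `groupby(...).transform`, which keeps the original DataFrame index. The column names and values here are made up for the example.

```python
import pandas as pd

# Made-up data: two groups, each containing one obvious outlier.
df = pd.DataFrame({
    "group": list("aaaaaabbbbbb"),
    "value": [10, 12, 11, 13, 12, 60, 20, 22, 21, 23, 22, 1],
})

grouped = df.groupby("group")["value"]
# transform() broadcasts each group's quartile back onto its rows.
q1 = grouped.transform(lambda s: s.quantile(0.25))
q3 = grouped.transform(lambda s: s.quantile(0.75))
iqr = q3 - q1

# Same rule as in the loop above: outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
is_outlier = (df["value"] < q1 - 1.5 * iqr) | (df["value"] > q3 + 1.5 * iqr)
outlier_rows = df[is_outlier]
print(outlier_rows)
```

Because `outlier_rows` keeps the original index, it can be joined back to other columns (a user name, for instance) before annotating.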

Resulting plot: Seaborn box plot with annotated outliers