I have a Python snippet that creates a box plot as follows (it works well):
merged = group.merge(t, left_on=t['user_lower'], right_on=group['user'], how="left")
g = sns.boxplot(x="Company", y="Total_Activities",data=merged, orient="v" )
g.set_xticklabels(g.get_xticklabels(),rotation=90)
plt.show()
I read in other posts that labeling the outliers involves iterating over them. Does anyone have an example of this for a merged dataset using Seaborn?
Answer 0 (score: 0)
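I use this workaround to get the x-coordinates of the outliers in the box plot's axes, so they can be labeled as needed. The DataFrame index of each outlier can be found by selecting the outliers the same way the sns box plot does, that is, with the 1.5 * IQR rule.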
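import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")
ax = sns.boxplot(x="day", y="total_bill", hue="smoker",
                 data=tips, palette="Set3")

# Collect the (x, y) plot coordinates of every outlier (flier) marker.
# This relies on seaborn drawing fliers with the 'd' (diamond) marker;
# change the check if your version uses a different marker.
plt_outliers_xy = []
for line in ax.get_lines():
    x_data, y_data = line.get_data()
    if line.get_marker() != 'd' or len(y_data) == 0:
        continue
    for x_val, y_val in zip(x_data, y_data):
        plt_outliers_xy.append((x_val, y_val))

# Recompute the outliers per group with the same 1.5 * IQR rule,
# which gives the DataFrame index of each one.
grp = tips.groupby(['day', 'smoker'])
for name, df in grp:
    print(name)
    y_vals = df["total_bill"]
    Q1 = y_vals.quantile(0.25)
    Q3 = y_vals.quantile(0.75)
    IQR = Q3 - Q1  # interquartile range
    iqr_filter = (y_vals >= Q1 - 1.5 * IQR) & (y_vals <= Q3 + 1.5 * IQR)
    dropped = y_vals.loc[~iqr_filter]
    # Series.iteritems() was removed in pandas 2.0; items() is the replacement
    for index, y_i in dropped.items():
        x_plt, y_plt = plt_outliers_xy.pop(0)
        # The difference should print as 0 if plot and data are matched up correctly
        print(f"{index} : {y_i:.4f} - {y_plt:.4f} = {y_i-y_plt:.4f}")
        # ax.plot(x_plt, y_plt, 'ro')
        ax.annotate(f"{index}", (x_plt, y_plt), (10, 10), textcoords='offset pixels')
    print()
plt.show()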
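Popping coordinates off a list assumes that the plotted flier lines and the groupby groups come in the same order, which may not hold for every seaborn version. As a hedged alternative for a plot without `hue` (such as the one in the question), the sketch below computes the outliers per category and places each label at that category's position on the x-axis. It assumes the categories are drawn at integer positions 0, 1, 2, ... in the given `order` and that the default 1.5 * IQR whiskers are used. The helper name `label_outliers`, its parameters, and the toy data are made up for illustration; they are not part of seaborn or of the question's data.

```python
import matplotlib.pyplot as plt
import pandas as pd


def label_outliers(ax, data, x, y, label_col, order):
    """Annotate IQR outliers of `y` per category of `x`; return the labels used.

    Hypothetical helper: assumes category i of `order` is drawn at x = i.
    """
    labels = []
    for pos, category in enumerate(order):
        group = data[data[x] == category]
        q1 = group[y].quantile(0.25)
        q3 = group[y].quantile(0.75)
        iqr = q3 - q1
        is_outlier = (group[y] < q1 - 1.5 * iqr) | (group[y] > q3 + 1.5 * iqr)
        for _, row in group[is_outlier].iterrows():
            ax.annotate(str(row[label_col]), (pos, row[y]),
                        xytext=(5, 5), textcoords="offset points")
            labels.append(row[label_col])
    return labels


# Toy data standing in for the merged frame from the question
demo = pd.DataFrame({
    "Company": ["A"] * 6 + ["B"] * 6,
    "Total_Activities": [1, 2, 3, 4, 5, 50, 10, 11, 12, 13, 14, 15],
    "user": list("abcdefghijkl"),
})
order = ["A", "B"]

fig, ax = plt.subplots()
# With seaborn the plot would be drawn with, e.g.:
# sns.boxplot(x="Company", y="Total_Activities", data=demo, order=order, ax=ax)
ax.boxplot(
    [demo.loc[demo["Company"] == c, "Total_Activities"].to_numpy() for c in order],
    positions=list(range(len(order))),
)
labeled = label_outliers(ax, demo, "Company", "Total_Activities", "user", order)
print(labeled)
```

Because each label is positioned from the data itself, this does not depend on the flier marker or on the order in which the lines were added to the axes. With `hue`, the boxes are offset from the integer positions, so this sketch would need those offsets as well.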
The outliers for each group of data can be obtained with the approach described in: https://datascience.stackexchange.com/questions/54808/how-to-remove-outliers-using-box-plot
or: Extract outliers from Seaborn Boxplot
or: https://nextjournal.com/schmudde/how-to-remove-outliers-in-data
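As a rough illustration of what those approaches boil down to, here is a minimal pandas-only sketch that returns the outlier rows of every group in one pass. The function name `iqr_outliers`, the parameter `k`, and the toy data are assumptions made for this example, not taken from the linked posts.

```python
import pandas as pd


def iqr_outliers(data, group_cols, value_col, k=1.5):
    """Return the rows whose value lies outside [Q1 - k*IQR, Q3 + k*IQR] of their group."""
    grouped = data.groupby(group_cols, observed=True)[value_col]
    # transform() broadcasts each group's quantile back onto that group's rows
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    outside = (data[value_col] < q1 - k * iqr) | (data[value_col] > q3 + k * iqr)
    return data[outside]


# Toy data: one obvious outlier (40.0) in the "Thur" group
demo = pd.DataFrame({
    "day": ["Thur"] * 5 + ["Fri"] * 5,
    "total_bill": [10.0, 11.0, 12.0, 13.0, 40.0, 20.0, 21.0, 22.0, 23.0, 24.0],
})
outliers = iqr_outliers(demo, ["day"], "total_bill")
print(outliers)
```

The returned frame keeps the original index, so it can be used directly to look up the label for each outlier.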