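I want to add a hue parameter to regplot so that I can identify the outliers and set them apart from the rest of the plot by color. That way, the change in the regplot after the outliers are removed would be much clearer.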
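import matplotlib.pyplot as plt
import seaborn as sns

# df_data is assumed to be the housing DataFrame, already loaded.
# GrLivArea: above grade (ground) living area in square feet
# Use a 68% confidence interval, which corresponds to the standard error of the estimate: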
fig = plt.figure(figsize=(10, 10))
ax1 = fig.add_subplot(211)
b = sns.regplot(x='GrLivArea', y='SalePrice', data=df_data, ax=ax1,
                color='Green', ci=68, scatter_kws={'alpha': 0.3}, line_kws={'color': 'red'})
ax1.set_title('Ground Living Area VS SalePrice (With Outliers)', fontsize=13)
plt.tight_layout()
# Remove the outliers: houses larger than 4000 sq ft that sold for less than 300000
df = df_data.drop(df_data[(df_data['GrLivArea'] > 4000) & (df_data['SalePrice'] < 300000)].index)
ax2 = fig.add_subplot(212)
b = sns.regplot(x='GrLivArea', y='SalePrice', data=df, ax=ax2,
                color='Green', ci=68, scatter_kws={'alpha': 0.3}, line_kws={'color': 'red'})
ax2.set_title('Ground Living Area VS SalePrice (Outliers Removed)', fontsize=13)
plt.tight_layout()
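
For reference, sns.regplot does not accept a hue argument, so the coloring has to come from somewhere else. One possible workaround, sketched below under assumptions, is to draw the regression as before and then overlay the outlier rows on the same axes with a plain matplotlib scatter in a different color. The data here is synthetic (a hypothetical stand-in for df_data, including two made-up outliers), and the is_outlier mask, the colors, and the marker size are illustrative choices rather than anything from the original dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for df_data (hypothetical values, not the real housing data).
rng = np.random.default_rng(0)
area = rng.uniform(500, 3500, size=200)
price = 100 * area + rng.normal(0, 20000, size=200)
df_data = pd.DataFrame({"GrLivArea": area, "SalePrice": price})

# Two made-up outliers: very large houses with low sale prices.
fake_outliers = pd.DataFrame({"GrLivArea": [4700, 5600],
                              "SalePrice": [185000, 160000]})
df_data = pd.concat([df_data, fake_outliers], ignore_index=True)

# Same outlier rule as in the question, kept as a boolean mask.
is_outlier = (df_data["GrLivArea"] > 4000) & (df_data["SalePrice"] < 300000)

fig, ax = plt.subplots(figsize=(10, 5))

# Regression fitted on all rows, styled as in the question.
sns.regplot(x="GrLivArea", y="SalePrice", data=df_data, ax=ax,
            color="green", ci=68,
            scatter_kws={"alpha": 0.3}, line_kws={"color": "red"})

# Draw the outlier rows again on top, in a contrasting color.
ax.scatter(df_data.loc[is_outlier, "GrLivArea"],
           df_data.loc[is_outlier, "SalePrice"],
           color="blue", s=60, zorder=3, label="Outliers")

ax.legend()
ax.set_title("Ground Living Area VS SalePrice (Outliers Highlighted)", fontsize=13)
fig.tight_layout()
```

The same mask can also produce the cleaned frame for the second subplot, for example df = df_data[~is_outlier], so the highlighted points and the dropped points are guaranteed to be the same rows.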