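As shown in the figure below, I have bivariate data that would ideally fall on the line y = x. In Python, how can I calculate the deviation of each point from that line (y = x)? Is it possible to quantify the average deviation from this line? I am just looking for a way to quantify how far my data departs from a 1:1 relationship. Any suggestions are appreciated. My data is stored in a Pandas DataFrame. Thanks.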
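Answer 0: (score: 0)
This code computes the deviation of each point from the regression line as well as from the line y = x, and it plots the regression line, the line y = x, and a band of one standard deviation around the regression line.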
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
import statistics as stat
#Set the x and y values
x=np.random.rand(50)
y=2*x-1+np.random.rand(50)
"""
calculate the deviation from y=x at each point
"""
xp=np.linspace(0,1,50)  #points used only to draw the line y=x
yp=xp
deviationxy=y-x  #signed vertical deviation of each data point from y=x
#Calculate the share of points above and below the line y=x
listpos=[i for i in deviationxy if i>0]
listneg=[i for i in deviationxy if i<0]
if len(listpos)==len(listneg):
    print("The ratio is 1:1")
else:
    above=(len(listpos)/len(deviationxy))*100
    below=(len(listneg)/len(deviationxy))*100
    print("{0}% of the values are above the line y=x ; {1}% of the values are below the line".format(above,below))
"""
Implement the regression
"""
#coerce the x values in the shape [n_samples,n_features]
X=x[:,np.newaxis]
#instantiate the model
model=LinearRegression(fit_intercept=True)
#fit the model
model.fit(X,y)
#plot the points, the regression line, and the line y=x
fig,ax=plt.subplots(figsize=(10,10))
ax.scatter(x,y)
ax.plot(x,model.coef_*x+model.intercept_,":r")
ax.plot(xp,yp,".k")
#calculate the absolute deviation from the regression line at each point
deviation=np.abs(y-model.predict(X))
print(deviation)  #absolute deviation of each point from the fitted line
#plot one standard deviation of the residuals above and below the regression line
standard_deviation=stat.stdev((y-model.predict(X)).tolist())
std_dev=[standard_deviation,-standard_deviation]
for standard in std_dev:
    ax.plot(x,model.predict(X)+standard,"--b")
plt.show()
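Since the question mentions a Pandas DataFrame and asks for an average deviation from y = x, the same idea could be sketched directly on a DataFrame. This is only a minimal sketch and not part of the original answer: the column names `x` and `y`, the helper name `deviation_from_identity`, and the choice of summary metrics (mean bias, mean absolute error, RMSE) are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def deviation_from_identity(df, x_col="x", y_col="y"):
    """Return per-point deviations from y = x plus summary metrics.

    Hypothetical helper; the column names are assumptions.
    """
    out = df.copy()
    # Signed vertical deviation of each point from the line y = x
    out["residual"] = out[y_col] - out[x_col]
    # Perpendicular (shortest) distance from each point to the line y = x
    out["perp_distance"] = out["residual"].abs() / np.sqrt(2)
    summary = {
        "mean_bias": out["residual"].mean(),      # signed average offset
        "mae": out["residual"].abs().mean(),      # mean absolute deviation
        "rmse": np.sqrt((out["residual"] ** 2).mean()),
    }
    return out, summary


# Small made-up dataset, used only to show the call
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [1.5, 1.5, 3.0, 5.0]})
per_point, summary = deviation_from_identity(df)
print(per_point)
print(summary)
```

The mean bias shows whether the points sit systematically above or below the 1:1 line, while the mean absolute error and RMSE describe the typical size of the departure regardless of sign.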