Calculating the deviation of bivariate data from the line y = x

Time: 2017-06-30 19:15:18

Tags: python pandas statistics

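As the figure below shows, I have bivariate data that would ideally fall on the line y = x. In Python, how can I compute each point's deviation from that line? Is it possible to quantify the average deviation from the line? I am just looking for a way to quantify how far my data depart from a 1:1 relationship. Any suggestions are appreciated. My data are in a Pandas DataFrame. Thanks.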

[Image: plot of the data]

1 answer:

Answer 0 (score: 0)

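This code computes each point's deviation from the regression line and from the line y = x, and plots the standard deviation together with the regression line and the line y = x.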

import numpy as np
import matplotlib.pyplot as plt
import statistics as stat
from sklearn.linear_model import LinearRegression


#Set the x and y values
x=np.random.rand(50)
y=2*x-1+np.random.rand(50)



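"""
calculate the deviation from y=x at each point
"""

#points on the line y=x, used only to draw the line
xp=np.linspace(0,1,50)
yp=xp
#signed vertical deviation of each data point from the line y=x
deviationxy=(y-x)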

#Split the deviations into points above and below the line y=x
listpos=[i for i in deviationxy if i>0]
listneg=[i for i in deviationxy if i<0]

if len(listpos)==len(listneg):
    print("The points are split 1:1 above and below the line y=x")
else:
    above=(len(listpos)/len(deviationxy))*100
    below=(len(listneg)/len(deviationxy))*100
    print("{0}% of the values are above the line y=x ; {1}% of the values are below the line".format(above,below))


"""
Implement the regression
"""

#coerce the x values in the shape [n_samples,n_features]
X=x[:,np.newaxis]


#instantiate the model
model=LinearRegression(fit_intercept=True)

#fit the model
model.fit(X,y)


#plot the points, the regression line and the line y=x
fig,ax=plt.subplots(figsize=(10,10))
ax.scatter(x,y)
ax.plot(x,model.coef_*x+model.intercept_,":r")
ax.plot(xp,yp,".k")


#calculate the absolute deviation from the regression line at each point
deviation=np.abs(y-(model.coef_*x+model.intercept_))
print(deviation)#returns the deviation for each point



#plot a band of one standard deviation of the residuals around the regression line

residuals=y-(model.coef_*x+model.intercept_)
standard_deviation=stat.stdev(residuals.tolist())

std_dev=[standard_deviation,-standard_deviation]
for standard in std_dev:
    ax.plot(x,(model.coef_*x+model.intercept_)+standard,"--b")

plt.show()

[Image: resulting plot]
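The question also asks for a single number that summarises the average departure from the 1:1 line when the data live in a Pandas DataFrame. A minimal sketch of one way to do that follows. The column names "x" and "y", the helper name deviation_from_identity and the toy values are assumptions made for illustration, not part of the original post. The choice of summary statistics (mean signed deviation, mean absolute deviation, root-mean-square deviation and mean perpendicular distance to the line) is just one reasonable option.

```python
import numpy as np
import pandas as pd


def deviation_from_identity(df, x_col="x", y_col="y"):
    """Summarise how far the points (x, y) lie from the line y = x.

    Hypothetical helper; column names are assumptions.
    """
    # Signed vertical deviation of each point from y = x
    diff = df[y_col] - df[x_col]
    return pd.Series({
        # Mean signed deviation: positive means points sit above the line on average
        "bias": diff.mean(),
        # Mean absolute deviation from the line
        "mae": diff.abs().mean(),
        # Root-mean-square deviation from the line
        "rmse": np.sqrt((diff ** 2).mean()),
        # Mean perpendicular distance to y = x, which is |y - x| / sqrt(2)
        "mean_perp_dist": (diff.abs() / np.sqrt(2)).mean(),
    })


# Toy data for illustration only
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [1.5, 1.5, 3.0, 5.0]})
summary = deviation_from_identity(df)
print(summary)
```

A bias near zero combined with a large mae or rmse would suggest the points scatter evenly around y = x, while a bias of similar magnitude to the mae would suggest a systematic offset to one side of the line.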