I want to implement ridge polynomial regression directly in Python, without using sklearn or similar libraries. The weights are computed directly from the closed-form solution:
w = (X^T X + lambda * I)^(-1) X^T y
The code is as follows:
import numpy as np
import matplotlib.pyplot as plt
from openpyxl import load_workbook

wb = load_workbook('data.xlsx')
data = wb['data']
xv = []
yv = []
# Reads rows 1..99 from the worksheet
for i in range(1, 100):
    xv = xv + [float(data.cell(row=i, column=1).value)]
    yv = yv + [float(data.cell(row=i, column=2).value)]

n = 5  # polynomial degree
m = len(xv)

# Build the design matrix: one row per sample, columns x**0 .. x**n
x = []
for i in range(0, m):
    xn = []
    for j in range(0, n + 1):
        xn = xn + [xv[i] ** j]
    x = x + [xn]

lam = 5
X = np.array(x)
XtX = (X.T).dot(X)
Xty = (X.T).dot(np.array(yv))
I = np.identity(XtX.shape[0])
XtXLI = XtX + lam * I
XtXLI_inv = np.linalg.inv(XtXLI)
teta = XtXLI_inv.dot(Xty)

def h(c):
    h = 0
    for i in range(0, n + 1):
        h = h + teta[i] * c ** i
    return h

hv = []
for i in range(0, m):
    hv = hv + [h(xv[i])]
I expected that adjusting the lambda parameter would give a better fit. However, as lambda increases, the error increases significantly. How can I solve this problem?
Answer 0 (score: 1)
You can check this implementation of mine, which uses the following formula: A^T * A * x = A^T * B
import numpy as np
import matplotlib.pyplot as plt

DATA = np.array([(-5, 12), (-3, 2), (-2, -7), (-1, -4), (2, 3), (3, 1), (5, 4), (7, 9)])
n, m = DATA.shape

def regression(degree: int):
    A = np.empty(shape=(n, degree + 1))
    for i, data in enumerate(DATA):
        # Evaluates the polynomial basis in order to get the design matrix rows
        A[i] = np.array([data[0] ** x for x in range(degree + 1)])
    # @ is Python's matrix-multiplication operator
    x = A.T @ A
    y = A.T @ np.array([d[1] for d in DATA])
    # Solves the linear system A^T * A * r = A^T * B
    r = np.linalg.solve(x, y)
    # Evaluates the fitted polynomial on a fine grid for plotting
    x = np.linspace(DATA[0][0], DATA[-1][0], num=1000)
    y = np.array([np.sum(np.array([r[i] * (j ** i) for i in range(len(r))])) for j in x])
    # Plots the polynomial
    plt.plot(x, y)
    # Plots the data points
    for data in DATA:
        plt.scatter(*data)
    # y has to be recalculated at the data points, because linspace produced
    # extra values that were only needed for plotting the curve
    y = np.array([np.sum(np.array([r[i] * (d[0] ** i) for i in range(len(r))])) for d in DATA])
    error = sum([abs(DATA[i][1] - y[i]) ** 2 for i in range(n)]) ** 0.5
    # If the error is negligible, report the fit as exact
    if error > 1e-10:
        plt.title(f"Degree: {degree}, Error: {error}")
    else:
        plt.title(f"Degree: {degree}, Perfect approximation")
    plt.show()

for i in range(1, n):
    regression(i)
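To connect this back to the question's formula, the same solver can be turned into ridge regression by adding lambda * I to A^T A before solving, which avoids the explicit matrix inverse used in the question. A minimal sketch under that assumption (the value of lam is just an example, not taken from the original post):

```python
import numpy as np

DATA = np.array([(-5, 12), (-3, 2), (-2, -7), (-1, -4), (2, 3), (3, 1), (5, 4), (7, 9)])

def ridge_regression(degree: int, lam: float):
    # Design matrix with columns x**0 .. x**degree
    A = np.array([[x ** p for p in range(degree + 1)] for x, _ in DATA])
    b = np.array([y for _, y in DATA])
    # Solve (A^T A + lam * I) r = A^T b rather than inverting the matrix,
    # which is both faster and numerically more stable
    lhs = A.T @ A + lam * np.identity(degree + 1)
    return np.linalg.solve(lhs, A.T @ b)

coeffs = ridge_regression(3, lam=5.0)
print(coeffs)  # polynomial coefficients, lowest degree first
```

With lam=0 this reduces to the plain least-squares fit above.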
Answer 1 (score: 0)
It depends on what kind of error you are talking about. Ridge regression is a way of regularizing polynomial regression. The hyperparameter lambda (or alpha) controls how much you regularize the model. If you increase lambda, you increase the regularization: your model will perform worse on the training data but better on the test data (it will generalize better).
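A tiny illustration of that effect on a toy problem of my own (not the original data): as lambda grows, the closed-form ridge solution shrinks the coefficients toward zero, so the training fit gets looser by design.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
y = X @ np.array([3.0, -2.0, 1.0, 0.5]) + rng.normal(scale=0.1, size=20)

for lam in (0.0, 1.0, 100.0):
    # Closed-form ridge solution: (X^T X + lam * I)^-1 X^T y
    w = np.linalg.solve(X.T @ X + lam * np.identity(4), X.T @ y)
    print(lam, np.linalg.norm(w))  # the coefficient norm decreases as lam grows
```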
Don't forget to scale your data, because ridge regression is sensitive to the scale of the input features.
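One way to follow that advice, sketched under the assumption of standardization (zero mean, unit variance) applied to every polynomial feature column except the intercept, so that lambda penalizes all features comparably:

```python
import numpy as np

def standardize_columns(X):
    # Standardize every column except the first (the intercept column of
    # ones) to zero mean and unit variance
    Xs = X.astype(float).copy()
    mean = Xs[:, 1:].mean(axis=0)
    std = Xs[:, 1:].std(axis=0)
    Xs[:, 1:] = (Xs[:, 1:] - mean) / std
    return Xs, mean, std

xv = np.array([-5.0, -3.0, -2.0, -1.0, 2.0, 3.0, 5.0, 7.0])
X = np.vander(xv, 4, increasing=True)  # columns x**0 .. x**3
Xs, mean, std = standardize_columns(X)
```

The returned mean and std must be kept so that new inputs can be transformed the same way before prediction.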