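I'm new to data mining / ML. I've been trying to solve a polynomial regression problem: predicting a price from given input parameters (already normalized to the range [0, 1]).
I'm very close, since my output is proportional to the correct output but seems somewhat damped. I believe my algorithm is correct; I just don't know how to arrive at a suitable lambda (the regularization parameter) or how to decide how far to expand the features (the polynomial degree), since the problem statement says: "The price per square foot is (approximately) a polynomial function of the features. The order of this polynomial is always less than 4."
Is there a way to visualize the data to find the best values for these parameters, the same way we find the best alpha (step size) and number of iterations for gradient descent in linear regression by plotting the cost function?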
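To make the question concrete, here is a minimal sketch of the kind of sweep I have in mind: hold out part of the data, fit each (degree, lambda) pair on the rest, and compare the validation errors. This is not my actual solution. The names `map_features`, `fit_ridge` and `validation_errors` are made up for illustration, the data is synthetic, and the fit uses the closed-form regularized normal equation instead of BFGS to keep it short.

```python
import numpy as np
from itertools import combinations_with_replacement


def map_features(X, degree):
    """Return all monomials of the columns of X up to `degree`, plus a bias column."""
    m, n = X.shape
    cols = [np.ones(m)]
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(n), d):
            cols.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(cols)


def fit_ridge(F, y, lam):
    """Closed-form regularized least squares; the bias term is not penalized."""
    penalty = lam * np.eye(F.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(F.T @ F + penalty, F.T @ y)


def validation_errors(X_train, y_train, X_val, y_val, degrees, lambdas):
    """Mean squared validation error for every (degree, lambda) pair."""
    errors = np.zeros((len(degrees), len(lambdas)))
    for i, d in enumerate(degrees):
        F_train, F_val = map_features(X_train, d), map_features(X_val, d)
        for j, lam in enumerate(lambdas):
            theta = fit_ridge(F_train, y_train, lam)
            errors[i, j] = np.mean((F_val @ theta - y_val) ** 2)
    return errors


# Synthetic stand-in data (an assumption, NOT the real data set):
# two features in [0, 1] and a cubic target with a little noise.
rng = np.random.default_rng(0)
X = rng.random((200, 2))
y = 100 + 300 * X[:, 0] ** 2 + 900 * X[:, 0] * X[:, 1] ** 2 + rng.normal(0, 5, 200)

# Simple hold-out split: first 140 rows for training, the rest for validation.
X_train, X_val = X[:140], X[140:]
y_train, y_val = y[:140], y[140:]

degrees = [1, 2, 3]  # the problem says the order is always less than 4
lambdas = [0.0, 0.001, 0.01, 0.1, 1.0, 2.0]
errors = validation_errors(X_train, y_train, X_val, y_val, degrees, lambdas)

# Pick the pair with the lowest validation error.
best_i, best_j = np.unravel_index(np.argmin(errors), errors.shape)
best_degree, best_lambda = degrees[best_i], lambdas[best_j]
print("validation MSE (rows = degree, columns = lambda):")
print(errors)
print("best degree:", best_degree, "best lambda:", best_lambda)
```

Each row of `errors` could then be plotted against lambda, one curve per degree, much like plotting the cost against the iteration count. Note that my own code below hard-codes degree = 2, while the problem allows an order up to 3.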
Here is my code: http://ideone.com/6ctDFh
from numpy import *

def mapFeature(X1, X2):
    # Build all polynomial terms of X1 and X2 up to `degree`, plus a bias column.
    degree = 2
    out = ones((shape(X1)[0], 1))
    for i in range(1, degree+1):
        for j in range(0, i+1):
            term1 = X1 ** (i-j)
            term2 = X2 ** (j)
            term = (term1 * term2).reshape(shape(term1)[0], 1)
            # Row i of 'out' holds the mapped features of (X1[i], X2[i]);
            # each new term is appended as a column.
            out = hstack((out, term))
    return out

def solve():
    # computeCostVect and findMinTheta are not shown here; see the linked code.
    # First input line: number of features n and number of rows m.
    n, m = input().split()
    m = int(m)
    n = int(n)
    # Each following line holds n feature values followed by the price.
    data = zeros((m, n+1))
    for i in range(0, m):
        ausi = input().split()
        for k in range(0, n+1):
            data[i, k] = float(ausi[k])
    X = data[:, 0 : n]
    y = data[:, n]
    # 6 parameters: bias + 2 linear terms + 3 quadratic terms (degree = 2).
    theta = zeros((6, 1))
    X = mapFeature(X[:, 0], X[:, 1])
    ausi = computeCostVect(X, y, theta)
    # print(X)
    print("Results using BFGS : ")
    lamda = 2
    theta, cost = findMinTheta(theta, X, y, lamda)
    # Four test points stored as consecutive (x1, x2) pairs.
    test = [0.05, 0.54, 0.91, 0.91, 0.31, 0.76, 0.51, 0.31]
    print("predictions for the test points (using BFGS) : ")
    for i in range(0, 7, 2):
        print(mapFeature(array([test[i]]), array([test[i+1]])).dot(theta))
    # pyplot.plot(X[:, 1], y, 'rx', markersize = 5)
    # fig = pyplot.figure()
    # ax = fig.add_subplot(1,1,1)
    # ax.scatter(X[:, 1],X[:, 2], s=y) # Added third variable income as size of the bubble
    # pyplot.show()
The current output is:
183.43478288
349.10716957
236.94627602
208.61071682
The correct output should be:
180.38
1312.07
440.13
343.72
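
To put numbers on how damped the predictions are, here is a small check of the ratio between the two lists above. It assumes the four values line up in the same order as the test points. The variable names are only for illustration.

```python
import numpy as np

# Values copied from the two output lists above.
current = np.array([183.43478288, 349.10716957, 236.94627602, 208.61071682])
expected = np.array([180.38, 1312.07, 440.13, 343.72])

# Ratio of prediction to target, and relative error, for each test point.
ratio = current / expected
rel_error = np.abs(current - expected) / expected

for c, e, r, err in zip(current, expected, ratio, rel_error):
    print(f"predicted {c:9.2f}  expected {e:9.2f}  ratio {r:.3f}  relative error {err:.1%}")
```

The ratios come out at roughly 1.02, 0.27, 0.54 and 0.61, so the first prediction is close while the other three fall well short, and the gap is largest for the largest target.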