I'm building a neural network and trying to use the minimize function from scipy.optimize to optimize the theta parameters.
Something interesting is happening. When I build the network with the first 1000 rows of data (nrows = 1000), minimize works fine. But when I change to nrows = 2000, running the code returns the output shown in the screenshot linked below.
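For context, this is roughly how the data gets read in (the file name is a placeholder; the only thing I change between runs is nrows):

import pandas as pd

nrows = 2000  # works with 1000, fails with 2000
train = pd.read_csv('train.csv', nrows=nrows)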
The call to minimize:
fmin = minimize(fun=backprop, x0=params, args=(input_size, hidden_size, num_labels, X, y, learning_rate),
                method='TNC', jac=True, options={'maxiter': 250})
Screenshot of Output with nrows=2000
Since I'm using backpropagation with the truncated Newton (TNC) algorithm, I thought the problem might be NA values in my data (those would propagate into the cost and gradient). So I started running:
train.fillna(train.mean())
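As a sanity check (assuming train is a pandas DataFrame and X and y are the numeric arrays that end up in backprop), I can count what's left after the fill:

print(train.isna().sum().sum())              # NaNs remaining in the DataFrame
print(np.isnan(X).sum(), np.isnan(y).sum())  # NaNs in the arrays actually passed to minimize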
But this still resulted in the output above. Any idea why? For your reference, below are my backpropagation function and a screenshot of the output with nrows = 1000.
def backprop(params, input_size, hidden_size, num_labels, X, y, learning_rate):
    m = X.shape[0]
    X = np.matrix(X)
    y = np.matrix(y)

    # reshape the parameter array into parameter matrices for each layer
    theta1 = np.matrix(np.reshape(params[:hidden_size * (input_size + 1)], (hidden_size, (input_size + 1))))
    theta2 = np.matrix(np.reshape(params[hidden_size * (input_size + 1):], (num_labels, (hidden_size + 1))))

    # run the feed-forward pass
    a1, z2, a2, z3, h = forward_propagate(X, theta1, theta2)

    # initializations
    J = 0
    delta1 = np.zeros(theta1.shape)  # (25, 401)
    delta2 = np.zeros(theta2.shape)  # (10, 26)

    # compute the cost
    for i in range(m):
        first_term = np.multiply(-y[i,:], np.log(h[i,:]))
        second_term = np.multiply((1 - y[i,:]), np.log(1 - h[i,:]))
        J += np.nansum(first_term - second_term)

    J = J / m

    # add the cost regularization term
    J += ((learning_rate) / (2 * m)) * (np.nansum(np.power(theta1[:,1:], 2)) + np.nansum(np.power(theta2[:,1:], 2)))

    # perform backpropagation
    for t in range(m):
        a1t = a1[t,:]  # (1, 401)
        z2t = z2[t,:]  # (1, 25)
        a2t = a2[t,:]  # (1, 26)
        ht = h[t,:]    # (1, 10)
        yt = y[t,:]    # (1, 10)

        d3t = ht - yt  # (1, 10)

        z2t = np.insert(z2t, 0, values=np.ones(1))  # (1, 26)
        d2t = np.multiply((theta2.T * d3t.T).T, sigmoid_gradient(z2t))  # (1, 26)

        delta1 = delta1 + (d2t[:,1:]).T * a1t
        delta2 = delta2 + d3t.T * a2t

    delta1 = delta1 / m
    delta2 = delta2 / m

    # add the gradient regularization term
    delta1[:,1:] = delta1[:,1:] + (theta1[:,1:] * learning_rate) / m
    delta2[:,1:] = delta2[:,1:] + (theta2[:,1:] * learning_rate) / m

    # unravel the gradient matrices into a single array
    grad = np.concatenate((np.ravel(delta1), np.ravel(delta2)))

    return J, grad
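forward_propagate and sigmoid_gradient aren't shown above; they follow the usual setup from the exercise this is based on, roughly like this (a sketch of what I'm running, not necessarily line-for-line identical):

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_gradient(z):
    # derivative of the sigmoid evaluated at z
    return np.multiply(sigmoid(z), (1 - sigmoid(z)))

def forward_propagate(X, theta1, theta2):
    m = X.shape[0]
    # prepend the bias column and push the data through both layers
    a1 = np.insert(X, 0, values=np.ones(m), axis=1)
    z2 = a1 * theta1.T
    a2 = np.insert(sigmoid(z2), 0, values=np.ones(m), axis=1)
    z3 = a2 * theta2.T
    h = sigmoid(z3)
    return a1, z2, a2, z3, h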
I also know that the extra rows add more LabelEncoder() and OneHotEncoder() values, but I'm not sure whether those are what's causing the error.
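For completeness, this is roughly how the labels and categorical columns get encoded before they end up in X and y (the column names here are placeholders, and the real preprocessing has more steps):

from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# 'some_category' stands in for one of my categorical feature columns
train['some_category'] = LabelEncoder().fit_transform(train['some_category'].astype(str))

# integer-encode the target column ('target' is a placeholder), then one-hot it
y_int = LabelEncoder().fit_transform(train['target'])
y = OneHotEncoder().fit_transform(y_int.reshape(-1, 1)).toarray()

Any help would be great, thanks!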