Using backpropagation with fmin_cg

Time: 2017-09-11 23:10:24

Tags: python scipy

I'm trying to build an ANN in Python. I have the forward pass working, but I run into problems when I try to add backpropagation. In my function nnCostFunction, the gradient grad is defined as:

grad = tr(c_[Theta1_grad.swapaxes(1,0).reshape(1,-1), Theta2_grad.swapaxes(1,0).reshape(1,-1)])

But this is a problem, because I compute nn_params and cost with scipy.optimize.fmin_cg, and fmin_cg expects the objective to return only a single value (the cost J from my forward pass), so it can't deal with grad being returned as well ...

nn_params, cost = op.fmin_cg(lambda t: nnCostFunction(t, input_layer_size, hidden_layer_size, num_labels, X, y, lam), initial_nn_params, gtol = 0.001, maxiter = 40, full_output=1)[0, 1]
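
One idea I've had is to keep nnCostFunction as it is and give fmin_cg the gradient separately through its fprime argument, using two small wrappers defined inside nn_train (where those variables are in scope). This is only a sketch; cost_only and grad_only are just placeholder names I made up:

def cost_only(t):
    # fmin_cg's objective must return just the scalar cost J
    return nnCostFunction(t, input_layer_size, hidden_layer_size, num_labels, X, y, lam)[0]

def grad_only(t):
    # fprime must return a flat 1-d array, so ravel the unrolled gradient
    grad = nnCostFunction(t, input_layer_size, hidden_layer_size, num_labels, X, y, lam)[1]
    return np.asarray(grad).ravel()

# full_output=1 returns (xopt, fopt, func_calls, grad_calls, warnflag), so take the first two
nn_params, cost = op.fmin_cg(cost_only, initial_nn_params, fprime=grad_only,
                             gtol=0.001, maxiter=40, full_output=1)[:2]

The drawback I can see is that the forward pass gets computed twice per evaluation, once for the cost and once for the gradient.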

Is there a way around this so that I can include backpropagation in my network? I know there is a scipy.optimize.minimize function, but I'm having a hard time understanding how to use it and get the results I need. Does anyone know what needs to be done?
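
From what I can tell from the docs, with jac=True scipy.optimize.minimize expects the objective to return both the cost and the gradient, which is exactly the (J, grad) pair nnCostFunction already returns (and it avoids computing the forward pass twice). My rough attempt looks like this (cost_and_grad is just a placeholder name, and I'm not sure the options are right):

def cost_and_grad(t):
    J, grad = nnCostFunction(t, input_layer_size, hidden_layer_size, num_labels, X, y, lam)
    return J, np.asarray(grad).ravel()   # minimize also wants a flat 1-d gradient

res = op.minimize(cost_and_grad, initial_nn_params, method='CG', jac=True,
                  options={'gtol': 0.001, 'maxiter': 40})
nn_params, cost = res.x, res.fun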

Any help is greatly appreciated, thanks.
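
# For reference, the code below assumes imports/aliases roughly along these lines
# (the exact import block isn't shown in this post):
#   import numpy as np
#   from numpy import reshape, eye, ones, log, unique, c_
#   import scipy.optimize as op
#   tr = np.transpose   # shorthand for transpose
# sigmoid, sigmoidGradient and randInitializeWeights are separate helper functions.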

def nnCostFunction(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, lam):
    '''
    Given NN parameters, layer sizes, number of labels, data, and the regularization
    parameter lam, return the cost J and the unrolled gradient grad of the network.
    '''

    # Reshape the unrolled parameter vector back into the weight matrices Theta1 and Theta2
    Theta1 = (reshape(nn_params[:(hidden_layer_size*(input_layer_size+1))],(hidden_layer_size,(input_layer_size+1))))

    Theta2 = (reshape(nn_params[((hidden_layer_size*(input_layer_size+1))):],(num_labels, (hidden_layer_size+1))))

    m = X.shape[0]
    n = X.shape[1]

    #forward pass
    # Convert the integer labels y (values 1..num_labels) into one-hot rows
    y_eye = eye(num_labels)
    y_new = np.zeros((y.shape[0],num_labels))

    for z in range(y.shape[0]):
        y_new[z,:] = y_eye[int(y[z])-1]

    y = y_new

    a_1 = c_[ones((m,1)),X]
    z_2 = tr(Theta1.dot(tr(a_1)))

    a_2 = sigmoid(z_2)
    a_2 = c_[ones((a_2.shape[0],1)), a_2]

    a_3 = tr(sigmoid(Theta2.dot(tr(a_2))))

    J_reg = lam/(2.*m) * (sum(sum(Theta1[:,1:]**2)) + sum(sum(Theta2[:,1:]**2)))

    J = (1./m) * sum(sum(-y*log(a_3) - (1-y)*log(1-a_3))) + J_reg

    #Backprop

    d_3 = a_3 - y

    d_2 = d_3.dot(Theta2[:,1:])*sigmoidGradient(z_2)

    Theta1_grad = 1./m * tr(d_2).dot(a_1)
    Theta2_grad = 1./m * tr(d_3).dot(a_2)

    #Add regularization

    Theta1_grad[:,1:] = Theta1_grad[:,1:] + lam*1.0/m*Theta1[:,1:]
    Theta2_grad[:,1:] = Theta2_grad[:,1:] + lam*1.0/m*Theta2[:,1:]

    #Unroll gradients
    grad = tr(c_[Theta1_grad.swapaxes(1,0).reshape(1,-1), Theta2_grad.swapaxes(1,0).reshape(1,-1)])

    return J, grad
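
To convince myself the backpropagation part is right, I have also been comparing grad against a numerical gradient. This is only a quick check I put together (numericalGradient is a name I made up), meant to be run on a small test case with the same input_layer_size, hidden_layer_size, num_labels, X, y and lam in scope, for example right after initial_nn_params is built in nn_train below:

def numericalGradient(theta, eps=1e-4):
    # Central-difference approximation of dJ/dtheta; O(len(theta)) cost evaluations, so slow
    num_grad = np.zeros_like(theta)
    for i in range(len(theta)):
        step = np.zeros_like(theta)
        step[i] = eps
        J_plus = nnCostFunction(theta + step, input_layer_size, hidden_layer_size, num_labels, X, y, lam)[0]
        J_minus = nnCostFunction(theta - step, input_layer_size, hidden_layer_size, num_labels, X, y, lam)[0]
        num_grad[i] = (J_plus - J_minus) / (2.0 * eps)
    return num_grad

num_grad = numericalGradient(initial_nn_params)
bp_grad = np.asarray(nnCostFunction(initial_nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, lam)[1]).ravel()
# A large difference here would also show up any mismatch between how nn_params is reshaped
# (C order by default) and how the gradients are unrolled (column-major via swapaxes)
print 'Max gradient difference: ' + str(np.max(np.abs(num_grad - bp_grad)))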


def nn_train(X,y,lam = 1.0, hidden_layer_size = 10):
    '''
    Train the neural network given the feature and class arrays, the regularization
    parameter lam, and the size of the hidden layer.
    Return the trained parameters Theta1, Theta2.
    '''

    # NN input and output layer sizes
    input_layer_size = X.shape[1]
    num_labels = unique(y).shape[0] #output layer

    # Initialize NN parameters
    initial_Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size)
    initial_Theta2 = randInitializeWeights(hidden_layer_size, num_labels)


    # Unroll parameters
    initial_nn_params = np.append(initial_Theta1.flatten('F'), initial_Theta2.flatten('F'))  # 'F' = column-major, same as flatten(1)
    initial_nn_params = reshape(initial_nn_params,(len(initial_nn_params),)) #flatten into 1-d array


    # Find and print initial cost:
    J_init, grad_init = nnCostFunction(initial_nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, lam)
    print 'Initial J cost: ' + str(J_init)
    print 'Initial grad cost: ' + str(grad_init)

    # Implement backprop and train network, run fmin
    print 'Training Neural Network...'
    print 'fmin results:'

    nn_params, cost = op.fmin_cg(lambda t: nnCostFunction(t, input_layer_size, hidden_layer_size, num_labels, X, y, lam), initial_nn_params, gtol = 0.001, maxiter = 40, full_output=1)[0, 1]



    Theta1 = (reshape(nn_params[:(hidden_layer_size*(input_layer_size+1))],(hidden_layer_size,(input_layer_size+1))))

    Theta2 = (reshape(nn_params[((hidden_layer_size*(input_layer_size+1))):],(num_labels, (hidden_layer_size+1))))

    return Theta1, Theta2

0 Answers:

No answers