I'm attempting to complete code for backpropagation, and the final step I have left is computing the change in weights and biases (using a quadratic cost). This step involves a matrix multiplication of two arrays after transposing one of them.
import numpy as np

# necessary functions for this example
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def prime(z):
    # derivative of the sigmoid
    return sigmoid(z) * (1 - sigmoid(z))

def cost_derivative(output_activations, y):
    return output_activations - y
# Mock weight and bias matrices
weights = [np.array([[ 1, 0, 2],
[2, -1, 0],
[4, -1, 0],
[1, 3, -2],
[0, 0, -1]]),
np.array([[2, 0, -1, -1, 2],
[0, 2, -1, -1, 0]])]
biases = [np.array([-1, 2, 0, 0, 4]), np.array([-2, 1])]
# The mock training examples: (input, target) pairs
q = [(np.array([1, -2, 3]), np.array([0, 1])),
(np.array([2, -3, 5]), np.array([1, 0])),
(np.array([3, 6, -1]), np.array([1, 0])),
(np.array([4, -1, -1]), np.array([0, 0]))]
nabla_b = [np.zeros(b.shape) for b in biases]
nabla_w = [np.zeros(w.shape) for w in weights]
for x, y in q:
    activation = x
    activations = [x]   # store activations layer by layer
    zs = []             # store weighted inputs layer by layer
    for w, b in zip(weights, biases):
        z = np.dot(w, activation) + b
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)
    # Computation of the last layer
    delta = cost_derivative(activations[-1], y) * prime(zs[-1])
    nabla_b[-1] = delta
    nabla_w[-1] = np.dot(np.transpose(activations[-2]), delta)
I've printed the outputs for delta: the first iteration gives

[ 0.14541528 -0.14808645]

which is a 1x2 row, and

activations[-2] = [9.97527377e-01 9.97527377e-01 9.97527377e-01 1.67014218e-05 7.31058579e-01]

which is a 1x5 row. Transposing activations[-2] should give a 5x1 column, and multiplying that by the 1x2 delta should yield a 5x2 matrix, but it doesn't.
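To isolate the problem, here is a minimal standalone sketch with made-up values of the same shapes (not the real network outputs). If my understanding is right, the issue is that transposing a 1-D NumPy array is a no-op, so the arrays need explicit 2-D shapes (or `np.outer`) before the product comes out 5x2:

```python
import numpy as np

# Made-up values with the same shapes as delta and activations[-2]
delta = np.array([0.145, -0.148])                            # shape (2,)
a_prev = np.array([0.997, 0.997, 0.997, 0.0000167, 0.731])   # shape (5,)

print(a_prev.T.shape)   # (5,) -- transposing a 1-D array changes nothing
# np.dot(a_prev.T, delta) raises ValueError: the shapes (5,) and (2,)
# are not aligned for a dot product

# Reshaping to an explicit column and row vector gives the expected 5x2
nabla = np.dot(a_prev.reshape(-1, 1), delta.reshape(1, -1))
print(nabla.shape)      # (5, 2)

# np.outer computes the same outer product in one call
print(np.outer(a_prev, delta).shape)  # (5, 2)
```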