Question

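I am trying to set up backpropagation for my neural network using numpy, but for some reason, when I write the gradient descent equation for the matrix that holds my output weights, two of the matrices in the equation, with shapes (2,5) and (5,1), will not broadcast together. Am I doing something wrong?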

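I tried breaking the equation into separate parts to see whether something else could be causing this, and so far I have narrowed it down to the whole matrix in the numerator and the whole matrix in the denominator (the gradient descent equation is a fraction). I also thought the error might occur between the original output weights and the gradient descent term, but that cannot be it, because the output weight matrix is (5,2), not (2,5). I also tried functions other than numpy.divide, for example using numpy.dot to multiply the first term by the second raised to the power -1.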

Dissected code

self.outputWeights = self.outputWeights - l * (
    # numerator
    -numpy.divide((2 * (numpy.dot(y.reshape(self.outputs, 1), (1+numpy.power(e, -n-b))).reshape(self.neurons, self.outputs)-w)).reshape(self.outputs, self.neurons),
    # denominator
    (numpy.power(1+ numpy.power(e, -n-b), 2)).reshape(self.neurons, 1)))

Actual code

n = self.HIDDEN[self.layers]
b = self.bias[self.layers]
w = self.outputWeights

self.outputWeights = self.outputWeights - l * ( -numpy.divide((2 * (numpy.dot(y.reshape(self.outputs, 1), (1+numpy.power(e, -n-b))).reshape(self.neurons, self.outputs)-w)).reshape(self.outputs, self.neurons), (numpy.power(1+ numpy.power(e, -n-b), 2)).reshape(self.neurons, 1)))

I expected that there would be no problem, since the number of columns of the first matrix equals the number of rows of the second.

Answer 1

For the matrix product dot, the rule is that the last dim of A pairs with the 2nd to the last dim of B:

In [136]: x=np.arange(10).reshape(5,2); y=np.arange(2)[:,None]                       
In [137]: x.shape, y.shape                                                           
Out[137]: ((5, 2), (2, 1))
In [138]: x.dot(y)                                                                   
Out[138]: 
array([[1],
       [3],
       [5],
       [7],
       [9]])
In [139]: _.shape                                                                    
Out[139]: (5, 1)

The inner 2's match, and the result is (5,1).
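To tie the dot rule back to the shapes in the question, here is a minimal sketch. The names `numerator` and `denominator` are placeholders filled with ones, not the asker's actual values; the only point is the result shape that a matrix product gives for (2,5) and (5,1).

```python
import numpy as np

# Placeholder arrays with the shapes reported in the question (values are arbitrary).
numerator = np.ones((2, 5))
denominator = np.ones((5, 1))

# Matrix product: the last dim of A (5) pairs with the 2nd to last dim of B (5).
product = np.dot(numerator, denominator)
print(product.shape)
```

Note that this only shows that the shapes are compatible for dot. Whether a matrix product is what the gradient formula actually calls for is a separate question.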

But for element-wise operations such as * (multiplication), division, and addition, those dimensions don't work:

In [140]: x*y                                                                        
---------------------------------------------------------------------------
ValueError: operands could not be broadcast together with shapes (5,2) (2,1)

The transpose of y works:

In [141]: x*y.T                                                                      
Out[141]: 
array([[0, 1],
       [0, 3],
       [0, 5],
       [0, 7],
       [0, 9]])

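That is because y.T has shape (1,2). By the rules of broadcasting, that can pair with (5,2) to produce a (5,2) array. The size-1 dimension can be expanded to match the 5 of x.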
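The broadcasting rule can be sketched in plain Python. The helper `broadcast_shape` below is a hypothetical illustration written for this answer, not a numpy function. It assumes the usual rule: shapes are compared from the trailing dimension backwards, and each pair of sizes must either be equal or contain a 1.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Return the broadcast result shape of shapes a and b, or raise ValueError."""
    result = []
    # Compare sizes from the trailing dimension; a missing leading dim counts as 1.
    for da, db in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError(f"shapes {a} and {b} cannot be broadcast together")
    return tuple(reversed(result))


print(broadcast_shape((5, 2), (1, 2)))  # (5, 2), like x * y.T
print(broadcast_shape((2, 5), (1, 5)))  # (2, 5)

for pair in [((5, 2), (2, 1)), ((2, 5), (5, 1))]:
    try:
        broadcast_shape(*pair)
    except ValueError as err:
        print(err)
```

Under this rule the question's (2,5) and (5,1) fail for the same reason as (5,2) and (2,1): the trailing pair is fine because one size is 1, but the leading pair (2 against 5) is neither equal nor 1.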

Why won't these matrices broadcast together? ValueError: operands could not be broadcast together with shapes (5,2) (2,1)
