Question

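I'm trying to implement the backpropagation method for a fully connected layer with an arbitrary activation function. I understand the general idea and the math behind the algorithm, but I'm having trouble understanding the vectorized form...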

I need help understanding the expected dimensions of the elements.

Known dimensions:

Input - self.X is of size (N, 128)
Weights - self.W is of size (128, 10)
Biases - self.b is of size (128, 10)
Output - self.y is of size (N, 10)
Linear output (before activation) - self.z is of size (N, 10)
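Taking the sizes listed above literally, the forward pass z = X·W + b can be checked directly in NumPy. This sketch uses random data, and the variable names (b_pair, b_unit, X1, X4) are hypothetical. It shows that a bias of shape (128, 10) does not produce a z of shape (N, 10): with N = 1 it broadcasts silently to (128, 10), and with N = 4 it fails. A bias of shape (10,) gives the expected (N, 10).

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 128, 10

W = rng.standard_normal((n_in, n_out))       # (128, 10)
b_pair = rng.standard_normal((n_in, n_out))  # bias as listed in the question: (128, 10)
b_unit = rng.standard_normal(n_out)          # one bias per output unit: (10,)

# N = 1: (1, 10) + (128, 10) broadcasts silently to (128, 10), not (N, 10).
X1 = rng.standard_normal((1, n_in))
z1_pair = np.dot(X1, W) + b_pair
z1_unit = np.dot(X1, W) + b_unit
print(z1_pair.shape)  # (128, 10)
print(z1_unit.shape)  # (1, 10)

# N = 4: (4, 10) + (128, 10) cannot be broadcast at all.
X4 = rng.standard_normal((4, n_in))
broadcast_failed = False
try:
    np.dot(X4, W) + b_pair
except ValueError:
    broadcast_failed = True
z4_unit = np.dot(X4, W) + b_unit
print(broadcast_failed)  # True
print(z4_unit.shape)     # (4, 10)
```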

Unknown dimensions (N = 1, the number of examples):

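dy - the gradient from the next layer - what size should it be?
dz - the derivative of the activation function - what size should it be?
self.d - the gradient of the current layer - what size should it be?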

Here is my code:


Answer 1

A couple of errors:

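The dimensions of self.b should not be (128, 10), because a bias belongs to each unit, not to each pair of units: self.b should be of size (10,).
Likewise, self.W_grad should be np.dot(self.X.T, (dz * dy)), not np.dot(self.X.T, dy), because the gradient has to pass through the activation's derivative dz before it reaches the weights.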
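Both corrections follow from writing the layer element by element. In the sketch below, δ is shorthand introduced here (not in the original) for dL/dz, which is dz * dy in the code's notation.

$$
z_{nj} = \sum_{i=1}^{128} X_{ni} W_{ij} + b_j, \qquad
\delta_{nj} = \frac{\partial L}{\partial z_{nj}} = f'(z_{nj})\,\frac{\partial L}{\partial y_{nj}}
$$

$$
\frac{\partial L}{\partial W_{ij}} = \sum_{n=1}^{N} X_{ni}\,\delta_{nj}
\;\Longrightarrow\;
\frac{\partial L}{\partial W} = X^{\top}\delta \in \mathbb{R}^{128 \times 10}
$$

$$
\frac{\partial L}{\partial b_j} = \sum_{n=1}^{N} \delta_{nj}
\;\Longrightarrow\;
\frac{\partial L}{\partial b} \in \mathbb{R}^{10}
$$

Dropping the f'(z) factor, as np.dot(self.X.T, dy) does, is only correct when the activation is the identity, since then f'(z) = 1.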

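As for the rest:

self.b_grad should be np.sum(dz * dy, axis=0), of size (10,), because each bias receives the gradient dL/dz = dz * dy summed over all N examples.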

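dy := dL/dy should be (N, 10), because it contains the gradient of the loss with respect to each element of y.

For an element-wise activation function, dz := df(z)/dz should also be (N, 10), because dz[i] contains df(z[i])/dz[i].

self.d := dL/dX should be (N, 128), because it contains the gradient of the loss with respect to each element of X.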
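Putting these shapes together, a minimal sketch of such a layer might look like the following. The class name FullyConnected, its constructor arguments, the tanh activation, and the random upstream gradient dy are hypothetical choices for illustration; the activation and its derivative are passed in as element-wise functions to stand in for the arbitrary activation in the question. The line self.d = np.dot(dz * dy, self.W.T) is the standard chain-rule result for dL/dX and yields the (N, 128) shape above. The last lines compare self.d with a central finite-difference estimate, using the loss L = sum(dy * y) so that dL/dy is exactly dy.

```python
import numpy as np


class FullyConnected:
    """Hypothetical fully connected layer computing y = f(X @ W + b)."""

    def __init__(self, n_in, n_out, f, f_prime, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((n_in, n_out)) * 0.1  # (128, 10)
        self.b = np.zeros(n_out)                           # (10,): one bias per unit
        self.f = f                # element-wise activation
        self.f_prime = f_prime    # its element-wise derivative

    def forward(self, X):
        self.X = X                                 # (N, 128)
        self.z = np.dot(X, self.W) + self.b        # (N, 10)
        self.y = self.f(self.z)                    # (N, 10)
        return self.y

    def backward(self, dy):
        # dy = dL/dy from the next layer, same shape as y: (N, 10)
        dz = self.f_prime(self.z)                  # df(z)/dz, element-wise: (N, 10)
        delta = dz * dy                            # dL/dz by the chain rule: (N, 10)
        self.W_grad = np.dot(self.X.T, delta)      # (128, 10), same shape as W
        self.b_grad = np.sum(delta, axis=0)        # (10,), summed over the N examples
        self.d = np.dot(delta, self.W.T)           # dL/dX for the previous layer: (N, 128)
        return self.d


def tanh_prime(z):
    return 1.0 - np.tanh(z) ** 2


# Usage with N = 1 and a random upstream gradient dy.
rng = np.random.default_rng(1)
layer = FullyConnected(128, 10, np.tanh, tanh_prime)
X = rng.standard_normal((1, 128))
y = layer.forward(X)
dy = rng.standard_normal(y.shape)                  # (1, 10)
d = layer.backward(dy)
print(layer.W_grad.shape, layer.b_grad.shape, d.shape)  # (128, 10) (10,) (1, 128)

# Central finite-difference check of self.d with L = sum(dy * y).
eps = 1e-6
num_d = np.zeros_like(X)
for j in range(X.shape[1]):
    Xp, Xm = X.copy(), X.copy()
    Xp[0, j] += eps
    Xm[0, j] -= eps
    loss_plus = np.sum(dy * layer.f(np.dot(Xp, layer.W) + layer.b))
    loss_minus = np.sum(dy * layer.f(np.dot(Xm, layer.W) + layer.b))
    num_d[0, j] = (loss_plus - loss_minus) / (2 * eps)
print(np.allclose(d, num_d, atol=1e-6))  # True
```

Passing f and f_prime separately keeps the layer independent of any particular activation; any element-wise pair, such as a sigmoid and its derivative, can be swapped in without changing backward.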
