For learning purposes I have been trying to implement my own toy neural network library. I have tested it on various logic gate operations such as OR, AND and XOR. While it works correctly for OR, it fails for AND and XOR: it only rarely produces the correct output for those two gates.
I have tried a range of learning rates. I have also plotted learning curves to look for a pattern in the cost against the number of epochs.
import numpy as np


class myNeuralNet:

    def __init__(self, layers = [2, 2, 1], learningRate = 0.09):
        self.layers = layers
        self.learningRate = learningRate
        self.biasses = [np.random.randn(l, 1) for l in self.layers[1:]]
        self.weights = [np.random.randn(i, o) for o, i in zip(self.layers[:-1], self.layers[1:])]
        self.cost = []

    def sigmoid(self, z):
        return (1.0 / (1.0 + np.exp(-z)))

    def sigmoidPrime(self, z):
        return (self.sigmoid(z) * (1 - self.sigmoid(z)))

    def feedForward(self, z, predict = False):
        activations = [z]
        for w, b in zip(self.weights, self.biasses): activations.append(self.sigmoid(np.dot(w, activations[-1]) + b))
        # for activation in activations: print(activation)
        if predict: return np.round(activations[-1])
        return np.array(activations)

    def drawLearningRate(self):
        import matplotlib.pyplot as plt
        plt.xlim(0, len(self.cost))
        plt.ylim(0, 5)
        plt.plot(np.array(self.cost).reshape(-1, 1))
        plt.show()

    def backPropogate(self, x, y):
        bigDW = [np.zeros(w.shape) for w in self.weights]
        bigDB = [np.zeros(b.shape) for b in self.biasses]
        activations = self.feedForward(x)
        delta = activations[-1] - y
        # print(activations[-1])
        # quit()
        self.cost.append(np.sum([- y * np.log(activations[-1]) - (1 - y) * np.log(1 - activations[-1])]))
        for l in range(2, len(self.layers) + 1):
            bigDW[-l + 1] = (1 / len(x)) * np.dot(delta, activations[-l].T)
            bigDB[-l + 1] = (1 / len(x)) * np.sum(delta, axis = 1)
            delta = np.dot(self.weights[-l + 1].T, delta) * self.sigmoidPrime(activations[-l])
        for w, dw in zip(self.weights, bigDW): w -= self.learningRate * dw
        for b, db in zip(self.biasses, bigDB): b -= self.learningRate * db.reshape(-1, 1)
        return np.sum(- y * np.log(activations[-1]) - (1 - y) * np.log(1 - activations[-1])) / 2


if __name__ == '__main__':
    nn = myNeuralNet(layers = [2, 2, 1], learningRate = 0.35)

    datasetX = np.array([[1, 1], [0, 1], [1, 0], [0, 0]]).transpose()
    datasetY = np.array([[x ^ y] for x, y in datasetX.T]).reshape(1, -1)
    print(datasetY)
    # print(nn.feedForward(datasetX, predict = True))
    for _ in range(60000): nn.backPropogate(datasetX, datasetY)
    # print(nn.cost)
    print(nn.feedForward(datasetX, predict = True))
    nn.drawLearningRate()
It also sometimes gives "RuntimeWarning: overflow encountered in exp", and sometimes it fails to converge at all.
Answer 0 (score: 0)
I ran into the same problem while making a neural network from scratch. I solved it by computing the sigmoid with

scipy.special.expit(x)

instead of building it from np.exp(x). Let me know if it works for you!
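For reference, a minimal sketch of what that swap could look like for the sigmoid/sigmoidPrime pair in the question, assuming SciPy is installed; expit evaluates 1 / (1 + exp(-z)) in a numerically stable way, so it avoids the overflow warning for large negative z:

import numpy as np
from scipy.special import expit

def sigmoid(z):
    # expit(z) == 1.0 / (1.0 + np.exp(-z)), computed without overflowing
    return expit(z)

def sigmoidPrime(z):
    s = expit(z)
    return s * (1 - s)

z = np.array([-1000.0, 0.0, 1000.0])
print(sigmoid(z))   # no RuntimeWarning, unlike 1.0 / (1.0 + np.exp(-z))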
Answer 1 (score: 0)
For the cross-entropy error to work correctly you need a probabilistic output layer on your network. Sigmoid usually does not work well for this and should not really be used.
Your formulas also seem a bit off. For the network layout you have defined, 3 layers (2, 2, 1), you have w0 (2x2) and w1 (1x2). Remember that to find dw1 you need:
d1 = (guess - target) * sigmoid_prime(net_inputs[1]) <- when you differentiate da2/dz1 you end up with f'(z1), not f'(a2)!
dw1 = d1 * activations[1]
db1 = np.sum(d1, axis=1)
d0 = d1 * w1 * sigmoid_prime(net_inputs[0])
dw0 = d0 * activations[0]
db0 = np.sum(d0, axis=1)
The thing to remember is that for each layer the net input is z := w @ x + b and the activation is a := f(z). During backpropagation, when you compute da[i]/dz[i-1], you need to apply the derivative of the activation function to z[i-1], not to a[i]:
z = w @ x + b
a = f(z)
da / dz = f'(z)!!!
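Put together for the question's own layout (weights shaped (n_out, n_in), so z = w @ x + b), a runnable sketch of these formulas could look like the following; the variable names, seed and learning rate are illustrative, and the pre-activations z are kept so the derivative is evaluated at z rather than at a:

import numpy as np

np.random.seed(1)

X = np.array([[1, 1], [0, 1], [1, 0], [0, 0]]).T   # shape (2, 4)
Y = np.array([[0, 1, 1, 0]])                        # shape (1, 4)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    return sigmoid(z) * (1.0 - sigmoid(z))

w0 = np.random.randn(2, 2)      # hidden layer, (n_out, n_in)
b0 = np.zeros((2, 1))
w1 = np.random.randn(1, 2)      # output layer
b1 = np.zeros((1, 1))

for _ in range(20000):
    # forward pass, keeping the pre-activations z as well as the activations a
    z0 = w0 @ X + b0
    a1 = sigmoid(z0)
    z1 = w1 @ a1 + b1
    a2 = sigmoid(z1)

    # backward pass: the activation derivative is taken at z, never at a
    d1 = (a2 - Y) * sigmoid_prime(z1)
    dw1 = d1 @ a1.T / X.shape[1]
    db1 = np.sum(d1, axis=1, keepdims=True) / X.shape[1]

    d0 = (w1.T @ d1) * sigmoid_prime(z0)
    dw0 = d0 @ X.T / X.shape[1]
    db0 = np.sum(d0, axis=1, keepdims=True) / X.shape[1]

    w1 -= 0.5 * dw1
    b1 -= 0.5 * db1
    w0 -= 0.5 * dw0
    b0 -= 0.5 * db0

# with only 2 hidden sigmoid units the fit is still sensitive to the random seed
print(np.round(sigmoid(w1 @ sigmoid(w0 @ X + b0) + b1)))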
This holds for every layer. A few smaller notes:
If you are not going to use a soft/hardmax activation function on the output layer (and with a single output neuron, why would you?), switch the error calculation to: np.mean(.5 * (activations[-1] - y)**2).
Use the z-s (the pre-activations) in the derivative of the activation function during the delta calculations.
Do not use sigmoid (it is problematic with respect to vanishing gradients); try ReLU instead: np.where(x <= 0, 0, x) with derivative np.where(x <= 0, 0, 1), or some variant of it.
As for learning rates for XOR, choosing anything in [.0001, .1] should be more than enough with any kind of optimization method.
If you initialize your weight matrices as [number_of_input_units x number_of_output_units] instead of the current [number_of_output_units x number_of_input_units], you can change z = w @ x + b to z = x @ w + b and you will not need to transpose your inputs and targets.
Here is an implementation of the above:
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)


def cost(guess, target):
    return np.mean(np.sum(.5 * (guess - target)**2, axis=1), axis=0)


datasetX = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
datasetY = np.array([[0.], [1.], [1.], [0.]])

# weights are [number_of_input_units x number_of_output_units], so z = x @ w + b
w0 = np.random.normal(0., 1., size=(2, 4))
w1 = np.random.normal(0., 1., size=(4, 1))
b0 = np.zeros(4)
b1 = np.zeros(1)

f1 = lambda x: np.where(x <= 0, 0, x)        # ReLU for the hidden layer
df1 = lambda d: np.where(d <= 0, 0, 1)       # its derivative
f2 = lambda x: np.where(x <= 0, .1*x, x)     # leaky ReLU for the output layer
df2 = lambda d: np.where(d <= 0, .1, 1)      # its derivative

costs = []
for i in range(250):
    # forward pass, keeping the pre-activations z for the backward pass
    a0 = datasetX
    z0 = a0 @ w0 + b0
    a1 = f1(z0)
    z1 = a1 @ w1 + b1
    a2 = f2(z1)

    costs.append(cost(a2, datasetY))

    # backward pass: derivatives of the activations are evaluated at the z-s
    d1 = (a2 - datasetY) * df2(z1)
    d0 = d1 @ w1.T * df1(z0)

    dw1 = a1.T @ d1
    db1 = np.sum(d1, axis=0)
    dw0 = a0.T @ d0
    db0 = np.sum(d0, axis=0)

    w0 = w0 - .1 * dw0
    b0 = b0 - .1 * db0
    w1 = w1 - .1 * dw1
    b1 = b1 - .1 * db1

print(f2(f1(datasetX @ w0 + b0) @ w1 + b1))

plt.plot(costs)
plt.show()
It gives the result:
[[0.00342399]
 [0.99856158]
 [0.99983358]
 [0.00156524]]