I am building a simple neural network from scratch using the Pima Indians diabetes dataset, which can be downloaded from the UCI Machine Learning Repository. When I run my code, the error is exactly the same on every iteration and I don't know why this happens, but if I use XOR as the data it works fine.
Here is my code:
## Load Dependencies
import numpy as np
from sklearn.preprocessing import MinMaxScaler
## Seeding to reproduce random generated results
np.random.seed(1)
## We take input (X) and output (y)
data = np.loadtxt('diabetes.txt', delimiter=',')
scaler = MinMaxScaler()
scaler.fit(data)
data = scaler.transform(data)
X = data[:,0:8]
y = data[:,8].reshape(768,1)
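## Note: the scaler above is fit on all nine columns, so the 0/1 outcome label in
## the last column is scaled together with the features (being already in [0, 1], it is unchanged).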
## Define our activation function, in our case we will use sigmoid function: 1 / (1 + exp(-x))
def sigmoid(x, deriv=False):
    if deriv == True:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))
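## Note: with deriv=True the argument is expected to already be a sigmoid output,
## so x * (1 - x) corresponds to sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)).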
## Initialize weights with random values
wh = 2 * np.random.random((8, 768)) - 1
wo = 2 * np.random.random((768, 1)) - 1
# Training time
for i in range(1000):
    ## Forward propagation
    h0 = X
    ## input * weight, then activate (no bias term is used here)
    h1 = sigmoid(np.dot(h0, wh))
    outl = sigmoid(np.dot(h1, wo))
    ## Compute the error of the predicted output layer against the actual result
    errorout = y - outl
    ## Compute the slope (gradient/derivative) of the hidden and output layers.
    ## The gradient of the sigmoid can be returned as x * (1 - x).
    ## Compute the change factor (delta) at the output layer:
    ## the error multiplied by the slope of the output layer activation
    deltaoutl = errorout * sigmoid(outl, deriv=True)
    ## At this step the error propagates back into the network, i.e. we get the error at the hidden layer.
    ## For this, we take the dot product of the output layer delta with the weights of the edges
    ## between the hidden and output layer (wo.T).
    errorh1 = np.dot(deltaoutl, wo.T)
    ## Compute the change factor (delta) at the hidden layer:
    ## multiply the error at the hidden layer by the slope of the hidden layer activation
    deltah1 = errorh1 * sigmoid(h1, deriv=True)
    ## Print error values
    if i % 10000:
        print("Error :" + str(np.mean(np.abs(errorout))))
    ## Update the weights at the output and hidden layers:
    ## the weights in the network are updated from the errors calculated for the training examples.
    wh += np.dot(h0.T, deltah1)
    wo += np.dot(h1.T, deltaoutl)
The result is:
Error :0.651041666664
Error :0.651041666664
Error :0.651041666664
Error :0.651041666664
Error :0.651041666664
Error :0.651041666664
Error :0.651041666664
Error :0.651041666664
...
If we change the data to:
X = np.array([[0,0],
              [0,1],
              [1,0],
              [1,1]])
y = np.array([[0],
              [1],
              [1],
              [0]])
wh = 2 * np.random.random((2,4)) - 1
wo = 2 * np.random.random((4,1)) - 1
then it works as it should. I don't understand why this happens; could someone please enlighten me? Thanks.