I am teaching myself neural networks, starting with the perceptron. I implemented one from scratch in Python and trained it on sklearn's breast cancer dataset, but the model's average cost per epoch/iteration does not converge toward zero; it stays at a constant value, and the model cannot predict correctly as expected.
I am trying to use a sigmoid activation with stochastic gradient descent, but I have found very few resources online covering exactly what I am attempting (sorry if there are more that I am not aware of).
Here is the code of the perceptron module:
import numpy

class Perceptron:
    value = 0.0
    total_sum = 0.0

    def __init__(self, inputs, weights):
        # weighted sum: dot(x, w) + b, with weights[0] acting as the bias b
        self.total_sum = numpy.dot(inputs, weights[1:]) + weights[0]
        # sigmoid activation of the weighted sum
        self.value = 1 / (1 + numpy.exp(-self.total_sum))

# module-level helpers, called as perceptron_core.sigmoid_derivative_of_... below
def sigmoid_derivative_of_intercept(sigmoid_value):
    # predict * (1 - predict); the minus sign is folded into the update step
    return sigmoid_value * (1 - sigmoid_value)

def sigmoid_derivative_of_slope(input_x, total_sum, sigmoid_value):
    # sum(x) * e^(-total_sum) * predict^2, as derived below
    return input_x.sum() * numpy.exp(-total_sum) * sigmoid_value**2
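For reference, a single forward pass with this class looks like this (a minimal sketch; the toy input and weight values are made up for illustration):

import numpy
import perceptron.core as perceptron_core

toy_input = numpy.array([0.2, 0.5, 0.9])         # 3 made-up feature values
toy_weights = numpy.array([0.1, 0.4, 0.3, 0.7])  # toy_weights[0] is the bias
p = perceptron_core.Perceptron(toy_input, toy_weights)
print(p.total_sum)  # scalar weighted sum dot(x, w) + b
print(p.value)      # sigmoid of that sum, a value in (0, 1)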
An explanation of these derivatives:
the assumed cost function of the perceptron is c(w, b) = target - predict
let sum(wx + b) = total_sum
let weights[0] += learning_rate * -d(c)/d(b)
let weights[1:] += learning_rate * -d(c)/d(w)
let predict = 1 / (1 + e^(-total_sum))
d(c)/d(b)
= d(target - predict)/d(b)
= -d(predict)/d(b)
= -d(1 / (1 + e^(-total_sum)))/d(b)
= -e^(-total_sum) / (1 + e^(-total_sum))^2
= -predict * (1 - predict)
the predict * (1 - predict) part is what is used for the intercept (bias); its minus sign has already been cancelled against the minus sign in the weight update formula
d(c)/d(w)
= -d(predict)/d(w)
= -d(1 / (1 + e^(-total_sum)))/d(w)
= -d(1 / (1 + e^(-sum(wx + b))))/d(w)
= -sum(x)e^(-total_sum) / (1 + e^(-total_sum))^2
= -sum(x) * e^(-total_sum) * predict^2
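To convince myself the d(predict)/d(b) step is right, I compare it against a finite-difference approximation (a small self-contained sketch; x, w, b and the step size h are arbitrary values I picked for the check):

import numpy

x = numpy.array([0.2, 0.5, 0.9])
w = numpy.array([0.4, 0.3, 0.7])
b = 0.1
h = 1e-6

def predict(b_value):
    # sigmoid of the weighted sum, as in the Perceptron class
    return 1 / (1 + numpy.exp(-(numpy.dot(x, w) + b_value)))

numeric = (predict(b + h) - predict(b - h)) / (2 * h)  # central difference
analytic = predict(b) * (1 - predict(b))               # predict * (1 - predict)
print(numeric, analytic)  # the two agree to several decimal places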
And here is the training code:
import numpy
import pandas
import matplotlib
import sklearn.datasets as sklearn_datasets
import sklearn.preprocessing as sklearn_preprocessing
# the module with the Perceptron class above
import perceptron.core as perceptron_core

dataset = sklearn_datasets.load_breast_cancer()
training_data_len_as_percentage = 80
# int() so the value can be used for slicing and range()
training_data_len = int(training_data_len_as_percentage * dataset.data.shape[0] / 100)
training_data = sklearn_preprocessing.minmax_scale(dataset.data[:training_data_len])
training_target = dataset.target[:training_data_len]

threshold = 10000
bias = 1.0
learning_rate = 0.01
# one weight per feature, plus weights[0] for the bias
weights = numpy.random.uniform(low=0.0, high=1.0, size=(dataset.data.shape[1] + 1))
average_errors = []

for _ in range(threshold):
    total_error_in_one_epoch = 0.0
    # NOTE: for stochastic gradient descent
    # the method says to update the weights after every training sample
    for n in range(training_data_len):
        predict = perceptron_core.Perceptron(training_data[n], weights)
        error = training_target[n] - predict.value
        total_error_in_one_epoch += error
        weights[0] += learning_rate * perceptron_core.sigmoid_derivative_of_intercept(predict.value)
        weights[1:] += learning_rate * perceptron_core.sigmoid_derivative_of_slope(training_data[n], predict.total_sum, predict.value)
    average_errors.append(total_error_in_one_epoch / training_data_len)
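The chart itself comes from plotting average_errors after the loop, roughly like this (a sketch; the axis labels are mine):

import matplotlib.pyplot as plt

plt.plot(average_errors)  # one point per epoch
plt.xlabel("epoch")
plt.ylabel("average error")
plt.show()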
The chart of the average cost per epoch looks like this:
The predictions are bad: using the remaining data as a test set, every sample gets the same predicted value.
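This is how I evaluate on the held-out rows (a sketch reusing dataset, weights and training_data_len from the training code above; thresholding the sigmoid output at 0.5 to get a 0/1 label is my own assumption):

import sklearn.preprocessing as sklearn_preprocessing
import perceptron.core as perceptron_core

# scale the held-out rows the same way as the training rows
test_data = sklearn_preprocessing.minmax_scale(dataset.data[training_data_len:])
test_target = dataset.target[training_data_len:]

correct = 0
for n in range(test_data.shape[0]):
    predict = perceptron_core.Perceptron(test_data[n], weights)
    label = 1 if predict.value >= 0.5 else 0  # 0.5 cutoff on the sigmoid output
    correct += int(label == test_target[n])
print(correct / test_data.shape[0])  # accuracy on the test set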