I am teaching myself neural networks, starting with the perceptron. I implemented one from scratch in Python and trained it on sklearn's breast cancer dataset, but the model's average cost per epoch/iteration does not converge toward zero; it stays at a constant value, and the model cannot predict correctly as expected.
I am trying to use a sigmoid activation with stochastic gradient descent, but I have found very few resources online covering exactly what I am attempting (sorry if there are more that I am not aware of).
Here is the code of the perceptron module:
import numpy

class Perceptron:
    value = 0.0
    total_sum = 0.0

    def __init__(self, inputs, weights):
        # weighted sum: dot(x, w) + b, with weights[0] acting as the bias b
        self.total_sum = numpy.dot(inputs, weights[1:]) + weights[0]
        # sigmoid activation of the weighted sum
        self.value = 1 / (1 + numpy.exp(-self.total_sum))

# module-level helpers, called as perceptron_core.sigmoid_derivative_of_... below
def sigmoid_derivative_of_intercept(sigmoid_value):
    # predict * (1 - predict); the minus sign is folded into the update step
    return sigmoid_value * (1 - sigmoid_value)

def sigmoid_derivative_of_slope(input_x, total_sum, sigmoid_value):
    # sum(x) * e^(-total_sum) * predict^2, as derived below
    return input_x.sum() * numpy.exp(-total_sum) * sigmoid_value**2
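For reference, a single forward pass with this class looks like this (a minimal sketch; the toy input and weight values are made up for illustration):

import numpy
import perceptron.core as perceptron_core

toy_input = numpy.array([0.2, 0.5, 0.9])         # 3 made-up feature values
toy_weights = numpy.array([0.1, 0.4, 0.3, 0.7])  # toy_weights[0] is the bias
p = perceptron_core.Perceptron(toy_input, toy_weights)
print(p.total_sum)  # scalar weighted sum dot(x, w) + b
print(p.value)      # sigmoid of that sum, a value in (0, 1)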
An explanation of these derivatives:
the assumed cost function of the perceptron is c(w, b) = target - predict
let sum(wx + b) = total_sum
let weights[0] += learning_rate * -d(c)/d(b)
let weights[1:] += learning_rate * -d(c)/d(w)
let predict = 1 / (1 + e^(-total_sum))
d(c)/d(b)
= d(target - predict)/d(b)
= -d(predict)/d(b)
= -d(1 / (1 + e^(-total_sum)))/d(b)
= -e^(-total_sum) / (1 + e^(-total_sum))^2
= -predict * (1 - predict)
the predict * (1 - predict) part is what is used for the intercept (bias); its minus sign has already been cancelled against the minus sign in the weight update formula
d(c)/d(w)
= -d(predict)/d(w)
= -d(1 / (1 + e^(-total_sum)))/d(w)
= -d(1 / (1 + e^(-sum(wx + b))))/d(w)
= -sum(x)e^(-total_sum) / (1 + e^(-total_sum))^2
= -sum(x) * e^(-total_sum) * predict^2
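To convince myself the d(predict)/d(b) step is right, I compare it against a finite-difference approximation (a small self-contained sketch; x, w, b and the step size h are arbitrary values I picked for the check):

import numpy

x = numpy.array([0.2, 0.5, 0.9])
w = numpy.array([0.4, 0.3, 0.7])
b = 0.1
h = 1e-6

def predict(b_value):
    # sigmoid of the weighted sum, as in the Perceptron class
    return 1 / (1 + numpy.exp(-(numpy.dot(x, w) + b_value)))

numeric = (predict(b + h) - predict(b - h)) / (2 * h)  # central difference
analytic = predict(b) * (1 - predict(b))               # predict * (1 - predict)
print(numeric, analytic)  # the two agree to several decimal places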
And here is the training code:
import numpy
import pandas
import matplotlib
import sklearn.datasets as sklearn_datasets
import sklearn.preprocessing as sklearn_preprocessing
# the module with the Perceptron class above
import perceptron.core as perceptron_core

dataset = sklearn_datasets.load_breast_cancer()
training_data_len_as_percentage = 80
# int() so the value can be used for slicing and range()
training_data_len = int(training_data_len_as_percentage * dataset.data.shape[0] / 100)
training_data = sklearn_preprocessing.minmax_scale(dataset.data[:training_data_len])
training_target = dataset.target[:training_data_len]

threshold = 10000
bias = 1.0
learning_rate = 0.01
# one weight per feature, plus weights[0] for the bias
weights = numpy.random.uniform(low=0.0, high=1.0, size=(dataset.data.shape[1] + 1))
average_errors = []

for _ in range(threshold):
    total_error_in_one_epoch = 0.0
    # NOTE: for stochastic gradient descent
    # the method says to update the weights after every training sample
    for n in range(training_data_len):
        predict = perceptron_core.Perceptron(training_data[n], weights)
        error = training_target[n] - predict.value
        total_error_in_one_epoch += error
        weights[0] += learning_rate * perceptron_core.sigmoid_derivative_of_intercept(predict.value)
        weights[1:] += learning_rate * perceptron_core.sigmoid_derivative_of_slope(training_data[n], predict.total_sum, predict.value)
    average_errors.append(total_error_in_one_epoch / training_data_len)
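The chart itself comes from plotting average_errors after the loop, roughly like this (a sketch; the axis labels are mine):

import matplotlib.pyplot as plt

plt.plot(average_errors)  # one point per epoch
plt.xlabel("epoch")
plt.ylabel("average error")
plt.show()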
The chart of the average cost per epoch looks like this:
The predictions are bad: using the remaining data as a test set, every sample gets the same predicted value.
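This is how I evaluate on the held-out rows (a sketch reusing dataset, weights and training_data_len from the training code above; thresholding the sigmoid output at 0.5 to get a 0/1 label is my own assumption):

import sklearn.preprocessing as sklearn_preprocessing
import perceptron.core as perceptron_core

# scale the held-out rows the same way as the training rows
test_data = sklearn_preprocessing.minmax_scale(dataset.data[training_data_len:])
test_target = dataset.target[training_data_len:]

correct = 0
for n in range(test_data.shape[0]):
    predict = perceptron_core.Perceptron(test_data[n], weights)
    label = 1 if predict.value >= 0.5 else 0  # 0.5 cutoff on the sigmoid output
    correct += int(label == test_target[n])
print(correct / test_data.shape[0])  # accuracy on the test set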