Since this uses the sigmoid function instead of a zero/one activation function, I'm guessing this is the correct way to calculate the output for gradient descent, is it?
static double calculateOutput( int theta, double weights[], double[][] feature_matrix, int file_index, int globo_dict_size )
{
    //double sum = x * weights[0] + y * weights[1] + z * weights[2] + weights[3];
    double sum = 0.0;
    for (int i = 0; i < globo_dict_size; i++)
    {
        sum += ( weights[i] * feature_matrix[file_index][i] );
    }
    //bias
    sum += weights[ globo_dict_size ];

    return sigmoid(sum);
}

private static double sigmoid(double x)
{
    return 1 / (1 + Math.exp(-x));
}
In the code below I'm trying to update my Θ values (which are the equivalent of the weights in the perceptron, aren't they?). In a related question I was given the formula LEARNING_RATE * localError * feature_matrix__train[p][i] * output_gradient[i] for this purpose. I've commented out the perceptron weight update.

Is this new update rule the correct approach?

What is output_gradient supposed to mean? Is it equivalent to the sum I calculate in my calculateOutput method?
//LEARNING WEIGHTS
double localError, globalError;
double output;
int p, iteration;

iteration = 0;
do
{
    iteration++;
    globalError = 0;

    //loop through all instances (complete one epoch)
    for (p = 0; p < number_of_files__train; p++)
    {
        // calculate predicted class
        output = calculateOutput( theta, weights, feature_matrix__train, p, globo_dict_size );

        // difference between predicted and actual class values
        localError = outputs__train[p] - output;

        //update weights and bias
        for (int i = 0; i < globo_dict_size; i++)
        {
            //weights[i] += ( LEARNING_RATE * localError * feature_matrix__train[p][i] );
            weights[i] += LEARNING_RATE * localError * feature_matrix__train[p][i] * output_gradient[i];
        }
        weights[ globo_dict_size ] += ( LEARNING_RATE * localError );

        //summation of squared error (error value for all instances)
        globalError += (localError * localError);
    }

    /* Root Mean Squared Error */
    if (iteration < 10)
        System.out.println("Iteration 0" + iteration + " : RMSE = " + Math.sqrt( globalError/number_of_files__train ) );
    else
        System.out.println("Iteration " + iteration + " : RMSE = " + Math.sqrt( globalError/number_of_files__train ) );
    //System.out.println( Arrays.toString( weights ) );
}
while (globalError != 0 && iteration <= MAX_ITER);
UPDATE: Now that I've updated things, it looks more like this:
double loss, cost, hypothesis, gradient;
int p, iteration;

iteration = 0;
do
{
    iteration++;
    cost = 0.0;
    loss = 0.0;

    //loop through all instances (complete one epoch)
    for (p = 0; p < number_of_files__train; p++)
    {
        // 1. Calculate the hypothesis h = X * theta
        hypothesis = calculateHypothesis( theta, feature_matrix__train, p, globo_dict_size );

        // 2. Calculate the loss = h - y and maybe the squared cost (loss^2)/2m
        loss = hypothesis - outputs__train[p];

        // 3. Calculate the gradient = X' * loss / m
        gradient = calculateGradent( theta, feature_matrix__train, p, globo_dict_size, loss );

        // 4. Update the parameters theta = theta - alpha * gradient
        for (int i = 0; i < globo_dict_size; i++)
        {
            theta[i] = theta[i] - (LEARNING_RATE * gradient);
        }

        //summation of squared error (error value for all instances)
        cost += (loss * loss);
    }

    /* Root Mean Squared Error */
    if (iteration < 10)
        System.out.println("Iteration 0" + iteration + " : RMSE = " + Math.sqrt( cost/number_of_files__train ) );
    else
        System.out.println("Iteration " + iteration + " : RMSE = " + Math.sqrt( cost/number_of_files__train ) );
    //System.out.println( Arrays.toString( weights ) );
}
while (cost != 0 && iteration <= MAX_ITER);
}
static double calculateHypothesis( double theta[], double[][] feature_matrix, int file_index, int globo_dict_size )
{
    double hypothesis = 0.0;
    for (int i = 0; i < globo_dict_size; i++)
    {
        hypothesis += ( theta[i] * feature_matrix[file_index][i] );
    }
    //bias
    hypothesis += theta[ globo_dict_size ];

    return hypothesis;
}

static double calculateGradent( double theta[], double[][] feature_matrix, int file_index, int globo_dict_size, double loss )
{
    double gradient = 0.0;
    for (int i = 0; i < globo_dict_size; i++)
    {
        gradient += ( feature_matrix[file_index][i] * loss );
    }
    return gradient;
}
public static double hingeLoss()
{
    // l(y, f(x)) = max(0, 1 − y · f(x))
    return HINGE;
}
Answer (score: 1)
Your calculateOutput method looks correct. Your next piece of code, though, I really don't think so:

weights[i] += LEARNING_RATE * localError * feature_matrix__train[p][i] * output_gradient[i]
Look at the image you posted in your other question:
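The image itself doesn't come through here; assuming it shows the usual batch gradient descent rule for linear regression from Andrew Ng's course, in LaTeX notation it reads roughly:

\theta_j := \theta_j - \alpha \cdot \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)},
\qquad \text{with} \quad h_\theta(x^{(i)}) = \theta_0 \, x_1^{(i)} + \theta_1 \, x_2^{(i)}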
Let's try to identify each part of these rules in your code.

- Theta0 and Theta1: they look like weights[i] in your code; I hope globo_dict_size = 2;
- alpha: seems to be your LEARNING_RATE;
- 1 / m: I can't find this anywhere in your update rule. m is the number of training instances in Andrew Ng's videos; in your case it should be 1 / number_of_files__train. It's not very important, though; things should work fine even without it.
- The sum: you do this with the calculateOutput function, whose result you use in the localError variable, which you multiply by feature_matrix__train[p][i] (the equivalent of x(i) in Andrew Ng's notation).
That product, localError * feature_matrix__train[p][i], is your partial derivative, and part of the gradient!

Why? Because the partial derivative of [h_theta(x(i)) - y(i)]^2 with respect to Theta0 is equal to:

2*[h_theta(x(i)) - y(i)] * derivative[h_theta(x(i)) - y(i)]
derivative[h_theta(x(i)) - y(i)] =
derivative[Theta0 * x(i, 1) + Theta1 * x(i, 2) - y(i)] =
x(i, 1)

And of course you should derive the entire sum. This is also why Andrew Ng used 1 / (2m) in the cost function, so that the 2 cancels out against the 2 we get from the derivation.
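Written out for the whole cost function (just a restatement of the above in LaTeX, assuming the usual 1 / (2m) squared-error cost), the cancellation is:

J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2

\frac{\partial J(\theta)}{\partial \theta_j}
  = \frac{1}{2m} \sum_{i=1}^{m} 2 \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
  = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}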
Note that x(i, 1), or x(1), should be all ones. In your code, you should make sure that:

feature_matrix__train[p][0] == 1

That's about it! I don't know what output_gradient[i] is supposed to be in your code, though; you never define it anywhere.
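For what it's worth, here is a minimal sketch of how that inner training loop could look without output_gradient, reusing the variable names from your post (weights, theta, feature_matrix__train, outputs__train, LEARNING_RATE, number_of_files__train, globo_dict_size); the 1 / m factor is included but, as said above, optional:

// Sketch only: per-instance update, no output_gradient.
for (int p = 0; p < number_of_files__train; p++)
{
    double output     = calculateOutput( theta, weights, feature_matrix__train, p, globo_dict_size );
    double localError = outputs__train[p] - output;

    for (int i = 0; i < globo_dict_size; i++)
    {
        // move weights[i] in the direction localError * x(i)
        // (your commented-out perceptron-style rule, rescaled by 1 / m)
        weights[i] += LEARNING_RATE * localError * feature_matrix__train[p][i] / number_of_files__train;
    }
    // bias weight: its "feature" is the constant 1
    weights[ globo_dict_size ] += LEARNING_RATE * localError / number_of_files__train;
}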
I suggest you take a look at this tutorial to get a better understanding of the algorithm you are using. Since you use the sigmoid function, it looks like you want to do classification, but then you should use a different cost function. That document covers logistic regression as well.
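To make the classification point concrete, here is a rough sketch of what the logistic-regression (cross-entropy) cost and its gradient step could look like, reusing your calculateOutput as the hypothesis h = sigmoid(w · x + b) and assuming the labels in outputs__train are 0 or 1; conveniently, for the sigmoid plus cross-entropy pair the derivative with respect to weights[i] is again (h - y) * x(i):

// Sketch only: one epoch of logistic regression with cross-entropy cost.
double crossEntropy = 0.0;
for (int p = 0; p < number_of_files__train; p++)
{
    double h = calculateOutput( theta, weights, feature_matrix__train, p, globo_dict_size );
    double y = outputs__train[p];

    // J = -(1/m) * sum[ y*log(h) + (1-y)*log(1-h) ]
    crossEntropy += -( y * Math.log(h) + (1 - y) * Math.log(1 - h) ) / number_of_files__train;

    // gradient step: d/dw_i of the cross-entropy term is (h - y) * x(i)
    for (int i = 0; i < globo_dict_size; i++)
    {
        weights[i] -= LEARNING_RATE * (h - y) * feature_matrix__train[p][i];
    }
    weights[ globo_dict_size ] -= LEARNING_RATE * (h - y);
}
System.out.println("Cross-entropy cost: " + crossEntropy);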