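I am teaching myself neural networks. There is one function I cannot get my network to learn: f(x) = max(x_1, x_2). It looks like a very simple function, with 2 inputs and 1 output, yet a 3-layer neural network trained on more than a thousand samples for 2000 epochs gets it completely wrong. I am using deeplearning4j.
Is there a reason why the max function would be hard for a neural network to learn, or have I just tuned it badly?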
Answer 0 (score: 0)
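It is not that hard, at least if you restrict x1 and x2 to an interval, for example [0,3]. I took the "RegressionSum" example from the deeplearning4j examples and quickly rewrote it to learn the max instead of the sum, and it gives me quite good results: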
Max(0.6815540048808918,0.3112081053899819) = 0.64
Max(2.0073597506364407,1.93796211086664) = 2.09
Max(1.1792029272560556,2.5514324329058233) = 2.58
Max(2.489185375059013,0.0818746888836388) = 2.46
Max(2.658169689797984,1.419135581889197) = 2.66
Max(2.855509810112818,2.9661811672685086) = 2.98
Max(2.774757710538552,1.3988513143140069) = 2.79
Max(1.5852295273047565,1.1228662895771744) = 1.56
Max(0.8403435207065576,2.5595015474951195) = 2.60
Max(0.06913178775631723,2.61883825802004) = 2.54
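To put a number on "quite good", the listed outputs can be compared with the true maxima. The following is a minimal sketch, not part of the original answer: the class name `MaxPredictionErrors` and its helper methods are made up for illustration, and the values are copied from the sample run above.

```java
public class MaxPredictionErrors {
    // Input pairs copied from the sample run listed above.
    static final double[][] INPUTS = {
        {0.6815540048808918, 0.3112081053899819},
        {2.0073597506364407, 1.93796211086664},
        {1.1792029272560556, 2.5514324329058233},
        {2.489185375059013, 0.0818746888836388},
        {2.658169689797984, 1.419135581889197},
        {2.855509810112818, 2.9661811672685086},
        {2.774757710538552, 1.3988513143140069},
        {1.5852295273047565, 1.1228662895771744},
        {0.8403435207065576, 2.5595015474951195},
        {0.06913178775631723, 2.61883825802004},
    };
    // Network outputs for those pairs, as printed in the sample run.
    static final double[] PREDICTED = {0.64, 2.09, 2.58, 2.46, 2.66, 2.98, 2.79, 1.56, 2.60, 2.54};

    // Absolute difference between the network output and the true max for sample i.
    static double absError(int i) {
        return Math.abs(PREDICTED[i] - Math.max(INPUTS[i][0], INPUTS[i][1]));
    }

    // Largest absolute error over all listed samples.
    static double maxAbsError() {
        double worst = 0.0;
        for (int i = 0; i < PREDICTED.length; i++) {
            worst = Math.max(worst, absError(i));
        }
        return worst;
    }

    // Mean absolute error over all listed samples.
    static double meanAbsError() {
        double sum = 0.0;
        for (int i = 0; i < PREDICTED.length; i++) {
            sum += absError(i);
        }
        return sum / PREDICTED.length;
    }

    public static void main(String[] args) {
        System.out.println("max abs error:  " + maxAbsError());
        System.out.println("mean abs error: " + meanAbsError());
    }
}
```

On these ten samples every prediction is within 0.1 of the true maximum, which is small relative to the [0,3] input range.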
Below is my modified version of the RegressionSum example, which was originally written by Anwar on 3/15/16:
import java.util.Collections;
import java.util.List;
import java.util.Random;
// Also requires the DL4J/ND4J classes used below (MultiLayerNetwork, NeuralNetConfiguration,
// DenseLayer, OutputLayer, INDArray, Nd4j, DataSet, ...); their import paths depend on the library version.

public class RegressionMax {
    //Random number generator seed, for reproducibility
    public static final int seed = 12345;
    //Number of iterations per minibatch
    public static final int iterations = 1;
    //Number of epochs (full passes of the data)
    public static final int nEpochs = 200;
    //Number of data points
    public static final int nSamples = 10000;
    //Batch size: i.e., each epoch has nSamples/batchSize parameter updates
    public static final int batchSize = 100;
    //Network learning rate
    public static final double learningRate = 0.01;
    // Range of the sample data. Neural networks are sensitive to the input range,
    // so try other ranges (e.g. 0-1) and see how that affects the results;
    // also try changing the range together with the activation function
    public static int MIN_RANGE = 0;
    public static int MAX_RANGE = 3;
    public static final Random rng = new Random(seed);

    public static void main(String[] args){
        //Generate the training data
        DataSetIterator iterator = getTrainingData(batchSize,rng);

        //Create the network: 2 inputs -> 10 tanh hidden units -> 1 linear output
        int numInput = 2;
        int numOutputs = 1;
        int nHidden = 10;
        MultiLayerNetwork net = new MultiLayerNetwork(new NeuralNetConfiguration.Builder()
                .seed(seed)
                .iterations(iterations)
                .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
                .learningRate(learningRate)
                .weightInit(WeightInit.XAVIER)
                .updater(Updater.NESTEROVS).momentum(0.9)
                .list()
                .layer(0, new DenseLayer.Builder().nIn(numInput).nOut(nHidden)
                        .activation("tanh")
                        .build())
                .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
                        .activation("identity")
                        .nIn(nHidden).nOut(numOutputs).build())
                .pretrain(false).backprop(true).build()
        );
        net.init();
        net.setListeners(new ScoreIterationListener(1));

        //Train the network on the full data set for nEpochs passes
        for( int i=0; i<nEpochs; i++ ){
            iterator.reset();
            net.fit(iterator);
        }

        // Test the max of some numbers (Try different numbers here)
        Random rand = new Random();
        for (int i= 0; i< 10; i++) {
            double d1 = MIN_RANGE + (MAX_RANGE - MIN_RANGE) * rand.nextDouble();
            double d2 = MIN_RANGE + (MAX_RANGE - MIN_RANGE) * rand.nextDouble();
            INDArray input = Nd4j.create(new double[] { d1, d2 }, new int[] { 1, 2 });
            INDArray out = net.output(input, false);
            System.out.println("Max(" + d1 + "," + d2 + ") = " + out);
        }
    }

    // Builds nSamples random input pairs in [MIN_RANGE, MAX_RANGE) with their max as the label
    private static DataSetIterator getTrainingData(int batchSize, Random rand){
        double [] max = new double[nSamples];
        double [] input1 = new double[nSamples];
        double [] input2 = new double[nSamples];
        for (int i= 0; i< nSamples; i++) {
            input1[i] = MIN_RANGE + (MAX_RANGE - MIN_RANGE) * rand.nextDouble();
            input2[i] = MIN_RANGE + (MAX_RANGE - MIN_RANGE) * rand.nextDouble();
            max[i] = Math.max(input1[i], input2[i]);
        }
        // Stack the two input columns side by side into an nSamples x 2 feature matrix
        INDArray inputNDArray1 = Nd4j.create(input1, new int[]{nSamples,1});
        INDArray inputNDArray2 = Nd4j.create(input2, new int[]{nSamples,1});
        INDArray inputNDArray = Nd4j.hstack(inputNDArray1,inputNDArray2);
        INDArray outPut = Nd4j.create(max, new int[]{nSamples, 1});
        DataSet dataSet = new DataSet(inputNDArray, outPut);
        List<DataSet> listDs = dataSet.asList();
        Collections.shuffle(listDs,rng);
        return new ListDataSetIterator(listDs,batchSize);
    }
}
Answer 1 (score: 0)
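Just to point out: if you use relu instead of tanh, there is actually an exact solution. My guess is that if you shrink the network to exactly that size (one hidden layer with 3 nodes), you will always end up with these weights, modulo permutation of the nodes and scaling of the weights (the first layer scaled by gamma, the second layer by 1/gamma):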
max(a,b) = ((1, 1, -1)) * relu( ((1,-1), (0,1), (0,-1)) * ((a,b)) )
where * denotes matrix multiplication.
This equation translates the following human-readable version into NN language:
max(a,b) = relu(a-b) + b = relu(a-b) + relu(b) - relu(-b)
I have not actually tested it; my point is that, in theory, the network should be able to learn this function easily.
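The identity itself can be checked numerically without any training. The following is a minimal plain-Java sketch of the 3-node ReLU network with the weights given above; the class name `ReluMax`, the test range [-10, 10], and the absence of bias terms are assumptions made for illustration.

```java
import java.util.Random;

public class ReluMax {
    // First-layer weights: each row maps (a, b) to one hidden pre-activation,
    // giving (a - b, b, -b). No bias terms are used (assumption).
    static final double[][] W1 = {{1, -1}, {0, 1}, {0, -1}};
    // Second-layer weights: relu(a - b) + relu(b) - relu(-b).
    static final double[] W2 = {1, 1, -1};

    static double relu(double x) {
        return Math.max(0.0, x);
    }

    // Forward pass of the 2-3-1 network: W2 * relu(W1 * (a, b)).
    static double maxViaRelu(double a, double b) {
        double out = 0.0;
        for (int j = 0; j < W1.length; j++) {
            double preActivation = W1[j][0] * a + W1[j][1] * b;
            out += W2[j] * relu(preActivation);
        }
        return out;
    }

    public static void main(String[] args) {
        // Compare against Math.max on random inputs, including negative values.
        Random rng = new Random(12345);
        double worst = 0.0;
        for (int i = 0; i < 100000; i++) {
            double a = -10 + 20 * rng.nextDouble();
            double b = -10 + 20 * rng.nextDouble();
            worst = Math.max(worst, Math.abs(maxViaRelu(a, b) - Math.max(a, b)));
        }
        System.out.println("largest deviation from Math.max: " + worst);
    }
}
```

Any deviation reported here comes only from floating-point rounding in (a - b) + b, since relu(b) - relu(-b) equals b for every b.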
Edit: I just tested it, and the result is what I expected:
[[-1.0714666e+00 -7.9943770e-01 9.0549403e-01]
[ 1.0714666e+00 -7.7552663e-08 2.6146751e-08]]
and
[[ 0.93330014]
[-1.250879 ]
[ 1.1043695 ]]
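which correspond to the first and second layers, respectively. Transposing the second weight matrix and multiplying it elementwise with the first set of weights gives a normalized version that is easy to compare with my theoretical result: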
[[-9.9999988e-01 9.9999988e-01 1.0000000e+00]
[ 9.9999988e-01 9.7009000e-08 2.8875675e-08]]
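The normalization step can be reproduced with a few lines of plain Java. This is a sketch under assumptions: the class name `NormalizeWeights` and its methods are illustrative, the weight values are copied from the output above, and bias terms are taken to be zero because the answer lists only weights.

```java
public class NormalizeWeights {
    // Learned first-layer weights (rows: inputs a and b; columns: hidden nodes).
    static final double[][] W1 = {
        {-1.0714666e+00, -7.9943770e-01, 9.0549403e-01},
        { 1.0714666e+00, -7.7552663e-08, 2.6146751e-08},
    };
    // Learned second-layer weights, one per hidden node.
    static final double[] W2 = {0.93330014, -1.250879, 1.1043695};

    // Scale column j of w1 by w2[j]; this removes the gamma / (1/gamma) scaling freedom.
    static double[][] normalize(double[][] w1, double[] w2) {
        double[][] result = new double[w1.length][w2.length];
        for (int i = 0; i < w1.length; i++) {
            for (int j = 0; j < w2.length; j++) {
                result[i][j] = w1[i][j] * w2[j];
            }
        }
        return result;
    }

    // Forward pass with the learned weights, assuming zero biases.
    static double predict(double a, double b) {
        double out = 0.0;
        for (int j = 0; j < W2.length; j++) {
            double preActivation = W1[0][j] * a + W1[1][j] * b;
            out += W2[j] * Math.max(0.0, preActivation);
        }
        return out;
    }

    public static void main(String[] args) {
        for (double[] row : normalize(W1, W2)) {
            System.out.println(java.util.Arrays.toString(row));
        }
        System.out.println("predict(2, 3) = " + predict(2, 3));
    }
}
```

Read column by column, the normalized weights correspond to relu(b - a) - relu(-a) + relu(a), which is the theoretical solution with the roles of a and b swapped and the nodes permuted.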