Extending the Encog Lunar Lander example

Time: 2016-02-11 07:50:29

Tags: machine-learning neural-network artificial-intelligence encog simulated-annealing

This question refers to the C# Lunar Lander example in the Encog repository. As in that example, I use NeuralSimulatedAnnealing to train my multilayer feed-forward network (50 epochs):

BasicNetwork network = CreateNetwork();

IMLTrain train;
train = new NeuralSimulatedAnnealing(network, new PilotScore(), 10, 2, 100);


public static BasicNetwork CreateNetwork() {
    var pattern = new FeedForwardPattern {InputNeurons = 3};
    pattern.AddHiddenLayer(50);
    pattern.OutputNeurons = 1;
    pattern.ActivationFunction = new ActivationTANH();
    var network = (BasicNetwork) pattern.Generate();
    network.Reset();
    return network;
}

The example works well: the neural pilot learns exactly how to land the craft under those particular conditions. But I want more!

To do this, I created a class for global variables, shown below, and changed one line in the LanderSimulator class:

namespace Encog.Examples.Lunar
{
    class globals
    {
        public static int fuelConsumption { get; set; }
    }
}


public void Turn(bool thrust){
    Seconds++;
    Velocity -= Gravity;
    Altitude += Velocity;

    if (thrust && Fuel > 0)
    {
        Fuel -= globals.fuelConsumption;    // changed from: Fuel--;
        Velocity += Thrust;
    }

    Velocity = Math.Max(-TerminalVelocity, Velocity);
    Velocity = Math.Min(TerminalVelocity, Velocity);

    if (Altitude < 0)
        Altitude = 0;
}
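As a minimal sketch of the same change without a static global, the consumption can be passed to each simulator instance instead. The class and member names (FuelLanderSketch, FuelPerThrust) are hypothetical, and the physics constants and starting values are illustrative assumptions, not necessarily Encog's. The sketch also clamps the fuel at zero, which the modified Turn above does not do: there, a thrust with less fuel left than fuelConsumption drives Fuel below zero.

```csharp
using System;

// Hypothetical stand-in for LanderSimulator: fuel used per thrust is an
// instance setting rather than a static global. Constants are assumptions.
public class FuelLanderSketch
{
    public const double Gravity = 1.62;         // assumed value
    public const double Thrust = 10;            // assumed value
    public const double TerminalVelocity = 40;  // assumed value

    public int FuelPerThrust { get; }
    public int Fuel { get; private set; }
    public int Seconds { get; private set; }
    public double Altitude { get; private set; }
    public double Velocity { get; private set; }

    public FuelLanderSketch(int fuelPerThrust, int fuel = 200, double altitude = 10000)
    {
        FuelPerThrust = fuelPerThrust;
        Fuel = fuel;
        Altitude = altitude;
    }

    public void Turn(bool thrust)
    {
        Seconds++;
        Velocity -= Gravity;
        Altitude += Velocity;

        if (thrust && Fuel > 0)
        {
            // Never let the tank go negative, even when FuelPerThrust
            // does not divide the remaining fuel evenly.
            Fuel = Math.Max(0, Fuel - FuelPerThrust);
            Velocity += Thrust;
        }

        // Clamp speed to +/- TerminalVelocity, as in the original Turn.
        Velocity = Math.Max(-TerminalVelocity, Velocity);
        Velocity = Math.Min(TerminalVelocity, Velocity);

        if (Altitude < 0)
            Altitude = 0;
    }
}
```

Each training or test run would then create its own FuelLanderSketch(1), FuelLanderSketch(5) or FuelLanderSketch(10) instead of mutating globals.fuelConsumption.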

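So each thrust now consumes an amount of fuel set by the fuelConsumption variable. I then tried three different fuelConsumption values; here are the best scores of the resulting networks: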

//NETWORK 1
globals.fuelConsumption = 1;
bestScore: 7986

//NETWORK 2
globals.fuelConsumption = 5;
bestScore: 7422

//NETWORK 3
globals.fuelConsumption = 10;
bestScore: 6921

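When I cross-tested these networks (each one under the other two fuelConsumption values), the results were disappointing: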

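  • Network 1 scored -39591 and -39661 when fuelConsumption was 5 and 10, respectively.
  • Network 2 scored -8832 and -35671 when fuelConsumption was 1 and 10, respectively.
  • Network 3 scored -24510 and -19697 when fuelConsumption was 1 and 5, respectively.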

So I tried training a single network for all three cases, as follows:

int epoch;

epoch = 1;
globals.fuelConsumption = 1;
for (int i = 0; i < 50; i++){
    train.Iteration();
    Console.WriteLine(@"Epoch #" + epoch + @" Score:" + train.Error);
    epoch++;
}
Console.WriteLine("--------------------------------------");

epoch = 1;
globals.fuelConsumption = 5;
for (int i = 0; i < 50; i++){
    train.Iteration();
    Console.WriteLine(@"Epoch #" + epoch + @" Score:" + train.Error);
    epoch++;
}
Console.WriteLine("--------------------------------------");
epoch = 1;
globals.fuelConsumption = 10;
for (int i = 0; i < 50; i++){
    train.Iteration();
    Console.WriteLine(@"Epoch #" + epoch + @" Score:" + train.Error);
    epoch++;
}

Console.WriteLine(@"The score of experienced pilot is:");
network = (BasicNetwork) train.Method;

var pilot = new NeuralPilot(network, false);
globals.fuelConsumption = 1;
Console.WriteLine("@1: " + pilot.ScorePilot());
globals.fuelConsumption = 5;
Console.WriteLine("@5: " + pilot.ScorePilot());
globals.fuelConsumption = 10;
Console.WriteLine("@10: " + pilot.ScorePilot());

But the result was the same again:

The score of experienced pilot is:
@1: -27485
@5: -27565
@10: 7448
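Each of the three loops above scores the pilot at a single fuelConsumption value, so the last phase is free to undo what the first two learned, which would fit a result where only @10 is positive. Nothing in the code shown feeds fuelConsumption to the network as an input either, so one pilot has to find a single policy that works for all three values. One alternative, not used in the question, is to score every candidate under all three values at once. A minimal sketch, assuming a hypothetical scoreAt callback that runs one landing for a given consumption value and returns its score; the MultiScenarioScore name is also hypothetical:

```csharp
using System;
using System.Linq;

// Hypothetical helper: combines one pilot's scores across several
// fuel-consumption settings into a single fitness value.
public static class MultiScenarioScore
{
    // scoreAt(c) runs one landing with consumption c and returns its score.
    // useMinimum = true rewards pilots that do well in *every* scenario;
    // false returns the average, which tolerates one weak scenario.
    public static double Combined(Func<int, double> scoreAt, int[] fuelSettings, bool useMinimum = true)
    {
        if (scoreAt == null)
            throw new ArgumentNullException(nameof(scoreAt));
        if (fuelSettings == null || fuelSettings.Length == 0)
            throw new ArgumentException("At least one fuel setting is required.", nameof(fuelSettings));

        double[] scores = fuelSettings.Select(scoreAt).ToArray();
        return useMinimum ? scores.Min() : scores.Average();
    }
}
```

For example, with the final pilot's scores above (-27485, -27565, 7448) the minimum-based fitness would be -27565, so a trainer maximizing it could not ignore the @1 and @5 cases. In Encog this logic would presumably live in the score class's CalculateScore method; passing the consumption to each simulator instance, rather than setting the static globals.fuelConsumption, also keeps it safe if the trainer evaluates candidates on several threads.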

How can I create a neural pilot that achieves the best score in all three cases?

1 answer:

Answer 0 (score: 0)

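To solve this puzzle, I switched to a NEAT network instead of a traditional feed-forward or recurrent network. Here are some of the interesting changes in the code: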

NEATPopulation network = CreateNetwork();
TrainEA train = default(TrainEA);


public static NEATPopulation CreateNetwork(){
    int inputNeurons = 3;
    int outputNeurons = 1;
    NEATPopulation network = new NEATPopulation(inputNeurons, outputNeurons, 100);
    network.Reset();
    return network;
}

Then I adjusted a few members of the NeuralPilot class so that it takes a NEATNetwork:

private readonly NEATNetwork _network;

public NeuralPilot(NEATNetwork network, bool track)

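I also had to change the ScorePilot function, because a NEATNetwork uses ActivationSteepenedSigmoid on its output by default, instead of the traditional ActivationLinear or ActivationTANH. A sigmoid output lies between 0 and 1, so the original test value > 0 would be true on practically every turn; 0.5 is the sigmoid's midpoint: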

bool thrust;

if (value > 0.5){       // changed from: if (value > 0){
    thrust = true;
    if (_track)
        Console.WriteLine(@"THRUST");
}
else
    thrust = false;
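The new threshold makes the same decision the old one did, just on a different output range: tanh maps a pre-activation x into (-1, 1) with midpoint 0, a sigmoid maps it into (0, 1) with midpoint 0.5, and both cross their midpoint exactly at x = 0. A small sketch that checks this; the class name is hypothetical, and the 4.9 slope is an assumption about the form of Encog's steepened sigmoid:

```csharp
using System;

// Hypothetical helper comparing the old (tanh) and new (sigmoid) thrust rules.
public static class ThrustThreshold
{
    // Assumed form of the steepened sigmoid: 1 / (1 + e^(-4.9x)).
    public static double SteepenedSigmoid(double x) => 1.0 / (1.0 + Math.Exp(-4.9 * x));

    // tanh output lies in (-1, 1), so "thrust" means output above 0.
    public static bool ThrustFromTanh(double x) => Math.Tanh(x) > 0;

    // Sigmoid output lies in (0, 1), so "thrust" means output above 0.5.
    public static bool ThrustFromSigmoid(double x) => SteepenedSigmoid(x) > 0.5;
}
```

For any pre-activation x, ThrustFromTanh(x) and ThrustFromSigmoid(x) agree, whereas testing the sigmoid output against 0 would almost always return true.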

So training a single network now looks like this:

OriginalNEATSpeciation speciation = default(OriginalNEATSpeciation);
speciation = new OriginalNEATSpeciation();

int epoch;
double best_1, best_5, best_10;
best_1 = best_5 = best_10 = 0;

train = NEATUtil.ConstructNEATTrainer(network, new PilotScore());
train.Speciation = speciation;

epoch = 1;
globals.fuelConsumption = 1;
for (int i = 0; i < 50; i++){
    train.Iteration();
    Console.WriteLine(@"Epoch #" + epoch + @" Score:" + train.Error);
    best_1 = train.Error;
    epoch++;
}
Console.WriteLine("--------------------------------------");

train = NEATUtil.ConstructNEATTrainer(network, new PilotScore());
train.Speciation = speciation;

epoch = 1;
globals.fuelConsumption = 5;
for (int i = 0; i < 50; i++){
    train.Iteration();
    Console.WriteLine(@"Epoch #" + epoch + @" Score:" + train.Error);
    best_5 = train.Error;
    epoch++;
}
Console.WriteLine("--------------------------------------");

train = NEATUtil.ConstructNEATTrainer(network, new PilotScore());
train.Speciation = speciation;

epoch = 1;
globals.fuelConsumption = 10;
for (int i = 0; i < 50; i++){
    train.Iteration();
    Console.WriteLine(@"Epoch #" + epoch + @" Score:" + train.Error);
    best_10 = train.Error;
    epoch++;
}

Console.WriteLine(@"The score of experienced pilot is:");

NEATNetwork trainedNetwork = default(NEATNetwork);
trainedNetwork = (NEATNetwork)train.CODEC.Decode(network.BestGenome);

var pilot = new NeuralPilot(trainedNetwork, false);
globals.fuelConsumption = 1;
Console.WriteLine("@bestScore of " + best_1.ToString() +" @1: liveScore is " + pilot.ScorePilot());
globals.fuelConsumption = 5;
Console.WriteLine("@bestScore of " + best_5.ToString() + " @5: liveScore is " + pilot.ScorePilot());
globals.fuelConsumption = 10;
Console.WriteLine("@bestScore of " + best_10.ToString() + " @10: liveScore is " + pilot.ScorePilot());
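Inside each loop above, best_1, best_5 and best_10 are overwritten on every epoch, so they hold the last epoch's train.Error rather than the highest one seen. A minimal sketch of a phase runner that keeps the maximum instead, assuming higher pilot scores are better; the PhaseRunner name is hypothetical, and the Func&lt;double&gt; stands in for calling train.Iteration() and then reading train.Error:

```csharp
using System;

// Hypothetical helper: runs one training phase and tracks the best score.
public static class PhaseRunner
{
    // iterate() performs one training iteration and returns the resulting
    // score; log, if given, receives (epoch, score) for printing.
    // Returns the highest score seen, assuming higher is better.
    public static double RunAndTrackBest(Func<double> iterate, int epochs, Action<int, double> log = null)
    {
        if (iterate == null)
            throw new ArgumentNullException(nameof(iterate));
        if (epochs < 1)
            throw new ArgumentOutOfRangeException(nameof(epochs));

        double best = double.NegativeInfinity;
        for (int epoch = 1; epoch <= epochs; epoch++)
        {
            double score = iterate();
            log?.Invoke(epoch, score);
            best = Math.Max(best, score);
        }
        return best;
    }
}
```

With the trainer from the answer, one phase would then read: best_1 = PhaseRunner.RunAndTrackBest(() => { train.Iteration(); return train.Error; }, 50, (e, s) => Console.WriteLine(@"Epoch #" + e + @" Score:" + s));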

The results are rather hit-or-miss! Here are some results from random test runs:

The score of experienced pilot is:
@bestScore of 5540 @1: liveScore is -4954
@bestScore of 1160 @5: liveScore is 3823
@bestScore of 3196 @10: liveScore is 3196

The score of experienced pilot is:
@bestScore of 7455 @1: liveScore is 8227
@bestScore of 6324 @5: liveScore is 7427
@bestScore of 6427 @10: liveScore is 6427

The score of experienced pilot is:
@bestScore of 5322 @1: liveScore is -4617
@bestScore of 1898 @5: liveScore is 9531
@bestScore of 2086 @10: liveScore is 2086

The score of experienced pilot is:
@bestScore of 7493 @1: liveScore is -3848
@bestScore of 4907 @5: liveScore is -13840
@bestScore of 4954 @10: liveScore is 4954

The score of experienced pilot is:
@bestScore of 6560 @1: liveScore is 4046
@bestScore of 5775 @5: liveScore is 3366
@bestScore of 2516 @10: liveScore is 2516

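As you can see, the second run did produce positive scores in all three cases, but there seems to be no relationship between the final network's performance and the best scores recorded during the earlier training phases. (Only the @10 rows always agree, presumably because best_10 and that live score come from the same final genome evaluated at the same fuelConsumption.) So the problem may be solved, but not satisfactorily.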