进化算法没有改进

时间:2017-05-28 14:09:20

标签: python neural-network evolutionary-algorithm fitness

周末我尝试建立一个神经网络,它改进了使用进化算法。我在openai(https://www.openai.com/)的Cartpole环境中运行了5000代,但它并没有很好地改进。神经网络有4个输入,1个隐藏层,3个单元,1个输出,网络使用tanH作为激活功能。每一代都有100个人,其中5个被选中组成下一代,有20%的可能性发生变异。以下是更好理解的准则:

import operator
import gym
import math
import random
import numpy
import matplotlib.pyplot as plt

env = gym.make('CartPole-v0')

generations = 100
input_units = 4
Hidden_units = 3
output_units = 1
individuals = 100

fitest1 = []
fitest2 = []

def Neural_Network(x, weights1, weights2):
    global output
    output = list(map(operator.mul, x, weights1))
    output = numpy.tanh(output)
    output = list(map(operator.mul, output, weights2))
    output = sum(output)
    return(output)

weights1 = [[random.random() for i in range(input_units*Hidden_units)] for j in range(individuals)]
weights2 = [[random.random() for i in range(Hidden_units*output_units)] for j in range(individuals)]

fit_plot = []

for g in range(generations):
    print('generation:',g+1)
    fitness=[0 for f in range(individuals)]
    prev_obs = []
    observation = env.reset()
    for w in weights1:
        print('        individual ',weights1.index(w)+1, ' of ', len(weights1))
        env.reset()
        for t in range(500):
            #env.render()
            Neural_Network(observation, weights1[weights1.index(w)], weights2[weights1.index(w)])
            action = output < 0.5
            observation, reward, done, info = env.step(action)
            fitness[weights1.index(w)]+=reward
            if done:
                break
        print('        individual fitness:', fitness[weights1.index(w)])
    print('min fitness:', min(fitness))
    print('max fitness:', max(fitness))
    print('average fitness:', sum(fitness)/len(fitness))
    fit_plot.append(sum(fitness)/len(fitness))
    for f in range(10):
        fitest1.append(weights1[fitness.index(max(fitness))])
        fitest2.append(weights2[fitness.index(max(fitness))])
        fitness[fitness.index(max(fitness))] = -1000000000


    for x in range(len(weights1)):
        for y in range(len(weights1[x])):
            weights1[x][y]=random.choice(fitest1)[y]
            if random.randint(1,5) == 1:
                weights1[random.randint(0, len(weights1)-1)][random.randint(0, len(weights1[0])-1)] += random.choice([0.1, -0.1])

    for x in range(len(weights2)):
        for y in range(len(weights2[x])):
            weights2[x][y]=random.choice(fitest2)[y]
            if random.randint(1,5) == 1:
                weights1[random.randint(0, len(weights1)-1)][random.randint(0, len(weights1[0])-1)] += random.choice([0.1, -0.1])

plt.axis([0,generations,0,100])
plt.ylabel('fitness')
plt.xlabel('generations')
plt.plot(range(0,generations), fit_plot)
plt.show()

env.reset()
for t in range(100):
    env.render()
    Neural_Network(observation, fitest1[0], fitest2[0])
    action = output < 0.5
    observation, reward, done, info = env.step(action)
    if done:
        break

如果有人想知道,几代人的平均健康状况图表(这次我只运行了100代)As you can see, the algorithm is not improving

如果还有任何问题,请询问。

2 个答案:

答案 0 :(得分:0)

我的观点是,在进化算法中,你没有在EA结束时选择正确的个体。确保您选择最好的2个人(可以只使用一个,但我们想要比这更好:))新一代。这应该会改善预期的结果:)

答案 1 :(得分:0)

发生突变的可能性似乎很高,为20%。尝试将其降低到1-5%,到目前为止,根据我的实验通常可以得到更好的结果。