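I have a very simple question that I think may already have a solution online, but I haven't been able to find one yet.
I have a (non-mathematical, non-analytic) function that computes a value F from a set of variables a, b, c, d. F is derived from a set of files, databases, and online crawling, and I want to find the set of values a, b, c, d that maximizes it. Searching the whole space of a, b, c, d is not feasible, and using differentials/derivatives is not possible because F is not analytic. I would really appreciate pointers to packages or algorithms I could use, or just advice on how to get started. Most of what I've seen online about Python optimization seems to be about analytic/mathematical functions (f(x) = x^2 + ...) rather than non-analytic problems like this one.
For example: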
def F(a, b, c, d):
    # ... a lot of computations from databases, etc. using
    # a, b, c, d, which are different float values ...
    return output  # output is a float
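Now, each of a, b, c, d can take one of a set of possible values, say [0, 0.1, 0.2, ... 1.0]. The values are discrete, and my optimization does not need extreme precision.
I want to find the set of values a, b, c, d that gives the highest F.
Oh, and I have no constraints on the maximization of F, a, b, c, d...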
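Answer 0: (score: 2)
For non-analytic functions, you could explore the parameter space with a genetic algorithm or a similar evolutionary computation, then look for the maxima or "hills" in the resulting space to find a solution that maximizes your function. I would suggest using a library rather than writing one yourself; DEAP looks quite promising.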
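To make the evolutionary idea concrete, here is a minimal genetic-algorithm sketch that uses only the standard library instead of DEAP. Everything in it is an assumption made for illustration: `F` is a made-up stand-in for the real objective, and `GRID`, `genetic_search`, and the parameter values are hypothetical names and defaults, not tuned recommendations.

```python
import random

GRID = [i / 10 for i in range(11)]  # allowed values 0.0, 0.1, ..., 1.0


def F(a, b, c, d):
    # Hypothetical stand-in objective; the real F would query files/databases.
    return -((a - 0.3) ** 2 + (b - 0.7) ** 2 + (c - 0.5) ** 2 + (d - 0.1) ** 2)


def genetic_search(f, n_vars=4, pop_size=30, generations=40,
                   cx_prob=0.5, mut_prob=0.2, seed=0):
    rng = random.Random(seed)
    cache = {}  # F is assumed expensive, so each point is evaluated only once

    def score(ind):
        key = tuple(ind)
        if key not in cache:
            cache[key] = f(*key)
        return cache[key]

    pop = [[rng.choice(GRID) for _ in range(n_vars)] for _ in range(pop_size)]
    best = list(max(pop, key=score))
    for _ in range(generations):
        # Tournament selection: keep the fittest of 3 randomly drawn individuals.
        parents = [max(rng.sample(pop, 3), key=score) for _ in range(pop_size)]
        children = [list(p) for p in parents]
        # One-point crossover on consecutive pairs of children.
        for c1, c2 in zip(children[::2], children[1::2]):
            if rng.random() < cx_prob:
                cut = rng.randrange(1, n_vars)
                c1[cut:], c2[cut:] = c2[cut:], c1[cut:]
        # Mutation: overwrite one variable with a random grid value.
        for child in children:
            if rng.random() < mut_prob:
                child[rng.randrange(n_vars)] = rng.choice(GRID)
        pop = children
        best = list(max(pop + [best], key=score))
    return best, score(best)


if __name__ == "__main__":
    print(genetic_search(F))
```

The cache matters more than the genetic operators when each call to F is slow, because a small discrete grid makes the same point come up repeatedly.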
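Answer 1: (score: 2)
You've already broken your problem down pretty well. As you seem to know, this is a space search.
I don't know of a library that does this in any language other than Prolog (where the language itself is effectively the solution engine), but one of the most commonly used space-search algorithms is "A star" search, also known as "heuristic-guided optimization." Your problem looks like a good fit for a close relative of A-star search called "greedy best-first search."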
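You basically start with a set of parameters, call F on those parameters, and then tweak each parameter a little to see how F changes. You head "uphill," accepting the tweak that increases F the most, and possibly setting the "other" paths aside to be searched later. This greedily takes you toward a "hilltop": a local maximum. After reaching a local maximum, you can try searching again from some random combination of parameters. You could even use something like simulated annealing to reduce the amount by which you adjust the parameters over time: search very chaotically at first, then settle down once you know the rough shape of the problem landscape.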
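The steps above can be sketched as greedy hill climbing with random restarts. This is a minimal illustration under stated assumptions, not a definitive implementation: `F` is a made-up stand-in for the real objective, and `GRID`, `hill_climb`, and `hill_climb_with_restarts` are hypothetical names introduced here.

```python
import random

GRID = [i / 10 for i in range(11)]  # allowed values 0.0, 0.1, ..., 1.0


def F(a, b, c, d):
    # Hypothetical stand-in objective; the real F would query files/databases.
    return -((a - 0.3) ** 2 + (b - 0.7) ** 2 + (c - 0.5) ** 2 + (d - 0.1) ** 2)


def hill_climb(f, start):
    """Greedy ascent over grid indices: take the one-step tweak that raises f most."""
    current = list(start)
    current_score = f(*(GRID[i] for i in current))
    while True:
        best_move, best_score = None, current_score
        for pos in range(len(current)):
            for step in (-1, 1):
                idx = current[pos] + step
                if 0 <= idx < len(GRID):
                    candidate = current[:pos] + [idx] + current[pos + 1:]
                    score = f(*(GRID[i] for i in candidate))
                    if score > best_score:
                        best_move, best_score = candidate, score
        if best_move is None:  # no neighbour improves: local maximum reached
            return [GRID[i] for i in current], current_score
        current, current_score = best_move, best_score


def hill_climb_with_restarts(f, n_vars=4, restarts=5, seed=0):
    """Run hill_climb from several random starting points and keep the best result."""
    rng = random.Random(seed)
    results = []
    for _ in range(restarts):
        start = [rng.randrange(len(GRID)) for _ in range(n_vars)]
        results.append(hill_climb(f, start))
    return max(results, key=lambda r: r[1])


if __name__ == "__main__":
    print(hill_climb_with_restarts(F))
```

With this smooth stand-in every start climbs to the same peak, so the restarts add nothing; they only pay off when the real F has several separate hills.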
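The only way to guarantee the optimal result is to do a complete search, such as BFS, but there are plenty of good ways to make it very likely that you will get the optimal result. Which one gets you there fastest depends on F: the hill climbing I have shown here works best if the mapping between inputs and outputs is at least close to continuous, with few discontinuities.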
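For the simulated-annealing idea mentioned above, here is a rough sketch. Note that it uses the textbook variant, where a falling temperature makes the search less willing to accept worse points, rather than shrinking the size of the tweaks as the answer describes. `F` is a made-up stand-in objective, and `GRID`, `simulated_annealing`, and all default values are assumptions chosen for illustration.

```python
import math
import random

GRID = [i / 10 for i in range(11)]  # allowed values 0.0, 0.1, ..., 1.0


def F(a, b, c, d):
    # Hypothetical stand-in objective; the real F would query files/databases.
    return -((a - 0.3) ** 2 + (b - 0.7) ** 2 + (c - 0.5) ** 2 + (d - 0.1) ** 2)


def simulated_annealing(f, n_vars=4, steps=2000, t_start=1.0, t_end=0.001, seed=0):
    rng = random.Random(seed)
    current = [rng.randrange(len(GRID)) for _ in range(n_vars)]
    current_score = f(*(GRID[i] for i in current))
    best, best_score = list(current), current_score
    for k in range(steps):
        # Geometric cooling from t_start down to t_end (requires steps >= 2).
        temp = t_start * (t_end / t_start) ** (k / (steps - 1))
        # Propose a one-step change to a single randomly chosen variable.
        candidate = list(current)
        pos = rng.randrange(n_vars)
        moved = candidate[pos] + rng.choice((-1, 1))
        candidate[pos] = min(len(GRID) - 1, max(0, moved))
        score = f(*(GRID[i] for i in candidate))
        # Always accept improvements; accept a worse point with
        # probability exp(delta / temp), which shrinks as temp falls.
        delta = score - current_score
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, score
            if current_score > best_score:
                best, best_score = list(current), current_score
    return [GRID[i] for i in best], best_score


if __name__ == "__main__":
    print(simulated_annealing(F))
```

The temperature scale has to be chosen relative to typical differences in F; the values here only suit the stand-in and would need adjusting for a real objective.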
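Answer 2: (score: 1)
Thanks to everyone's help and comments, I was able to put together an answer. The DEAP documentation was very helpful, but I wanted to share my answer along with some comments that will hopefully help others.
I used the OneMax example from https://github.com/DEAP/deap/blob/b46dde2b74a3876142fdcc40fdf7b5caaa5ea1f4/examples/ga/onemax.py, which has a walkthrough here: https://deap.readthedocs.org/en/latest/examples/ga_onemax.html. I also found these two pages very valuable. Operators: https://deap.readthedocs.org/en/latest/tutorials/basic/part2.html#next-step and creating types: https://deap.readthedocs.org/en/latest/tutorials/basic/part1.html#creating-types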
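So here is my solution (apologies for the formatting; this is my first time posting long code with comments). Basically, all you really need to do is make the fitness evaluation function (here eval_Inidividual) the function F that you want to optimize. You can limit the range (and distribution) of each of the N variables/inputs of F by limiting their possible values at initialization (here random_initialization) and at mutation (here mutate_inputs).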
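One final note: I made my code multicore using the multiprocessing library (it takes just two lines of code, plus changing the map function you use!). Read more here: https://deap.readthedocs.org/en/default/tutorials/distribution.html
Code (read my comments for more explanation):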
import multiprocessing as mp
import random
import time

from deap import base
from deap import creator
from deap import tools

# Wall-clock timer. time.clock() was removed in Python 3.8 and, on Unix, measured
# only the CPU time of the current process, so it missed the worker processes.
start_clock = time.time()

NGEN = 100  # number of generations to run evolution on
pop_size = 10000  # number of individuals in the population, i.e. the number of points in the N-dimensional space you start with
CXPB = 0.5  # probability of cross-over (reproduction) to replace individuals in population by their offspring
MUTPB = 0.2  # probability of mutating an individual
mutation_inside = 0.05  # probability of mutating each element within an individual
num_cores = 6
N = 8  # the number of variables you are trying to optimize over. You can limit the range (and distribution) of each of them by limiting their possible values at initialization and mutation.


def eval_Inidividual(individual):
    # This code runs on your individual and outputs the 'fitness' of the individual.
    # Body omitted in the original post; DEAP expects a tuple back, e.g. (value,).
    raise NotImplementedError


def mutate_inputs(individual, indpb):
    # This is my own mutation function: it takes an individual and changes
    # each element with probability indpb.
    # There are many great built-in mutation functions as well.
    # Body omitted in the original post.
    raise NotImplementedError


def random_initialization():
    # This creates the starting values for the N variables you are optimizing over.
    # As registered below, tools.initRepeat calls it N times per individual,
    # so each call supplies one element. Body omitted in the original post.
    raise NotImplementedError


# weights=(-1.0,) minimizes the fitness despite the name FitnessMax: negative if
# trying to minimize mean, positive if trying to maximize sharpe (or F). It can hold
# several weights if you optimize several outputs at once, e.g. a fitness function
# returning (mean, std) with weights (1.0, -1.0) maximizes mean and minimizes std.
creator.create("FitnessMax", base.Fitness, weights=(-1.0,))
creator.create("Individual", list, fitness=creator.FitnessMax)

toolbox = base.Toolbox()

# Attribute generator
toolbox.register("attr_floatzzz", random_initialization)  # I call it attr_floatzzz to show you can name it whatever you want.

# Structure initializers
# N is the number of variables in your individual, e.g. [.5, .5, .5, .5, .1, 100],
# that get fed to your fitness function eval_Inidividual.
toolbox.register("individual", tools.initRepeat, creator.Individual,
                 toolbox.attr_floatzzz, N)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)

# Operator registering
toolbox.register("evaluate", eval_Inidividual)
toolbox.register("mate", tools.cxTwoPoint)
toolbox.register("mutate", mutate_inputs, indpb=mutation_inside)
toolbox.register("select", tools.selTournament, tournsize=3)


def main():
    # random.seed(64)
    # These are the different individuals in this population;
    # each is a random combination of the N variables.
    pop = toolbox.population(n=pop_size)

    print("Start of evolution")

    # Evaluate the entire population
    fitnesses = list(toolbox.map(toolbox.evaluate, pop))
    for ind, fit in zip(pop, fitnesses):
        ind.fitness.values = fit  # store the fitness (here: the mean to minimize) on each individual
    # print("  Evaluated %i individuals" % len(pop))

    # Begin the evolution
    for g in range(NGEN):
        print("-- Generation %i --" % g)

        # Select the next generation individuals (tournaments favour the fitter ones)
        offspring = toolbox.select(pop, len(pop))
        # Clone the selected individuals so we hold completely independent
        # instances rather than references to the originals
        offspring = list(toolbox.map(toolbox.clone, offspring))

        # Apply crossover and mutation on the offspring:
        # pair even-indexed with odd-indexed individuals and mate them
        for child1, child2 in zip(offspring[::2], offspring[1::2]):
            if random.random() < CXPB:
                toolbox.mate(child1, child2)
                del child1.fitness.values
                del child2.fitness.values

        for mutant in offspring:
            if random.random() < MUTPB:
                toolbox.mutate(mutant)
                del mutant.fitness.values

        # Evaluate the individuals with an invalid fitness
        invalid_ind = [ind for ind in offspring if not ind.fitness.valid]
        fitnesses = toolbox.map(toolbox.evaluate, invalid_ind)
        for ind, fit in zip(invalid_ind, fitnesses):
            ind.fitness.values = fit
        # print("  Evaluated %i individuals" % len(invalid_ind))

        # The population is entirely replaced by the offspring
        pop[:] = offspring

        # Gather all the fitnesses in one list and print the stats
        fits = [ind.fitness.values[0] for ind in pop]
        # length = len(pop)
        # mean = sum(fits) / length
        # sum2 = sum(x*x for x in fits)
        # std = abs(sum2 / length - mean**2)**0.5
        print("  Min %s" % min(fits))
        # print("  Max %s" % max(fits))
        # print("  Avg %s" % mean)
        # print("  Std %s" % std)

    print("-- End of (successful) evolution --")

    best_ind = tools.selBest(pop, 1)[0]
    print("Best individual is %s with mean %s" % (best_ind,
                                                  best_ind.fitness.values[0]))

    done = time.time() - start_clock
    print("time taken: ", done, "seconds")


if __name__ == "__main__":
    # These two lines make the computation multicore: every toolbox.map call then
    # runs on the pool. The pool is created under this guard so that worker
    # processes which re-import this module do not create pools of their own.
    pool = mp.Pool(processes=num_cores)
    toolbox.register("map", pool.map)
    main()
    pool.close()
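P.S.: With time.clock(), the timer did not work on the multicore version, and I had no idea how to make it work :) A wall-clock timer such as time.time() does not have this problem.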
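The bodies of eval_Inidividual, mutate_inputs, and random_initialization are not shown in the post. One possible way to fill them in, for inputs restricted to the grid [0, 0.1, ..., 1.0] from the question, might look like the sketch below. This is a guess at the general shape and not the author's actual code: `GRID` and `my_F` are hypothetical names, and `my_F` is a trivial stand-in for the real objective.

```python
import random

GRID = [i / 10 for i in range(11)]  # assumed allowed values 0.0, 0.1, ..., 1.0


def my_F(values):
    # Hypothetical stand-in for the real objective F.
    return sum(values)


def random_initialization():
    # Called once per variable by tools.initRepeat, so it returns a single value.
    return random.choice(GRID)


def mutate_inputs(individual, indpb):
    # Replace each element by a fresh grid value with probability indpb,
    # which keeps every variable inside its allowed range.
    for i in range(len(individual)):
        if random.random() < indpb:
            individual[i] = random.choice(GRID)
    return individual,  # DEAP's built-in mutation operators return a 1-tuple


def eval_Inidividual(individual):
    # DEAP stores fitness values as a tuple, even for a single objective.
    return (my_F(individual),)
```

Because both initialization and mutation draw only from `GRID`, and two-point crossover merely swaps existing values between individuals, every individual stays on the grid throughout the run.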