Question

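I am illustrating hyperopt's TPE algorithm for my main project, but I can't seem to get the algorithm to converge. From what I understand of the original paper and a YouTube lecture, the TPE algorithm works in the following steps: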

(In the following, x = hyperparameters and y = loss.)

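1. Start by creating a search history of [x, y] pairs, say 10 points.
2. Sort the hyperparameters by their loss and split them into two groups using some quantile γ (γ = 0.5 means the two groups have equal size).
3. Fit a kernel density estimate to both the bad hyperparameter group (g(x)) and the good hyperparameter group (l(x)).
4. A good candidate has low probability under g(x) and high probability under l(x), so we propose evaluating the function at argmin(g(x)/l(x)).
5. Evaluate the (x, y) pair at the proposed point and repeat steps 2-5.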
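
For reference, this is how I read the justification for step 4 in the paper. The symbols follow the steps above, and y* denotes the loss threshold that separates the two groups, so that γ = p(y < y*). The paper models p(x | y) as l(x) when y < y* and as g(x) otherwise, which gives an expected improvement of the form:

```latex
\mathrm{EI}_{y^*}(x)
  = \int_{-\infty}^{y^*} (y^* - y)\, p(y \mid x)\, dy
  \;\propto\; \left( \gamma + \frac{g(x)}{l(x)}\,(1 - \gamma) \right)^{-1}
```

Since the right-hand side decreases as g(x)/l(x) grows, maximizing the expected improvement is the same as minimizing g(x)/l(x), which is the argmin in step 4.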

I have implemented this in Python on the objective function f(x) = x^2, but the algorithm fails to converge to the minimum.

import numpy as np
import scipy as sp
from matplotlib import pyplot as plt
from scipy.stats import gaussian_kde


def objective_func(x):
    return x**2

def measure(x):
    noise = np.random.randn(len(x))*0
    return x**2+noise

def split_meassures(x_obs,y_obs,gamma=1/2):
    #split x and y observations into two sets and return a separation threshold (y_star)
    size = int(len(x_obs)//(1/gamma))
    l = {'x':x_obs[:size],'y':y_obs[:size]}
    g = {'x':x_obs[size:],'y':y_obs[size:]}
    y_star = (l['y'][-1]+g['y'][0])/2
    return l,g,y_star

#sample objective function values for illustration
x_obj = np.linspace(-5,5,10000)
y_obj = objective_func(x_obj)

#start by sampling a parameter search history
x_obs = np.linspace(-5,5,10)
y_obs = measure(x_obs)

nr_iterations = 100
for i in range(nr_iterations):

    #sort observations according to loss
    sort_idx = y_obs.argsort()
    x_obs,y_obs = x_obs[sort_idx],y_obs[sort_idx]

    #split sorted observations in two groups (l and g)
    l,g,y_star = split_meassures(x_obs,y_obs)

    #approximate distributions for both groups using kernel density estimation
    kde_l = gaussian_kde(l['x']).evaluate(x_obj)
    kde_g = gaussian_kde(g['x']).evaluate(x_obj)

    #define our evaluation measure for sampling a new point
    eval_measure = kde_g/kde_l

    if i%10==0:
        plt.figure()
        plt.subplot(2,2,1)
        plt.plot(x_obj,y_obj,label='Objective')
        plt.plot(x_obs,y_obs,'*',label='Observations')
        plt.plot([-5,5],[y_star,y_star],'k')
        plt.subplot(2,2,2)
        plt.plot(x_obj,kde_l)
        plt.subplot(2,2,3)
        plt.plot(x_obj,kde_g)
        plt.subplot(2,2,4)
        plt.semilogy(x_obj,eval_measure)
        plt.draw()

    #find point to evaluate and add the new observation
    best_search = x_obj[np.argmin(eval_measure)]
    x_obs = np.append(x_obs,[best_search])
    y_obs = np.append(y_obs,[measure(np.asarray([best_search]))])

plt.show()

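I suspect this happens because we keep sampling where we are most certain, which makes l(x) narrower and narrower around that point while the place we sample never changes. So what am I missing in my understanding?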

Answer 1

I am still learning TPE myself, but here are two problems with this code:

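1. This code only ever evaluates a handful of unique points. The next point is always the single best location recommended by the kernel density functions, so the code has no way of exploring the search space, which is what an acquisition function would do, for example.
2. The code simply appends every new observation to the x and y arrays, which adds a large number of duplicates. The duplicates skew the set of observations and lead to very strange splits, which you can easily see in the later plots. eval_measure starts out resembling the objective function but diverges later on.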

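If you remove the duplicates in x_obs and y_obs, problem no. 2 goes away. The first problem, however, can only be fixed by adding some way of exploring the search space.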
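
A minimal sketch of both fixes, under a few assumptions of mine: duplicates are dropped with np.unique before the split, and exploration comes from drawing random candidates from l(x) and keeping the one with the highest l(x)/g(x), instead of taking the argmin over a fixed grid. As far as I understand the paper, drawing candidates from l(x) is how TPE proposes points, but the helper names (propose, run_tpe), the n_candidates value, and the [-5, 5] bounds are made up for this example and are not hyperopt's actual implementation.

```python
import numpy as np
from scipy.stats import gaussian_kde


def objective(x):
    return x ** 2


def propose(x_obs, y_obs, rng, gamma=0.5, n_candidates=50, low=-5.0, high=5.0):
    # Fix for problem 2: keep a single observation per distinct x value.
    x_unique, first_idx = np.unique(x_obs, return_index=True)
    y_unique = y_obs[first_idx]

    # Sort by loss and split; keep at least two points in each group so
    # that both kernel density estimates are well defined.
    x_sorted = x_unique[np.argsort(y_unique)]
    n = len(x_sorted)
    n_good = int(np.clip(np.ceil(gamma * n), 2, n - 2))
    kde_l = gaussian_kde(x_sorted[:n_good])
    kde_g = gaussian_kde(x_sorted[n_good:])

    # Fix for problem 1 (assumption: sample-based proposals): draw random
    # candidates from l(x) and keep the one with the largest l(x)/g(x).
    seed = int(rng.integers(2**31 - 1))
    candidates = np.clip(kde_l.resample(n_candidates, seed=seed)[0], low, high)
    ratio = kde_l(candidates) / np.maximum(kde_g(candidates), 1e-300)
    return float(candidates[np.argmax(ratio)])


def run_tpe(n_iter=30, seed=0):
    rng = np.random.default_rng(seed)
    x_obs = np.linspace(-5, 5, 10)   # same initial history as the question
    y_obs = objective(x_obs)
    for _ in range(n_iter):
        x_new = propose(x_obs, y_obs, rng)
        x_obs = np.append(x_obs, x_new)
        y_obs = np.append(y_obs, objective(x_new))
    return x_obs, y_obs
```

Calling run_tpe() returns the full history, so you can plot x_obs against y_obs as in the question. Because every proposal is a fresh random draw from l(x), the history no longer fills up with repeats of the same grid point.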
