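How can an anisotropic kernel for Gaussian process regression be used with a variable number of features?

Time: 2019-02-12 10:50:32

Tags: python scikit-learn

My use case is that I want to select features automatically for Gaussian process regression. With an isotropic kernel this is easy, as the following example shows: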

import numpy as np
from mlxtend.feature_selection import SequentialFeatureSelector
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X = np.random.rand(100, 10)
y = np.random.rand(100)

gpr = GaussianProcessRegressor(kernel=RBF(length_scale=[1]))
selector = SequentialFeatureSelector(gpr, forward=False)
selector.fit(X, y)

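To use an anisotropic kernel, the kernel definition has to be changed to RBF(length_scale=[1] * num_features), that is, one length scale per feature.

However, the number of features changes in every round of feature selection, which raises ValueError: Anisotropic kernel must have the same number of dimensions as data (10!=9)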
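The mismatch can be reproduced without any feature selector. The snippet below is a minimal sketch, assuming a scikit-learn version in which RBF exposes the `anisotropic` property; the array sizes are arbitrary illustration values, not taken from the question.

```python
import numpy as np
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X_full = rng.random((5, 10))   # 10 features, as in the question
X_reduced = X_full[:, :9]      # one column dropped, as in a selection round

# One length scale per original feature makes the kernel anisotropic.
kernel = RBF(length_scale=[1.0] * 10)
print(kernel.anisotropic)      # True
print(kernel(X_full).shape)    # (5, 5)

# Evaluating the same kernel on 9 columns fails the dimension check.
try:
    kernel(X_reduced)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

The length of length_scale is fixed when the kernel is constructed, so a kernel built for 10 features cannot be reused after the selector drops a column.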

Is there a way to get an anisotropic kernel with a dynamic number of features?

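1 Answer:

Answer 0: (score: 0)

As a dirty hack, I subclassed GaussianProcessRegressor and added a step to fit that recursively scans all kernels and, for every kernel that can be anisotropic (currently only RBF and Matern), replaces its length_scale parameter with a vector.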

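import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import (
    RBF, Matern, Product, Sum, Exponentiation, CompoundKernel)


class GaussianProcessRegressorAnisotropic(GaussianProcessRegressor):
    def fit(self, X, y):
        # Resize every length_scale to the current number of columns.
        self._fix_kernel_length_scales(self.kernel, np.shape(X)[1])
        # Return the fitted estimator, as scikit-learn's fit() does.
        return super().fit(X, y)

    def _fix_kernel_length_scales(self, kernel, num_features):
        if isinstance(kernel, (RBF, Matern)):
            # Broadcast the first length scale so that calling fit() again
            # on the same instance does not produce nested lists.
            scale = np.ravel(kernel.length_scale)[0]
            kernel.length_scale = [scale] * num_features
        elif isinstance(kernel, (Product, Sum)):
            # Binary operators: recurse into both operands.
            self._fix_kernel_length_scales(kernel.k1, num_features)
            self._fix_kernel_length_scales(kernel.k2, num_features)
        elif isinstance(kernel, Exponentiation):
            self._fix_kernel_length_scales(kernel.kernel, num_features)
        elif isinstance(kernel, CompoundKernel):
            for sub_kernel in kernel.kernels:
                self._fix_kernel_length_scales(sub_kernel, num_features)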

But maybe someone has a better solution?
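One possible alternative, offered as a sketch rather than a tested recipe, is to avoid mutating a kernel at all and to build a fresh anisotropic kernel inside fit, once the number of columns is known. The class name PerFitAnisotropicGPR and its parameters initial_length_scale and alpha are hypothetical names invented for this illustration. It handles only a single RBF kernel and relies on the documented scikit-learn estimator conventions (BaseEstimator, RegressorMixin).

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF


class PerFitAnisotropicGPR(RegressorMixin, BaseEstimator):
    """Hypothetical wrapper: builds a new anisotropic RBF kernel on every fit."""

    def __init__(self, initial_length_scale=1.0, alpha=1e-10):
        self.initial_length_scale = initial_length_scale
        self.alpha = alpha

    def fit(self, X, y):
        X = np.asarray(X)
        # One length scale per column of the data seen in this call.
        kernel = RBF(length_scale=np.full(X.shape[1], self.initial_length_scale))
        self.gpr_ = GaussianProcessRegressor(kernel=kernel, alpha=self.alpha)
        self.gpr_.fit(X, y)
        return self

    def predict(self, X):
        return self.gpr_.predict(np.asarray(X))


rng = np.random.default_rng(0)
X = rng.random((30, 4))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(30)

model = PerFitAnisotropicGPR(alpha=1e-2)
for n_cols in (4, 3, 2):
    # Refit on fewer columns, as a backward selector would.
    model.fit(X[:, :n_cols], y)
    print(n_cols, model.gpr_.kernel_.length_scale.shape)
```

Because the constructor stores only plain values, clone() can copy the estimator, so in principle it could be passed to SequentialFeatureSelector in place of gpr in the question's example. That combination has not been verified here.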