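My use case is automatic feature selection for Gaussian process regression. With an isotropic kernel this is straightforward, as the following example shows: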
import numpy as np
from mlxtend.feature_selection import SequentialFeatureSelector
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
X = np.random.rand(100, 10)
y = np.random.rand(100)
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=[1]))
selector = SequentialFeatureSelector(gpr, forward=False)
selector.fit(X, y)
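To use an anisotropic kernel, the kernel definition has to be changed to RBF(length_scale=[1] * num_features).
However, the number of features changes in every round of feature selection, which raises ValueError: Anisotropic kernel must have the same number of dimensions as data (10!=9)
Is there a way to get an anisotropic kernel with a dynamic number of features?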
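The mismatch described above can be reproduced without the feature selector. The following is a minimal sketch using scikit-learn only; the sample sizes and the seeded random generator are arbitrary choices for illustration. A kernel with ten length scales fits ten columns, then fails once a column is dropped:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.random((40, 10))
y = rng.random(40)

# One length scale per feature: this is what makes the kernel anisotropic.
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=[1.0] * 10))
gpr.fit(X, y)  # 10 length scales, 10 columns: this fits

# Dropping a column, as backward selection does, breaks the fixed-size kernel.
error_message = None
try:
    gpr.fit(X[:, :9], y)
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```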
Answer 0 (score: 0)
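As a dirty hack, I subclassed GaussianProcessRegressor and added a function to fit that recursively scans all kernels and, for every kernel that can be anisotropic (currently only RBF and Matern), replaces its length_scale parameter with a vector.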
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import (
    RBF, Matern, Product, Sum, Exponentiation, CompoundKernel)

class GaussianProcessRegressorAnisotropic(GaussianProcessRegressor):
    def fit(self, X, y):
        # Resize the length scales to the number of columns in this fit.
        self._fix_kernel_length_scales(self.kernel, np.shape(X)[1])
        return super().fit(X, y)  # return self, as scikit-learn estimators do

    def _fix_kernel_length_scales(self, kernel, num_features):
        if isinstance(kernel, (RBF, Matern)):
            # Start from the first entry so a vector is not nested on refit.
            scale = float(np.ravel(kernel.length_scale)[0])
            kernel.length_scale = [scale] * num_features
        elif isinstance(kernel, (Product, Sum)):
            self._fix_kernel_length_scales(kernel.k1, num_features)
            self._fix_kernel_length_scales(kernel.k2, num_features)
        elif isinstance(kernel, Exponentiation):
            self._fix_kernel_length_scales(kernel.kernel, num_features)
        elif isinstance(kernel, CompoundKernel):
            for sub_kernel in kernel.kernels:
                self._fix_kernel_length_scales(sub_kernel, num_features)
But maybe someone has a better solution?
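One possible refinement, offered as a sketch rather than a tested replacement: the hack above rewrites the kernel that was passed to the constructor, so the estimator's parameters change as a side effect of fit. The variant below resizes a deep copy instead and restores the original kernel afterwards. The names resize_length_scales and NonMutatingAnisotropicGPR are made up for this example, and CompoundKernel handling is left out for brevity.

```python
import copy

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import (
    RBF, Matern, Product, Sum, Exponentiation, WhiteKernel)


def resize_length_scales(kernel, num_features):
    """Give every RBF/Matern inside `kernel` one length scale per feature."""
    if isinstance(kernel, (RBF, Matern)):
        scale = float(np.ravel(kernel.length_scale)[0])
        kernel.length_scale = [scale] * num_features
    elif isinstance(kernel, (Product, Sum)):
        resize_length_scales(kernel.k1, num_features)
        resize_length_scales(kernel.k2, num_features)
    elif isinstance(kernel, Exponentiation):
        resize_length_scales(kernel.kernel, num_features)
    return kernel


class NonMutatingAnisotropicGPR(GaussianProcessRegressor):
    def fit(self, X, y):
        original = self.kernel
        if original is not None:
            # Fit with a resized copy; the constructor argument stays untouched.
            resized = copy.deepcopy(original)
            self.kernel = resize_length_scales(resized, np.shape(X)[1])
        try:
            return super().fit(X, y)
        finally:
            self.kernel = original


rng = np.random.default_rng(0)
X = rng.random((30, 4))
y = rng.random(30)

gpr = NonMutatingAnisotropicGPR(kernel=RBF(length_scale=1.0) + WhiteKernel())

gpr.fit(X, y)
n_full = len(np.ravel(gpr.kernel_.k1.length_scale))

gpr.fit(X[:, :3], y)  # one feature fewer, same estimator
n_reduced = len(np.ravel(gpr.kernel_.k1.length_scale))

# The kernel passed to the constructor still has its scalar length scale.
still_scalar = np.ndim(gpr.kernel.k1.length_scale) == 0
```

The fitted kernel_ carries one length scale per column of the data it was fitted on, while the kernel parameter keeps the value the user supplied, which should make the estimator safer to clone and refit.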