sklearn Kernel Ridge regression: custom kernel for complex data

Posted: 2020-06-29 11:22:00

Tags: python scikit-learn non-linear-regression

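I'm running into a problem when trying to use a custom kernel with sklearn's KernelRidge. I'm working on sequence data (proteins), which means the samples do not all have the same length. To work around this, I wrote my own kernel that takes two sequences (plus some other features I need) and outputs a real number. Both inputs are dictionaries.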

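import numpy as np
from sklearn.preprocessing import PolynomialFeatures

def mykernel(sample1, sample2):
    # Each sample is a dict; sample['prot'] holds the protein sequence.
    feature_map = PolynomialFeatures(2, interaction_only=True)
    taille1 = len(sample1['prot'])
    taille2 = len(sample2['prot'])  # length of the second sequence, from sample2

    # One row of 46 features per sequence position
    out1 = np.zeros((taille1, 46))
    out2 = np.zeros((taille2, 46))

    for pos in range(taille1):
        ...

    for pos in range(taille2):
        ...

    # np.outer flattens both arrays, so this sum equals out1.sum() * out2.sum()
    outer = np.outer(out1, out2)
    return np.sum(outer) / (taille1 * taille2)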
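Because `np.outer` flattens its inputs, the value returned above collapses all 46 features into one total per sequence. A small check of that, using hypothetical random feature matrices in place of the elided per-position features; the `mean_dot` variant is only one possible alternative, assuming the intent is to compare positions feature by feature (it equals the dot product of the two mean feature vectors).

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical per-position feature matrices for two sequences of different length.
out1 = rng.random((5, 46))
out2 = rng.random((8, 46))

outer_version = np.sum(np.outer(out1, out2)) / (5 * 8)
# np.outer flattens both inputs, so its sum is just the product of the two totals.
product_of_sums = out1.sum() * out2.sum() / (5 * 8)

# Averaging per-position dot products keeps the 46 feature columns aligned.
mean_dot = np.sum(out1 @ out2.T) / (5 * 8)
# ...which is the same as a linear kernel on the mean-pooled feature vectors.
same_as_mean_vectors = out1.mean(axis=0) @ out2.mean(axis=0)

print(np.isclose(outer_version, product_of_sums))  # True
print(np.isclose(mean_dot, same_as_mean_vectors))  # True
```

Both forms accept sequences of any length, since each reduces a (length, 46) matrix to a fixed-size summary before comparing.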

I then defined my dataset as a list of these dicts and reshaped it for use with sklearn:

train_data = [sample1, sample2, sample3, sample4, ...]
# sklearn expects X of shape (n_samples, n_features): one dict per row
train_data = np.array(train_data).reshape(-1, 1)
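The orientation of the reshape decides how many samples sklearn sees, since it reads X as (n_samples, n_features). A quick illustration with three hypothetical dicts:

```python
import numpy as np

samples = [{"prot": "ACDK"}, {"prot": "GGH"}, {"prot": "WYV"}]  # hypothetical dicts
arr = np.array(samples)
print(arr.dtype, arr.shape)       # object (3,)
print(arr.reshape(1, -1).shape)   # (1, 3): one sample with three "features"
print(arr.reshape(-1, 1).shape)   # (3, 1): three samples, one column each
```

Either way the array has `object` dtype, because NumPy cannot store dicts as numbers.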

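Then I simply want to call my Kernel Ridge regressor with my custom kernel. But even though the kernel itself works on my data, the regressor does not seem to accept the fact that my samples are dicts.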

from sklearn.kernel_ridge import KernelRidge

mk = mykernel
clf = KernelRidge(alpha=1.0, kernel=mk)
clf.fit(train_data, Y_train)

I get the following:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-52-eeeaef9312db> in <module>
      1 clf = KernelRidge(alpha=1.0, kernel = mk)
----> 2 clf.fit(train_list, Y_train)
      3 
      4 #print(clf.score(train_dataset, Y_train),clf.score(test_dataset, Y_test))

~\.conda\envs\myenv\lib\site-packages\sklearn\kernel_ridge.py in fit(self, X, y, sample_weight)
    150         # Convert data
    151         X, y = check_X_y(X, y, accept_sparse=("csr", "csc"), multi_output=True,
--> 152                          y_numeric=True)
    153         if sample_weight is not None and not isinstance(sample_weight, float):
    154             sample_weight = check_array(sample_weight, ensure_2d=False)

~\.conda\envs\myenv\lib\site-packages\sklearn\utils\validation.py in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
    753                     ensure_min_features=ensure_min_features,
    754                     warn_on_dtype=warn_on_dtype,
--> 755                     estimator=estimator)
    756     if multi_output:
    757         y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False,

~\.conda\envs\myenv\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    529                     array = array.astype(dtype, casting="unsafe", copy=False)
    530                 else:
--> 531                     array = np.asarray(array, order=order, dtype=dtype)
    532             except ComplexWarning:
    533                 raise ValueError("Complex data not supported\n"

~\.conda\envs\myenv\lib\site-packages\numpy\core\_asarray.py in asarray(a, dtype, order)
     83 
     84     """
---> 85     return array(a, dtype, copy=False, order=order)
     86 
     87 

TypeError: float() argument must be a string or a number, not 'dict'
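The traceback shows that the failure happens inside `check_X_y`, before the kernel is ever called: for an `object`-dtype X, `check_array` tries to convert it to `float64` with `np.asarray`, and `float()` cannot be applied to a dict. The same conversion can be reproduced directly with NumPy, using hypothetical sample dicts:

```python
import numpy as np

samples = [{"prot": "ACDK"}, {"prot": "GGH"}]  # hypothetical stand-ins
X = np.array(samples).reshape(-1, 1)          # object array, shape (2, 1)

error = None
try:
    # check_array performs an equivalent float64 conversion on object arrays
    np.asarray(X, dtype=np.float64)
except TypeError as exc:
    error = exc
    print("TypeError:", exc)
```

The exact wording of the message varies with the Python version, but the exception type matches the traceback above.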

I'm hoping there is a solution to this. Thanks in advance,

Bart

0 answers:

No answers yet
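Since the dicts are rejected by input validation, one way around it is to keep X numeric. A sketch of two common workarounds, assuming a hypothetical `toy_kernel` (a linear kernel on amino-acid composition) in place of the elided feature computation in `mykernel`; the sequences and targets are invented for illustration. Option 1 builds the Gram matrices by hand and uses `kernel="precomputed"`; option 2 passes integer row indices as X and lets the callable kernel look up the dicts.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(sample):
    # Hypothetical featurizer: fraction of each amino acid in sample["prot"].
    seq = sample["prot"]
    counts = np.array([seq.count(a) for a in AMINO_ACIDS], dtype=float)
    return counts / len(seq)

def toy_kernel(sample1, sample2):
    # Hypothetical stand-in for mykernel: linear kernel on composition vectors.
    return float(composition(sample1) @ composition(sample2))

def gram(samples_a, samples_b, kernel):
    # K[i, j] = kernel(samples_a[i], samples_b[j]); shape (len(a), len(b)).
    return np.array([[kernel(a, b) for b in samples_b] for a in samples_a])

train = [{"prot": "ACDK"}, {"prot": "GGHKLM"}, {"prot": "ACDEFG"}, {"prot": "WYV"}]
y_train = np.array([1.0, 2.0, 1.5, 0.5])
test = [{"prot": "ACK"}, {"prot": "LMNPQ"}]

# Option 1: precomputed Gram matrices (test rows against training columns).
clf = KernelRidge(alpha=1.0, kernel="precomputed")
clf.fit(gram(train, train, toy_kernel), y_train)
pred_pre = clf.predict(gram(test, train, toy_kernel))

# Option 2: numeric indices as X; the callable maps each index back to its dict.
all_samples = train + test

def index_kernel(x, y):
    # Called with one row of X and one row of Y (1-D float arrays holding an index).
    return toy_kernel(all_samples[int(x[0])], all_samples[int(y[0])])

X_train_idx = np.arange(len(train)).reshape(-1, 1)
X_test_idx = np.arange(len(train), len(all_samples)).reshape(-1, 1)
clf_idx = KernelRidge(alpha=1.0, kernel=index_kernel)
clf_idx.fit(X_train_idx, y_train)
pred_idx = clf_idx.predict(X_test_idx)

print(pred_pre, pred_idx)
```

Both options compute the same kernel values, so their predictions agree. Option 2 requires every sample, training and test, to be in `all_samples` up front; option 1 avoids that but means building the test-versus-train matrix yourself.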