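I'm having trouble using a custom kernel with sklearn's KernelRidge. I'm working with sequence data (proteins), which means the samples don't all have the same length. To get around this, I wrote my own kernel that takes two sequences (I also need some other features) and outputs a real number. Both inputs are dictionaries.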
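import numpy as np
from sklearn.preprocessing import PolynomialFeatures

def mykernel(sample1, sample2):
    feature_map = PolynomialFeatures(2, interaction_only=True)
    taille1 = len(sample1['prot'])
    taille2 = len(sample2['prot'])  # fixed: the second length must come from sample2
    out1 = np.zeros((taille1, 46))
    out2 = np.zeros((taille2, 46))
    for pos in range(taille1):
        ...
    for pos in range(taille2):
        ...
    # note: np.outer flattens both matrices, so this sum equals out1.sum() * out2.sum()
    outer = np.outer(out1, out2)
    return np.sum(outer) / (taille1 * taille2)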
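To make the structure of such a kernel concrete, here is a minimal self-contained sketch. The one-hot encoding over the 20 standard amino acids (`AMINO_ACIDS`, `encode`, `toy_kernel` are all hypothetical names) stands in for the 46 per-position features computed in the elided loops. It also swaps in a different combination step: instead of `np.outer`, which flattens both matrices, it uses `out1 @ out2.T`, the matrix of dot products between every pair of positions, and averages it. This is not necessarily what `mykernel` intends; it is just one valid kernel for variable-length sequences, since the result equals the dot product of the two mean residue vectors.

```python
import numpy as np

# Hypothetical per-residue encoding: one-hot over the 20 standard amino acids
# (a stand-in for the 46 features computed in the original elided loops).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def encode(seq):
    """Return a (len(seq), 20) one-hot matrix for a protein sequence."""
    out = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        out[pos, INDEX[aa]] = 1.0
    return out

def toy_kernel(sample1, sample2):
    """Mean dot product over all pairs of positions of two sequences."""
    out1 = encode(sample1['prot'])
    out2 = encode(sample2['prot'])
    # (len1, 20) @ (20, len2) -> (len1, len2) matrix of pairwise dot products
    return float(np.sum(out1 @ out2.T)) / (len(out1) * len(out2))

print(toy_kernel({'prot': 'AC'}, {'prot': 'A'}))  # 0.5
```

Here only the pair (A, A) matches out of 2 × 1 position pairs, hence 0.5.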
Then I define my dataset as a list of these dicts and reshape it so it can be used with sklearn:
train_data = [sample1, sample2, sample3, sample4, ...]
train_data = np.array(train_data).reshape(1,-1)
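sklearn interprets `X` as an array of shape `(n_samples, n_features)`. A quick check with hypothetical toy dicts shows what the reshape produces: `reshape(1, -1)` yields a single sample whose n features are the dicts, whereas one dict per sample would be `reshape(-1, 1)`. In both cases the array has dtype `object`, since numpy stores the dicts as opaque Python objects.

```python
import numpy as np

# Hypothetical toy samples standing in for the real protein dicts
train_data = [{'prot': 'ACD'}, {'prot': 'AC'}, {'prot': 'WYAC'}]

as_row = np.array(train_data).reshape(1, -1)     # 1 sample, 3 "features"
as_column = np.array(train_data).reshape(-1, 1)  # 3 samples, 1 feature each

print(as_row.shape, as_row.dtype)        # (1, 3) object
print(as_column.shape, as_column.dtype)  # (3, 1) object
```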
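Then I simply want to fit my Kernel Ridge regressor with the custom kernel, but even though the kernel itself works on my data, KernelRidge doesn't seem to accept data made of dicts: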
from sklearn.kernel_ridge import KernelRidge

mk = mykernel
clf = KernelRidge(alpha=1.0, kernel=mk)
clf.fit(train_list, Y_train)
I get the following:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-52-eeeaef9312db> in <module>
1 clf = KernelRidge(alpha=1.0, kernel = mk)
----> 2 clf.fit(train_list, Y_train)
3
4 #print(clf.score(train_dataset, Y_train),clf.score(test_dataset, Y_test))
~\.conda\envs\myenv\lib\site-packages\sklearn\kernel_ridge.py in fit(self, X, y, sample_weight)
150 # Convert data
151 X, y = check_X_y(X, y, accept_sparse=("csr", "csc"), multi_output=True,
--> 152 y_numeric=True)
153 if sample_weight is not None and not isinstance(sample_weight, float):
154 sample_weight = check_array(sample_weight, ensure_2d=False)
~\.conda\envs\myenv\lib\site-packages\sklearn\utils\validation.py in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
753 ensure_min_features=ensure_min_features,
754 warn_on_dtype=warn_on_dtype,
--> 755 estimator=estimator)
756 if multi_output:
757 y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False,
~\.conda\envs\myenv\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
529 array = array.astype(dtype, casting="unsafe", copy=False)
530 else:
--> 531 array = np.asarray(array, order=order, dtype=dtype)
532 except ComplexWarning:
533 raise ValueError("Complex data not supported\n"
~\.conda\envs\myenv\lib\site-packages\numpy\core\_asarray.py in asarray(a, dtype, order)
83
84 """
---> 85 return array(a, dtype, copy=False, order=order)
86
87
TypeError: float() argument must be a string or a number, not 'dict'
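The traceback shows the error is raised inside `check_X_y`, before the kernel is ever called: `check_array` calls `np.asarray` on `X` with a numeric dtype, and numpy cannot turn a dict into a float. One common workaround, sketched below under the assumption that the kernel is symmetric and returns a float, is to evaluate the kernel yourself and fit `KernelRidge(kernel="precomputed")` on the resulting Gram matrix. `seq_kernel`, `gram_matrix`, and the toy data are hypothetical stand-ins for `mykernel` and the real protein dicts.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

def seq_kernel(s1, s2):
    # Hypothetical stand-in for mykernel: fraction of matching residue pairs
    a, b = s1['prot'], s2['prot']
    return sum(x == y for x in a for y in b) / (len(a) * len(b))

def gram_matrix(rows, cols, kernel):
    """K[i, j] = kernel(rows[i], cols[j]) for lists of arbitrary objects."""
    K = np.empty((len(rows), len(cols)))
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            K[i, j] = kernel(r, c)
    return K

# Hypothetical toy data
train_data = [{'prot': 'ACD'}, {'prot': 'AC'}, {'prot': 'WYAC'}, {'prot': 'DD'}]
Y_train = np.array([1.0, 0.5, 2.0, 0.0])
test_data = [{'prot': 'AD'}, {'prot': 'WWY'}]

K_train = gram_matrix(train_data, train_data, seq_kernel)  # (n_train, n_train)
K_test = gram_matrix(test_data, train_data, seq_kernel)    # (n_test, n_train)

clf = KernelRidge(alpha=1.0, kernel="precomputed")
clf.fit(K_train, Y_train)
print(clf.predict(K_test))
```

At prediction time the matrix has shape `(n_test, n_train)`: every test sample evaluated against every training sample. The double loop costs one Python-level kernel call per pair, so it may be slow for large datasets.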
I hope there's a solution to this. Thanks in advance,
Bart