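I have a KNeighborsClassifier that classifies data based on 4 attributes. I want to weight these 4 attributes manually, but I keep running into "operands could not be broadcast together with shapes (1,5) (4)".
The documentation describes weights like this: [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.
(from the scikit-learn documentation)
This is what I have so far: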
for v in result:
    params = [v['a_one'], v['a_two'], v['a_three'], v['a_four']]
    self.training_data['data'].append(params)
    self.training_data['target'].append(v['answer'])

def get_weights(array_weights):
    return [1,1,2,1]

classifier = neighbors.KNeighborsClassifier(weights=get_weights)
Answer 0: (score: 2)
An explanation of callable weights in sklearn
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
Create sample data for training the model
df = pd.DataFrame({'feature1':[1,3,3,4,5], 'response':[1,1,1,2,2]})
y = df.response
# [1,1,1,2,2]
X_train = df[['feature1']]
# [1,3,3,4,5]
Define a custom weight function (it prints the array of distances it receives)
def my_distance(weights):
    print(weights)
    return weights
Define the model, passing my_distance as the callable for weights
knn = KNeighborsClassifier(n_neighbors=3, weights=my_distance)
knn.fit(X_train,y)
knn.predict([[1]])
# [[ 0. 2. 2.]]
# array([1])
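Explanation: the callable receives the distances from the query point 1 to its 3 nearest neighbors (n_neighbors=3).
The three nearest neighbors in X_train:
1, 3, 3
Distances: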
[[ 0. 2. 2.]]
1 - 1 = 0
3 - 1 = 2
3 - 1 = 2
Predicted class:
array([1])
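This also suggests where the error in the question comes from: the callable is handed an array with one row per query point and one column per neighbor, so with the default n_neighbors=5 and a single query it has shape (1, 5), and a list of 4 values cannot be broadcast against it. The callable weights neighbors, not attributes. One possible way to weight attributes instead, assuming the default Euclidean distance, is to scale each column by the square root of its weight before fitting. The sketch below uses made-up data, and the names feature_weights and uniform_neighbor_weights are hypothetical:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))             # made-up data: 30 samples, 4 attributes
y = (X[:, 2] > 0).astype(int)            # made-up binary labels

# Hypothetical per-attribute weights, mirroring [1, 1, 2, 1] from the question.
feature_weights = np.array([1.0, 1.0, 2.0, 1.0])
# Square root, so each squared difference ends up multiplied by its weight.
scale = np.sqrt(feature_weights)

seen_shapes = []

def uniform_neighbor_weights(distances):
    # Record the shape: one row per query point, one column per neighbor.
    seen_shapes.append(distances.shape)
    # Return an array of the same shape, as the documentation requires.
    return np.ones_like(distances)

knn = KNeighborsClassifier(weights=uniform_neighbor_weights)  # default n_neighbors=5
knn.fit(X * scale, y)

query = X[:1]
prediction = knn.predict(query * scale)  # queries need the same scaling
print(seen_shapes)
print(prediction)
```

The same scaling has to be applied to every query, otherwise training and query points live in different spaces.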
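Answer 1: (score: 0)
For a Gaussian kernel, gamma is a hyperparameter here; we need to choose the most suitable value.
import numpy as np
gamma = 1.0  # placeholder value (an assumption); tune it for your data
def gaussian_kernel(distance):
    # Weights decay with squared distance, so closer neighbors count more.
    weights = np.exp(-gamma*(distance**2))
    return weights/np.sum(weights)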
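To show how such a kernel might be plugged into the classifier, here is a minimal sketch that binds gamma through a closure instead of a global variable. The helper make_gaussian_kernel is hypothetical, the data is the toy set from the first answer, and the normalization step is left out on the assumption that rescaling all weights by a constant does not change a weighted vote:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def make_gaussian_kernel(gamma):
    """Return a weight function with gamma fixed (hypothetical helper)."""
    def gaussian_kernel(distances):
        # Same shape in, same shape out: one weight per neighbor distance.
        return np.exp(-gamma * distances ** 2)
    return gaussian_kernel

# Toy data from the first answer: one feature, two classes.
X_train = np.array([[1.0], [3.0], [3.0], [4.0], [5.0]])
y_train = np.array([1, 1, 1, 2, 2])

def predict_with_gamma(gamma, query):
    knn = KNeighborsClassifier(n_neighbors=3, weights=make_gaussian_kernel(gamma))
    knn.fit(X_train, y_train)
    return knn.predict(query)

query = [[3.6]]  # nearest neighbors: 4 (class 2), then 3 and 3 (class 1)
pred_small_gamma = predict_with_gamma(0.5, query)
pred_large_gamma = predict_with_gamma(5.0, query)
print(pred_small_gamma, pred_large_gamma)
```

With a small gamma the two class-1 neighbors at distance 0.6 outvote the single class-2 neighbor at distance 0.4; with a large gamma the weights fall off fast enough that the closest neighbor wins. This is why gamma is usually chosen by validation rather than fixed by hand.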