Scikit-learn - user-defined weights function for KNeighborsClassifier

Asked: 2013-06-26 18:35:49

Tags: machine-learning scikit-learn nearest-neighbor

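I have a KNeighborsClassifier that classifies data based on 4 attributes. I'd like to weight those 4 attributes manually, but I always run into "operands could not be broadcast together with shapes (1,5) (4)".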

There is very little documentation on this option: weights : [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights. (from here)

This is what I have so far:

    for v in result:
        params = [v['a_one'], v['a_two'], v['a_three'], v['a_four']]
        self.training_data['data'].append(params)
        self.training_data['target'].append(v['answer'])

    def get_weights(array_weights):
        return [1,1,2,1]

    classifier = neighbors.KNeighborsClassifier(weights=get_weights)

2 answers:

Answer 0 (score: 2)

An explanation of sklearn's callable weights

    import numpy as np
    import pandas as pd
    from sklearn.neighbors import KNeighborsClassifier

Create sample data for training the model

    df = pd.DataFrame({'feature1': [1, 3, 3, 4, 5], 'response': [1, 1, 1, 2, 2]})

    y = df.response
    # [1,1,1,2,2]

    X_train = df[['feature1']]
    # [1,3,3,4,5]

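Define the custom function (it prints the data structure it receives)

    def my_distance(weights):
        # Despite the parameter name, sklearn passes in the neighbor distances
        print(weights)
        # Returning them unchanged uses each distance as its own weight
        return weights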

Define the model, passing in my_distance as the callable for weights

    knn = KNeighborsClassifier(n_neighbors=3, weights=my_distance)

    knn.fit(X_train,y)

    knn.predict([[1]])
    # [[ 0.  2.  2.]]
    # array([1])

Explanation: this shows the 3 nearest neighbors (n_neighbors=3) of the query value 1

The three closest neighbors in X_train:

1, 3, 3 

Distances:

[[ 0.  2.  2.]]

1 - 1 = 0 
3 - 1 = 2
3 - 1 = 2

Predicted class:

array([1])
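Applied to the question, this suggests where the broadcast error most likely comes from: with the default n_neighbors=5 and a single query point, the callable receives a distance array of shape (1, 5), so returning a list of 4 values cannot match it. The callable weights neighbors, not attributes. One common way to weight attributes instead is to scale the feature columns before fitting and predicting. The following is only a sketch under stated assumptions: the toy data, the inverse_distance helper, and the feature_weights values are hypothetical and not taken from the question.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: 6 samples with 4 attributes each.
X = np.array([
    [0.0, 0.0, 0.0, 0.0],
    [0.1, 0.2, 0.0, 0.1],
    [0.2, 0.1, 0.1, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.9, 1.1, 1.0, 0.8],
    [1.1, 0.9, 0.9, 1.0],
])
y = np.array([0, 0, 0, 1, 1, 1])

seen_shapes = []

def inverse_distance(distances):
    # Record the shape: one row per query point, one column per neighbor.
    seen_shapes.append(distances.shape)
    # Return an array of the same shape; the small constant avoids
    # division by zero when a query coincides with a training point.
    return 1.0 / (distances + 1e-9)

# Hypothetical per-attribute weights, mirroring the [1, 1, 2, 1] idea.
feature_weights = np.array([1.0, 1.0, 2.0, 1.0])

knn = KNeighborsClassifier(n_neighbors=5, weights=inverse_distance)
# Scale the columns so the third attribute contributes more to the distance.
knn.fit(X * feature_weights, y)

query = np.array([[0.1, 0.1, 0.1, 0.1]])
# The same scaling must be applied to every query.
prediction = knn.predict(query * feature_weights)
```

With the default Euclidean metric, multiplying a column by w makes its squared difference count w**2 times in the distance, so a factor of 2 corresponds to a fourfold weight on that attribute's squared difference.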

Answer 1 (score: 0)

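For a Gaussian kernel (gamma is a hyperparameter here; we need to choose the most suitable value):

    import numpy as np

    gamma = 1.0  # placeholder value, not from the original answer; tune it

    def gaussian_kernel(distance):
        # Weights decay with the squared distance to each neighbor
        weights = np.exp(-gamma * (distance ** 2))
        return weights / np.sum(weights)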
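As a rough illustration of how such a kernel could be plugged into the classifier, the sketch below binds gamma through a closure so that the callable takes only the distance array. The make_gaussian_kernel helper, the gamma value of 0.5, and the query points are assumptions made for this example; the training data reuses the values from the first answer. Unlike the snippet above, it normalizes each query's row separately, which leaves the vote within a row unchanged.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def make_gaussian_kernel(gamma):
    # Hypothetical helper: fixes gamma so the returned callable
    # accepts only the array of neighbor distances.
    def gaussian_kernel(distances):
        weights = np.exp(-gamma * distances ** 2)
        # Normalize per query (per row) so each row of weights sums to 1.
        return weights / weights.sum(axis=1, keepdims=True)
    return gaussian_kernel

# Same one-feature training data as in the first answer.
X_train = np.array([[1.0], [3.0], [3.0], [4.0], [5.0]])
y_train = np.array([1, 1, 1, 2, 2])

knn = KNeighborsClassifier(n_neighbors=3,
                           weights=make_gaussian_kernel(gamma=0.5))
knn.fit(X_train, y_train)

# Two hypothetical query points, one near each class.
predictions = knn.predict(np.array([[1.0], [4.6]]))
```

In practice gamma would be chosen by validation, for example by comparing cross-validated accuracy over a small grid of candidate values.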