Using an RBF kernel with a chi-squared distance metric in an SVM

Time: 2017-02-26 13:46:48

Tags: machine-learning scikit-learn classification svm

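How can I achieve the task mentioned in the title? Does the RBF kernel have any parameter that sets the distance metric to the chi-squared distance? I can see a chi2_kernel in the scikit-learn library.

Below is the code I have written.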

import numpy as np
from sklearn import datasets
from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, classification_report, confusion_matrix

from sklearn.preprocessing import Imputer
from numpy import genfromtxt
from sklearn.metrics.pairwise import chi2_kernel


file_csv = 'dermatology.data.csv'
dataset = genfromtxt(file_csv, delimiter=',')

imp = Imputer(missing_values='NaN', strategy='most_frequent', axis=1)
dataset = imp.fit_transform(dataset)

target = dataset[:, [34]].flatten()
data = dataset[:, range(0,34)]

X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.3)

# TODO: use a chi-squared distance metric here instead of the default one. How?
clf = svm.SVC(kernel='rbf', C=1)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

print(f1_score(y_test, y_pred, average="macro"))
print(precision_score(y_test, y_pred, average="macro"))
print(recall_score(y_test, y_pred, average="macro"))

1 answer:

Answer 0 (score: 0)

Are you sure you want to compose RBF and chi2? Chi2 by itself defines a valid kernel, so all you have to do is

clf = svm.SVC(kernel=chi2_kernel, C=1)

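because sklearn accepts functions as kernels (although this requires O(N^2) memory and time). If you want to compose the two, it is a bit more complicated: you would have to implement your own kernel. For more control (and other kernels), you can also try pykernels, although it does not support composing kernels yet.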
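As a rough sketch of what "implementing your own kernel" could look like: scikit-learn documents chi2_kernel as the exponential chi-squared kernel, exp(-gamma * sum((x - y)^2 / (x + y))), which already has the RBF form with the chi-squared distance in the exponent. The example below builds the same thing by hand from additive_chi2_kernel and checks it against the built-in. The function name rbf_chi2_kernel, the gamma value, and the digits data (a stand-in for the dermatology file, chosen because its features are non-negative, as the chi-squared kernels require) are assumptions made for illustration only.

```python
from functools import partial

import numpy as np
from sklearn.datasets import load_digits
from sklearn.metrics.pairwise import additive_chi2_kernel, chi2_kernel
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in data with non-negative features (pixel intensities).
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)


def rbf_chi2_kernel(A, B, gamma=1.0):
    """Hypothetical custom kernel: exp(-gamma * chi-squared distance)."""
    # additive_chi2_kernel returns the NEGATED distance
    # -sum((a - b)^2 / (a + b)), so multiplying by +gamma gives -gamma * d.
    return np.exp(gamma * additive_chi2_kernel(A, B))


gamma = 0.01  # illustrative value, not tuned

# SVC calls the kernel as kernel(X, Y), so gamma is bound with partial.
clf_custom = SVC(kernel=partial(rbf_chi2_kernel, gamma=gamma), C=1)
clf_custom.fit(X_train, y_train)
pred_custom = clf_custom.predict(X_test)

# The built-in exponential chi-squared kernel, used the same way.
clf_builtin = SVC(kernel=partial(chi2_kernel, gamma=gamma), C=1)
clf_builtin.fit(X_train, y_train)
pred_builtin = clf_builtin.predict(X_test)

# Both kernels should produce the same Gram matrix on a small sample.
same_gram = np.allclose(
    rbf_chi2_kernel(X_train[:20], X_train[:20], gamma=gamma),
    chi2_kernel(X_train[:20], X_train[:20], gamma=gamma),
)
print("Gram matrices match:", same_gram)
```

If the two Gram matrices match, passing chi2_kernel directly (with a suitable gamma) may already be the "RBF with chi-squared distance" the question asks for. A hand-written callable is then only needed for other combinations, such as multiplying or summing with the Gaussian RBF kernel.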