I'm new to multi-label classification with binary relevance and have some questions about interpreting the results.
The result is: [[0. 0.] [2. 2.]]
Does this mean the first case is classified as [0, 0] and the second case as [2, 2]? That doesn't look right. Or am I missing something else?
After the answers below, I now get the following error, apparently because of the zero in the y_train labels [2, 0, 3, 4]:
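For context, a binary-relevance prediction is normally a 0/1 indicator matrix with one column per label, so a row like [0. 0.] means neither label was assigned, and a value of 2 suggests raw class labels leaked into the matrix. A minimal sketch (with made-up predictions) of decoding a proper indicator matrix:

```python
import numpy as np

# Hypothetical 0/1 indicator predictions: rows = samples, columns = labels
pred = np.array([[1, 0, 1],
                 [0, 1, 0]])

# Recover each sample's label list: the column indices that are set to 1
labels = [np.flatnonzero(row).tolist() for row in pred]
print(labels)  # [[0, 2], [1]]
```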
Traceback (most recent call last):
File "driver.py", line 22, in <module>
clf_dict[i] = clf.fit(x_train, y_tmp)
File "C:\Users\BaderEX\Anaconda22\lib\site-packages\sklearn\linear_model\logistic.py", line 1154, in fit
self.max_iter, self.tol, self.random_state)
File "C:\Users\BaderEX\Anaconda22\lib\site-packages\sklearn\svm\base.py", line 885, in _fit_liblinear
" class: %r" % classes_[0])
ValueError: This solver needs samples of at least 2 classes in the data, but the data contains only one class: 1
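This ValueError appears whenever one of the per-label binary targets is constant. Here label 0 occurs in every training sample, so the y_tmp vector built for class 0 is all ones and the solver sees only one class. A minimal sketch reproducing the error:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

x = np.array([[1, 2, 3, 4], [0, 1, 2, 1], [1, 2, 0, 3]])
y_tmp = [1, 1, 1]  # label 0 is present in every sample -> only one class

try:
    LogisticRegression().fit(x, y_tmp)
except ValueError as e:
    print(e)  # This solver needs samples of at least 2 classes ...
```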
Updated code:
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import *

numer_classes = 5

x_train = np.array([[1,2,3,4],[0,1,2,1],[1,2,0,3]])
y_train = [[0],[1,0,3],[2,0,3,4]]
x_test = np.array([[1,2,3,4],[0,1,2,1],[1,2,0,3]])
y_test = [[0],[1,0,3],[2,0,3,4]]

clf_dict = {}
for i in range(numer_classes):
    y_tmp = []
    for j in range(len(y_train)):
        if i in y_train[j]:
            y_tmp.append(1)
        else:
            y_tmp.append(0)
    clf = LogisticRegression()
    clf_dict[i] = clf.fit(x_train, y_tmp)

prediction_matrix = np.zeros((len(x_test),numer_classes))
for i in range(numer_classes):
    prediction = clf_dict[i].predict(x_test)
    prediction_matrix[:,i] = prediction

print('Predicted')
print(prediction_matrix)
Thanks
Answer 0 (score: 3)
For binary relevance you should build an indicator target for each label: 0 or 1. The scikit-multilearn library provides binary-relevance classifiers compatible with scikit-learn.
Define:
def to_indicator_matrix(y_list):
    # width = largest label value + 1 (number of classes), not the longest list
    y_train_matrix = np.zeros(shape=(len(y_list), max(map(max, y_list)) + 1), dtype='i8')
    for i, y in enumerate(y_list):
        y_train_matrix[i][y] = 1
    return y_train_matrix
Given your y_train and y_test, run:
y_train = to_indicator_matrix(y_train)
y_test = to_indicator_matrix(y_test)
Your y_train is now:
array([[1, 0, 0, 0, 0],
       [1, 1, 0, 1, 0],
       [1, 0, 1, 1, 1]])
This should solve your problem. It is also more comfortable to use scikit-multilearn's BinaryRelevance than your own code. Try it!
Run
pip install scikit-multilearn
and then try
from skmultilearn.problem_transform import BinaryRelevance
from sklearn.linear_model import LogisticRegression
import sklearn.metrics
# assume the data is loaded and available
# in X_train/X_test, y_train/y_test
# initialize the Binary Relevance multi-label classifier
# with a logistic regression base classifier
classifier = BinaryRelevance(classifier=LogisticRegression(C=40, class_weight='balanced'))
# train
classifier.fit(X_train, y_train)
# predict
predictions = classifier.predict(X_test)
# measure
print(sklearn.metrics.hamming_loss(y_test, predictions))
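Hamming loss is the fraction of individual label assignments that are wrong, which makes it a natural metric for binary relevance; a small sketch with made-up indicator matrices:

```python
import numpy as np
from sklearn.metrics import hamming_loss

y_true = np.array([[1, 0, 1],
                   [0, 1, 1]])
y_pred = np.array([[1, 1, 1],
                   [0, 1, 0]])

# 2 of the 6 label slots differ, so the loss is 2/6
print(hamming_loss(y_true, y_pred))  # 0.333...
```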
Answer 1 (score: 2)
I think you made a mistake in your implementation. For binary relevance, we need a separate classifier for each label. There are five labels (0 through 4), so there should be 5 classifiers. Each classifier tells whether an instance belongs to its class; for example, the classifier for class 1 (clf[1]) only tells whether an instance belongs to class 1.
So if you want to implement binary relevance by hand, the labels should be binarized in the loop that creates the classifiers:
for i in range(numer_classes):
    y_tmp = []
    for j in range(len(y_train)):
        if i in y_train[j]:
            y_tmp.append(1)
        else:
            y_tmp.append(0)
    clf = LogisticRegression()
    clf_dict[i] = clf.fit(x_train, y_tmp)
However, things are more convenient if you use sklearn:
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
binarizer = MultiLabelBinarizer()
y_train_binarized = binarizer.fit_transform(y_train)
y_test_binarized = binarizer.transform(y_test)
cls = OneVsRestClassifier(estimator=LogisticRegression())
cls.fit(x_train,y_train_binarized)
y_predict = cls.predict(x_test)
A result like:
[[1 0 1]
 [0 1 1]]
means the first case is predicted as [0, 2] and the second case as [1, 2].
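To read such indicator rows off programmatically, the fitted MultiLabelBinarizer can map them back to label tuples with inverse_transform; a sketch assuming the question's y_train:

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer

binarizer = MultiLabelBinarizer()
binarizer.fit([[0], [1, 0, 3], [2, 0, 3, 4]])  # the question's y_train

# Hypothetical indicator predictions over the 5 learned classes
pred = np.array([[1, 0, 1, 0, 0],
                 [0, 1, 1, 0, 0]])
print(binarizer.inverse_transform(pred))  # [(0, 2), (1, 2)]
```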