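I am implementing Naive Bayes with sklearn on imbalanced data.
My data has more than 16,000 records and 6 output classes.
I tried to fit the model with a sample_weight computed using sklearn.utils.class_weight.
The computed weights look like this:
sample_weight = [11.77540107 1.82284768 0.64688602 2.47138047 0.38577435 1.21389195]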
import numpy as np
data_set = np.loadtxt("./data/_vector21.csv", delimiter=",")
inp_vec = data_set[:, 1:22]
out_vec = data_set[:, 22:]
#
# # Split dataset into training set and test set
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
X_train, X_test, y_train, y_test = train_test_split(inp_vec, out_vec, test_size=0.2) # 80% training and 20% test
#
# class weight
from keras.utils import to_categorical
output_vec_categorical = to_categorical(y_train)
from sklearn.utils import class_weight
y_ints = [y.argmax() for y in output_vec_categorical]
c_w = class_weight.compute_class_weight(class_weight='balanced', classes=np.unique(y_ints), y=y_ints)
cw = {}
for i in set(y_ints):
    cw[i] = c_w[i]
# Create a Gaussian Classifier
from sklearn.naive_bayes import *
model = GaussianNB()
# Train the model using the training sets
print(c_w)
model.fit(X_train, y_train, c_w)
# Predict the response for test dataset
y_pred = model.predict(X_test)
# Import scikit-learn metrics module for accuracy calculation
from sklearn import metrics
# Model Accuracy, how often is the classifier correct?
print("\nClassification Report: \n", (metrics.classification_report(y_test, y_pred)))
print("\nAccuracy: %.3f%%" % (metrics.accuracy_score(y_test, y_pred)*100))
I get this error:
ValueError: Found input variables with inconsistent numbers of samples: [13212, 6]
Can anyone tell me what I did wrong and how to fix it?
Thanks a lot.
Answer 0: (score: 3)
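sample_weight and class_weight are two different things.
As their names suggest:
sample_weight applies to individual samples (rows of data), so its length must match the number of samples in X.
class_weight makes the classifier give more importance to particular classes, so its length must match the number of classes in the target.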
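To make the length difference concrete, here is a minimal sketch using scikit-learn's two helper functions. The label array `y` is a made-up toy example, not the asker's data.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight, compute_sample_weight

# Hypothetical imbalanced labels: 6 rows of class 0, 3 of class 1, 1 of class 2
y = np.array([0] * 6 + [1] * 3 + [2] * 1)

classes = np.unique(y)
class_w = compute_class_weight(class_weight="balanced", classes=classes, y=y)
sample_w = compute_sample_weight(class_weight="balanced", y=y)

print(class_w.shape)   # one weight per class
print(sample_w.shape)  # one weight per row of y
```

In this toy case each row's sample weight is simply the weight of the class that row belongs to.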
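You are computing class_weight, not sample_weight, with sklearn.utils.class_weight, and then passing the result as sample_weight. That is why you get the dimension mismatch error: the weight array has 6 entries (one per class), while X_train has 13212 rows.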
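One possible fix, sketched under assumptions: expand the per-class weights into one weight per training row before calling fit. The data below is synthetic (random features and labels standing in for the asker's CSV), and the names `weight_of` and `sample_weight` are illustrative only.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.utils.class_weight import compute_class_weight

rng = np.random.default_rng(0)
# Synthetic stand-in for the real data: 200 rows, 21 features, 3 imbalanced classes
X_train = rng.normal(size=(200, 21))
y_train = rng.choice([0, 1, 2], size=200, p=[0.7, 0.2, 0.1])

classes = np.unique(y_train)
c_w = compute_class_weight(class_weight="balanced", classes=classes, y=y_train)

# Expand the per-class weights to one weight per training row
weight_of = dict(zip(classes, c_w))
sample_weight = np.array([weight_of[label] for label in y_train])

model = GaussianNB()
# sample_weight now has the same length as X_train, so fit accepts it
model.fit(X_train, y_train, sample_weight=sample_weight)
```

Note that y_train here is a 1-D array of integer labels. If the labels are one-hot encoded, they would need to be converted back to class indices first, as the question does with argmax.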
See the following questions for more detail on how these two weights interact internally:
Answer 1: (score: 0)
This is how I was able to compute the weights to handle the class imbalance.
from sklearn import naive_bayes
from sklearn.utils import class_weight
# One weight per training row, derived from the class frequencies in y_train
sample = class_weight.compute_sample_weight('balanced', y_train)
# Naive Bayes classifier
naive = naive_bayes.MultinomialNB()
naive.fit(X_train, y_train, sample_weight=sample)
predictions_NB = naive.predict(X_test)