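I am new to Python and machine learning. As per my requirement, I am trying to use the Naive Bayes algorithm on my dataset.
I am able to compute the accuracy, but when I try to compute precision and recall, it throws the following error: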
"choose another average setting." % y_type)
ValueError: Target is multiclass but average='binary'. Please choose another average setting.
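Can anyone suggest how to proceed? I tried using average='micro' in the precision and recall scores. It runs without any errors, but it gives the same score for accuracy, precision, and recall.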
review,label
Colors & clarity is superb,positive
Sadly the picture is not nearly as clear or bright as my 40 inch Samsung,negative
review,label
The picture is clear and beautiful,positive
Picture is not clear,negative
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import confusion_matrix
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
def load_data(filename):
    reviews = list()
    labels = list()
    with open(filename) as file:
        file.readline()
        for line in file:
            line = line.strip().split(',')
            labels.append(line[1])
            reviews.append(line[0])
    return reviews, labels
X_train, y_train = load_data('/Users/abc/Sep_10/train_data.csv')
X_test, y_test = load_data('/Users/abc/Sep_10/test_data.csv')
vec = CountVectorizer()
X_train_transformed = vec.fit_transform(X_train)
X_test_transformed = vec.transform(X_test)
clf= MultinomialNB()
clf.fit(X_train_transformed, y_train)
score = clf.score(X_test_transformed, y_test)
print("score of Naive Bayes algo is :" , score)
y_pred = clf.predict(X_test_transformed)
print(confusion_matrix(y_test,y_pred))
print("Precision Score : ",precision_score(y_test,y_pred,pos_label='positive'))
print("Recall Score :" , recall_score(y_test, y_pred, pos_label='positive') )
Answer 0 (score: 5)
You need to add the 'average' parameter. According to the documentation:
average : string, [None, 'binary' (default), 'micro', 'macro', 'samples', 'weighted']
This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data:
Do the following:
print("Precision Score : ", precision_score(y_test, y_pred,
                                            pos_label='positive',
                                            average='micro'))
print("Recall Score : ", recall_score(y_test, y_pred,
                                      pos_label='positive',
                                      average='micro'))
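Replace 'micro' with any one of the above options except 'binary'. Also, in a multiclass setting there is no need to provide 'pos_label', as it will always be ignored.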
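To see how the averaging options differ, here is a minimal sketch on made-up labels. The `y_true` and `y_pred` lists and the third class `'neutral'` are assumptions for illustration, not taken from the question's data. `type_of_target` reports how scikit-learn classifies the labels, which is what triggers the error when more than two distinct values are present.

```python
from sklearn.metrics import precision_score
from sklearn.utils.multiclass import type_of_target

# Hypothetical labels: the third class ('neutral') makes the target multiclass.
y_true = ['positive', 'negative', 'neutral', 'positive', 'negative', 'positive']
y_pred = ['positive', 'negative', 'positive', 'positive', 'neutral', 'negative']

# How scikit-learn sees the target; three or more distinct labels is not 'binary'.
target_type = type_of_target(y_true)

# average=None returns one precision value per class, in sorted label order.
per_class = precision_score(y_true, y_pred, average=None)

# 'macro' is the unweighted mean of the per-class values;
# 'weighted' weights each class by its number of true samples.
macro = precision_score(y_true, y_pred, average='macro')
weighted = precision_score(y_true, y_pred, average='weighted')

print("target type:", target_type)
print("per class  :", per_class)
print("macro      :", macro)
print("weighted   :", weighted)
```

If you expected only 'positive' and 'negative', running `sorted(set(y_test))` on your own labels should show which extra values are making the target multiclass.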
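Update for the comment:
Yes, they can be equal. This is stated in the documentation:
Note that "micro"-averaging in a multiclass setting with all labels included will produce equal precision, recall and F, while "weighted" averaging may produce an F-score that is not between precision and recall.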
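The reason is that in a single-label multiclass problem every wrong prediction counts once as a false positive (for the predicted class) and once as a false negative (for the true class), so the pooled counts behind micro precision and micro recall are identical, and both equal accuracy. A small sketch that checks this, again with made-up labels that are only an assumption for illustration:

```python
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

# Hypothetical three-class labels, not the data from the question.
y_true = ['positive', 'positive', 'positive', 'negative', 'negative', 'neutral']
y_pred = ['positive', 'positive', 'negative', 'negative', 'neutral', 'neutral']

acc = accuracy_score(y_true, y_pred)

# Micro averaging pools true/false positives and negatives over all classes.
micro_p = precision_score(y_true, y_pred, average='micro')
micro_r = recall_score(y_true, y_pred, average='micro')
micro_f = f1_score(y_true, y_pred, average='micro')

# Macro averaging takes the mean of per-class scores, so the two can differ.
macro_p = precision_score(y_true, y_pred, average='macro')
macro_r = recall_score(y_true, y_pred, average='macro')

print("accuracy:", acc)
print("micro   :", micro_p, micro_r, micro_f)
print("macro   :", macro_p, macro_r)
```

If you want precision and recall to carry information beyond accuracy, 'macro', 'weighted', or average=None are probably the more useful choices.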