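The Multinomial Naive Bayes classifier gives correct results, but the other two, Gaussian NB and Bernoulli NB, do not. The error is:
TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.
But even after adding that call (train_set.toarray()), I still get an error:
AttributeError: 'list' object has no attribute 'toarray'
The code is: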
import pickle
from nltk.corpus import names
import random
import nltk
from sklearn.naive_bayes import MultinomialNB, GaussianNB, BernoulliNB
from sklearn.linear_model import SGDClassifier, LogisticRegression
from sklearn.svm import SVC, LinearSVC, NuSVC
from nltk.classify.scikitlearn import SklearnClassifier
import numpy as np
import scipy as sc
def gender_features(word):
    return {'last_letter': word[-1]}
labeled_names = ([(name, 'male') for name in names.words('male.txt')] + [(name, 'female') for name in names.words('female.txt')])
random.shuffle(labeled_names)
featuresets = [(gender_features(n), gender) for (n, gender) in labeled_names]
train_set, test_set = featuresets[500:], featuresets[:500]
classifier = nltk.NaiveBayesClassifier.train(train_set)
print(nltk.classify.accuracy(classifier, test_set)*100)
classifier.show_most_informative_features(5)
MNB_classifier = SklearnClassifier(MultinomialNB())
MNB_classifier.train(train_set)
print ("MNB classifier accuracy: ", (nltk.classify.accuracy(MNB_classifier, test_set))*100)
G_classifier = SklearnClassifier(GaussianNB())
G_classifier.train(train_set)
print ("Gaussian classifier accuracy: ", (nltk.classify.accuracy(G_classifier, test_set))*100)
B_classifier = SklearnClassifier(BernoulliNB())
B_classifier.train(train_set)
print ("Bernoulli classifier accuracy: ", (nltk.classify.accuracy(B_classifier, test_set))*100)
Answer 0: (score: 0)
I ran into the same problem during training. Using
train_set.todense()
worked for me.
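A minimal sketch of where a dense conversion can actually be applied, assuming the feature dicts are vectorized by hand with scikit-learn's DictVectorizer rather than through SklearnClassifier. The small name list is made up for illustration. toarray() and todense() are methods of the SciPy sparse matrix that the vectorizer returns, not of a plain Python list, which is why train_set.toarray() raises the AttributeError quoted in the question.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import GaussianNB

# Hypothetical toy data in the same (feature dict, label) shape as train_set.
train_set = [
    ({'last_letter': 'a'}, 'female'),
    ({'last_letter': 'a'}, 'female'),
    ({'last_letter': 'a'}, 'female'),
    ({'last_letter': 'k'}, 'male'),
    ({'last_letter': 'n'}, 'male'),
    ({'last_letter': 'r'}, 'male'),
]

# Split the list of pairs into feature dicts and labels.
features, labels = zip(*train_set)

# DictVectorizer one-hot encodes the string features; by default the
# result is a SciPy sparse matrix.
vectorizer = DictVectorizer()
X_sparse = vectorizer.fit_transform(features)

# The dense conversion belongs here, on the matrix, not on the list.
X_dense = X_sparse.toarray()

model = GaussianNB()
model.fit(X_dense, list(labels))

# New samples need the same transform and the same dense conversion.
sample = vectorizer.transform([{'last_letter': 'a'}]).toarray()
prediction = model.predict(sample)[0]
print(prediction)
```

The trade-off is that this bypasses the NLTK wrapper, so nltk.classify.accuracy can no longer be called on the model directly.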
Answer 1: (score: -1)
Maybe you can do: numpy.array(train_set), to make the list dense.
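Another option that may be worth trying, sketched here under the assumption that the installed NLTK version exposes the sparse keyword on SklearnClassifier: passing sparse=False asks the wrapper's internal vectorizer to produce a dense array, so the list itself never needs converting. The name list below is hypothetical toy data standing in for the names corpus.

```python
from nltk.classify.scikitlearn import SklearnClassifier
from sklearn.naive_bayes import GaussianNB


def gender_features(word):
    return {'last_letter': word[-1]}


# Hypothetical toy data; the question uses nltk.corpus.names instead.
labeled_names = [
    ('Anna', 'female'), ('Maria', 'female'),
    ('Julia', 'female'), ('Emma', 'female'),
    ('Mark', 'male'), ('John', 'male'),
    ('Peter', 'male'), ('David', 'male'),
]
train_set = [(gender_features(n), g) for (n, g) in labeled_names]

# sparse=False makes the wrapper hand GaussianNB a dense array.
G_classifier = SklearnClassifier(GaussianNB(), sparse=False)
G_classifier.train(train_set)

guess = G_classifier.classify(gender_features('Sophia'))
print(guess)
```

Dense arrays use more memory than sparse ones, which matters little for a single last-letter feature but can matter for large feature sets.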