TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array. With the NaiveBayes classifier

Time: 2017-07-12 07:24:47

Tags: python numpy nlp nltk naivebayes

The Multinomial Naive Bayes classifier gives correct results, but the other two, Gaussian NB and Bernoulli NB, do not. The error is:

  

TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.

But even after adding that call (train_set.toarray()), there is still an error:

  

AttributeError: 'list' object has no attribute 'toarray'
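The two errors refer to different objects. As a minimal sketch (the four-name `train_set` below is a hypothetical stand-in for the question's data, and using `DictVectorizer` directly is an assumption meant to mirror what the NLTK wrapper appears to do internally): `train_set` is a plain Python list of `(feature dict, label)` pairs, so it has no `toarray` method, while the sparse matrix only comes into existence once the feature dicts are vectorized.

```python
from scipy.sparse import issparse
from sklearn.feature_extraction import DictVectorizer

# Hypothetical stand-in for the question's train_set: (feature dict, label) pairs.
train_set = [
    ({'last_letter': 'a'}, 'female'),
    ({'last_letter': 'k'}, 'male'),
    ({'last_letter': 'e'}, 'female'),
    ({'last_letter': 'o'}, 'male'),
]

# A plain list has no toarray method, which explains the AttributeError.
has_toarray = hasattr(train_set, 'toarray')

# Vectorizing the dicts is the step that produces a SciPy sparse matrix;
# DictVectorizer returns sparse output by default.
feature_dicts = [features for features, _label in train_set]
X_sparse = DictVectorizer().fit_transform(feature_dicts)

# toarray() is available here, on the sparse matrix, not on the list.
X_dense = X_sparse.toarray()

print(has_toarray, issparse(X_sparse), X_dense.shape)
```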

The code is:

import pickle
from nltk.corpus import names
import random
import nltk
from sklearn.naive_bayes import MultinomialNB, GaussianNB, BernoulliNB
from sklearn.linear_model import SGDClassifier, LogisticRegression
from sklearn.svm import SVC, LinearSVC, NuSVC
from nltk.classify.scikitlearn import SklearnClassifier
import numpy as np
import scipy as sc

def gender_features(word):
    return {'last_letter': word[-1]}

labeled_names = ([(name, 'male') for name in names.words('male.txt')] + [(name, 'female') for name in names.words('female.txt')])
random.shuffle(labeled_names)

featuresets = [(gender_features(n), gender) for (n, gender) in labeled_names]
train_set, test_set = featuresets[500:], featuresets[:500]
classifier = nltk.NaiveBayesClassifier.train(train_set)

print(nltk.classify.accuracy(classifier, test_set)*100)
classifier.show_most_informative_features(5)

MNB_classifier = SklearnClassifier(MultinomialNB())
MNB_classifier.train(train_set)
print("MNB classifier accuracy: ", nltk.classify.accuracy(MNB_classifier, test_set) * 100)


G_classifier = SklearnClassifier(GaussianNB())
G_classifier.train(train_set)
print("Gaussian classifier accuracy: ", nltk.classify.accuracy(G_classifier, test_set) * 100)

B_classifier = SklearnClassifier(BernoulliNB())
B_classifier.train(train_set)
print("Bernoulli classifier accuracy: ", nltk.classify.accuracy(B_classifier, test_set) * 100)

2 answers:

Answer 0: (score: 0)

I ran into the same problem. Try using this during training:

train_set.todense()

That worked for me.
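Note that `todense()` and `toarray()` are methods of SciPy sparse matrices, so they can only help once the feature dicts have been vectorized; the `train_set` in the question is still a plain list. A minimal sketch of one way to get dense input, assuming scikit-learn is used directly instead of the NLTK wrapper and using a hypothetical four-name `train_set`:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import GaussianNB

# Hypothetical stand-in for the question's train_set.
train_set = [
    ({'last_letter': 'a'}, 'female'),
    ({'last_letter': 'k'}, 'male'),
    ({'last_letter': 'e'}, 'female'),
    ({'last_letter': 'o'}, 'male'),
]
feature_dicts = [features for features, _label in train_set]
labels = [label for _features, label in train_set]

# With the default sparse output, GaussianNB refuses the input.
sparse_X = DictVectorizer().fit_transform(feature_dicts)
try:
    GaussianNB().fit(sparse_X, labels)
    sparse_rejected = False
except TypeError:
    sparse_rejected = True

# Asking the vectorizer for dense output avoids the error.
vectorizer = DictVectorizer(sparse=False)
dense_X = vectorizer.fit_transform(feature_dicts)
model = GaussianNB().fit(dense_X, labels)

# Classify a new name ending in 'a' with the same vectorizer.
prediction = model.predict(vectorizer.transform([{'last_letter': 'a'}]))[0]
print(sparse_rejected, prediction)
```

To keep the NLTK wrapper instead, `SklearnClassifier` accepts a `sparse` argument, so `SklearnClassifier(GaussianNB(), sparse=False)` should have the same effect, at the cost of more memory on large feature sets.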

Answer 1: (score: -1)

Maybe you can do: numpy.array(train_set), to make the list dense.
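It is worth checking what `numpy.array(train_set)` actually yields before relying on it. A small sketch, again with a hypothetical four-name `train_set`: the result is an object array that still holds the feature dicts and labels, not a numeric feature matrix, so it is unlikely to resolve the sparse-matrix error on its own.

```python
import numpy as np

# Hypothetical stand-in for the question's train_set.
train_set = [
    ({'last_letter': 'a'}, 'female'),
    ({'last_letter': 'k'}, 'male'),
    ({'last_letter': 'e'}, 'female'),
    ({'last_letter': 'o'}, 'male'),
]

arr = np.array(train_set)

# Each row is still (feature dict, label); nothing has been vectorized.
first_cell_is_dict = isinstance(arr[0, 0], dict)
print(arr.shape, arr.dtype, first_cell_is_dict)
```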