How to train MultinomialNB by fixing this error [ValueError: bad input shape (10, 2)]

Time: 2018-04-16 08:37:20

Tags: python scikit-learn

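Here is my data. I run it through CountVectorizer and then TfidfTransformer, and after that I fit MultinomialNB(), but I get an error. Please let me know the correct syntax.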

train = [('I love this sandwich.','pos'),
         ('This is an amazing place!', 'pos'),
         ('I feel very good about these beers.', 'pos'),
         ('This is my best work.', 'pos'),
         ('What an awesome view', 'pos'),
         ('I do not like this restaurant', 'neg'),
         ('I am tired of this stuff.', 'neg'),
         ("I can't deal with this.", 'neg'),
         ('He is my sworn enemy!.', 'neg'),
         ('My boss is horrible.', 'neg')
        ]

from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer()

text_train_cv = cv.fit_transform(list(zip(*train))[0])
print(text_train_cv.toarray())

from sklearn.feature_extraction.text import TfidfTransformer
tfidf_trans = TfidfTransformer()

text_train_tfidf = tfidf_trans.fit_transform(text_train_cv)
print(text_train_tfidf.toarray())



from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB().fit(text_train_tfidf , train)

This is the error: ValueError: bad input shape (10, 2)

1 Answer:

Answer 0 (score: 0)

In MultinomialNB.fit(), pass list(zip(*train))[1] instead of train:

MultinomialNB().fit(text_train_tfidf , list(zip(*train))[1])

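The fit() method expects y to be a single list of labels (or a 1-d array), with one label per row of X. Passing train hands it ten (text, label) pairs, which it reads as an array of shape (10, 2), hence the error. So you need to unpack train for the labels the same way you did for the texts you passed to CountVectorizer.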
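
For reference, here is a minimal end-to-end sketch of the corrected flow, using the same data and estimators as the question. The names texts, labels, X, new_texts and predictions are my own, and the two sentences in new_texts are made up purely for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

train = [('I love this sandwich.', 'pos'),
         ('This is an amazing place!', 'pos'),
         ('I feel very good about these beers.', 'pos'),
         ('This is my best work.', 'pos'),
         ('What an awesome view', 'pos'),
         ('I do not like this restaurant', 'neg'),
         ('I am tired of this stuff.', 'neg'),
         ("I can't deal with this.", 'neg'),
         ('He is my sworn enemy!.', 'neg'),
         ('My boss is horrible.', 'neg')]

# Split the (text, label) pairs into two parallel sequences.
texts, labels = zip(*train)
labels = list(labels)  # 1-d list of 10 labels, one per document

# Build the features exactly as in the question: counts, then tf-idf.
cv = CountVectorizer()
tfidf_trans = TfidfTransformer()
X = tfidf_trans.fit_transform(cv.fit_transform(texts))

# y is now 1-d, so fit() no longer complains about shape (10, 2).
clf = MultinomialNB().fit(X, labels)

# Hypothetical unseen sentences; reuse the fitted transformers with transform().
new_texts = ['I love this place', 'My boss is my enemy']
X_new = tfidf_trans.transform(cv.transform(new_texts))
predictions = clf.predict(X_new)
print(predictions)
```

Note that new text goes through transform(), not fit_transform(), so it is mapped onto the vocabulary learned from the training sentences.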