Here is my data. I apply CountVectorizer and then MultinomialNB(), but I get an error. Please let me know the correct syntax.
train = [('I love this sandwich.','pos'),
('This is an amazing place!', 'pos'),
('I feel very good about these beers.', 'pos'),
('This is my best work.', 'pos'),
('What an awesome view', 'pos'),
('I do not like this restaurant', 'neg'),
('I am tired of this stuff.', 'neg'),
("I can't deal with this.", 'neg'),
('He is my sworn enemy!.', 'neg'),
('My boss is horrible.', 'neg')
]
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer()
text_train_cv = cv.fit_transform(list(zip(*train))[0])
print(text_train_cv.toarray())
from sklearn.feature_extraction.text import TfidfTransformer
tfidf_trans = TfidfTransformer()
text_train_tfidf = tfidf_trans.fit_transform(text_train_cv)
print(text_train_tfidf.toarray())
from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB().fit(text_train_tfidf , train)
This is the error: ValueError: bad input shape (10, 2)
Answer 0: (score: 0)
In MultinomialNB.fit(), use list(zip(*train))[1] instead of train.
MultinomialNB().fit(text_train_tfidf , list(zip(*train))[1])
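The fit() method expects y to be a single list (or 1-d array) of labels, one per sample. Passing train supplies a list of (text, label) pairs, which is read as a (10, 2) array, hence the error. So you need to transform train the same way you did when passing it to CountVectorizer, taking index 1 (the labels) instead of index 0 (the texts).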
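As a minimal sketch of that transformation, the snippet below splits a shortened copy of the question's train list into texts and labels using plain Python only. The names texts, labels, rows and cols are illustrative and do not appear in the original code.

```python
# Same (text, label) structure as the question's `train`, shortened here.
train = [
    ('I love this sandwich.', 'pos'),
    ('This is an amazing place!', 'pos'),
    ('I do not like this restaurant', 'neg'),
    ('My boss is horrible.', 'neg'),
]

# zip(*train) transposes the list of pairs into two tuples:
# one holding every text, one holding every label.
texts, labels = zip(*train)

# `train` itself has two entries per sample, which is why it is
# rejected as y: it looks like an (n_samples, 2) array.
rows, cols = len(train), len(train[0])

# `labels` is flat: exactly one label per sample.
labels = list(labels)

print(rows, cols)
print(labels)
```

Here texts is what goes to CountVectorizer, and labels is the flat sequence that can be passed as y, equivalent to list(zip(*train))[1] in the line above.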