There are already several threads on this problem, but none of the proposed solutions seem to work for me.
Here is my code:
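from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

count_vect = CountVectorizer()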
X_train_counts = count_vect.fit_transform(s)
tf_transformer = TfidfTransformer(use_idf=False).fit(X_train_counts)
X_train_tf = tf_transformer.transform(X_train_counts)
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
clf = MultinomialNB().fit(X_train_tfidf, target_train)
Running this code gives the following error:
Traceback (most recent call last):
  File "distance_sup1.py", line 447, in <module>
    clf = MultinomialNB().fit(X_train_tfidf, target_train)
  File "C:\Users\Akanksha\Anaconda3\lib\site-packages\sklearn\naive_bayes.py", line 527, in fit
    X, y = check_X_y(X, y, 'csr')
  File "C:\Users\Akanksha\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 520, in check_X_y
    check_consistent_length(X, y)
  File "C:\Users\Akanksha\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 176, in check_consistent_length
    "%s" % str(uniques))
ValueError: Found arrays with inconsistent numbers of samples: [3 21]
My target_train is: [1,1,0]
X_train_tfidf is:
(0, 5)---1.0
(2, 2)---0.670546709445
(2, 17)---0.741867313239
(3, 12)---0.707106781187
(3, 13)---0.707106781187
(4, 9)---0.741867313239
(4, 2)---0.670546709445
(5, 8)---0.707106781187
(5, 0)---0.707106781187
(6, 10)---1.0
(7, 7)---0.457600845725
(7, 3)---0.457600845725
(7, 16)---0.457600845725
(7, 1)---0.457600845725
(7, 5)---0.40299610912
(9, 6)---0.351227138886
(9, 4)---0.398817338825
(9, 9)---0.702454277773
(9, 2)---0.317461354673
(9, 17)---0.351227138886
(10, 15)---0.707106781187
(10, 14)---0.707106781187
(11, 11)---0.750463470011
(11, 6)---0.660911930729
I have tried reshaping it and also transforming X_train_tfidf, but neither seems to work.
Any help is appreciated.
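For reference, here is how I understand the check that fails, as a minimal plain-Python sketch. It is a simplified stand-in, not scikit-learn's actual implementation: check_same_length is a hypothetical helper, and the documents list is placeholder data, assumed only so that the lengths match the [3 21] in the error message.

```python
def check_same_length(*arrays):
    """Raise ValueError unless every argument has the same number of samples."""
    lengths = sorted({len(a) for a in arrays})
    if len(lengths) > 1:
        raise ValueError(
            "Found arrays with inconsistent numbers of samples: %s" % lengths
        )


# Placeholder data (assumed, not my real inputs): 21 documents, 3 labels.
documents = ["doc %d" % i for i in range(21)]
labels = [1, 1, 0]

try:
    check_same_length(documents, labels)
except ValueError as exc:
    message = str(exc)
    print(message)
```

If this reading is right, the 3 would be the length of target_train and the 21 the number of rows in X_train_tfidf, so the two only pass the check when there is exactly one label per row.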