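I am trying to reproduce a tutorial I saw here.
Everything worked perfectly until I added the .fit method with the training set.
Here is a sample of my code: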
# TRAINING PART
train_dir = 'pdf/learning_set'
dictionary = make_dic(train_dir)
train_labels = np.zeros(20)
train_labels[17:20] = 1
train_matrix = extract_features(train_dir)
model1 = MultinomialNB()
model1.fit(train_matrix, train_labels)
# TESTING PART
test_dir = 'pdf/testing_set'
test_matrix = extract_features(test_dir)
test_labels = np.zeros(8)
test_labels[4:7] = 1
result1 = model1.predict(test_matrix)
print(confusion_matrix(test_labels, result1))
Here is my traceback:
Traceback (most recent call last):
  File "ML.py", line 65, in <module>
    model1.fit(train_matrix, train_labels)
  File "/usr/local/lib/python3.6/site-packages/sklearn/naive_bayes.py", line 579, in fit
    X, y = check_X_y(X, y, 'csr')
  File "/usr/local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 552, in check_X_y
    check_consistent_length(X, y)
  File "/usr/local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 173, in check_consistent_length
    " samples: %r" % [int(l) for l in lengths])
ValueError: Found input variables with inconsistent numbers of samples: [23, 20]
How can I fix this? I am using Python 3.6 on Ubuntu 16.04.
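Answer 0 (score: 1)
ValueError: Found input variables with inconsistent numbers of samples: [23, 20]
This means you have 23 training vectors (train_matrix has 23 rows) but only 20 training labels (train_labels is an array of 20 values).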
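Change train_labels = np.zeros(20)
to train_labels = np.zeros(23)
and it should work. Also check that train_labels[17:20] = 1 still marks the right documents, because each label must line up with the corresponding row of train_matrix.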
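To avoid hard-coding the count at all, you could size the label array from the feature matrix. The snippet below is only a minimal sketch: make_labels is a hypothetical helper, the zero matrix stands in for the output of extract_features(train_dir), and the choice of the last three rows as the positive class is an assumption. Use whichever rows are actually positive in your data.

```python
import numpy as np

def make_labels(n_samples, positive_slice):
    """Hypothetical helper: one 0/1 label per row of the feature matrix."""
    labels = np.zeros(n_samples)
    labels[positive_slice] = 1
    return labels

# Stand-in for extract_features(train_dir): 23 documents, 5 made-up features.
train_matrix = np.zeros((23, 5))

# Take the sample count from the matrix instead of typing 20 or 23 by hand.
# Assumption: the last three documents are the positive class.
train_labels = make_labels(train_matrix.shape[0], slice(20, 23))

# Fail early with a readable message if rows and labels ever disagree.
if train_matrix.shape[0] != train_labels.shape[0]:
    raise ValueError(
        "rows: %d, labels: %d" % (train_matrix.shape[0], train_labels.shape[0])
    )
```

The same applies to the test set: build test_labels from test_matrix.shape[0], so predict and confusion_matrix see matching lengths.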