Training a Bayes classifier

Time: 2019-03-17 21:14:42

Tags: python-3.x naivebayes

I am trying to train and test a Bayes classifier in Python.

These lines of code come from an example I found here, but I don't understand what they do.

train_labels = np.zeros(702)
train_labels[351:701] = 1
train_matrix = extract_features(train_dir)

A similar block of code appears later for the test set:

test_matrix = extract_features(test_dir)
test_labels = np.zeros(260)
test_labels[130:260] = 1

I'd like to know how this works and how to apply it to other classification examples. What do the numbers in [] mean? Many thanks.

1 answer:

Answer 0 (score: 1)

The example code referenced in your post trains a binary classifier with two models, Naive Bayes and SVC.

train_labels = np.zeros(702)
train_labels[351:701] = 1
train_matrix = extract_features(train_dir)

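This creates labels for 702 records, all initially 0, and then sets the later half to 1. The numbers in [] are a slice, start:stop, which selects indices from start up to but not including stop. So [351:701] covers indices 351 through 700, and the last record (index 701) stays 0; if the data really is split evenly, the slice would need to be [351:702]. The labels are binary, for example spam or ham, true or false, etc. For this to work, the rows returned by extract_features must be in the same order as the labels: all class-0 documents first, then all class-1 documents. extract_features builds the word counts, {(docid, wordid) -> wordcount, ...}, which are the input to these models.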
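To see what the slice does, here is a minimal sketch that repeats the labelling lines from the question and counts the result. The `class_counts` list and the `np.repeat` variant at the end are not from the original example; they are one assumed way to build labels for another dataset whose documents are grouped by class.

```python
import numpy as np

# 702 training documents, all labelled 0 to begin with.
train_labels = np.zeros(702)

# A slice start:stop covers indices start .. stop-1,
# so this marks indices 351 through 700 as class 1.
train_labels[351:701] = 1

n_ones = int(train_labels.sum())        # rows labelled 1
n_zeros = len(train_labels) - n_ones    # rows labelled 0
last_label = train_labels[701]          # index 701 lies outside the slice

# Hypothetical alternative for other datasets: build the labels from
# per-class document counts (assumed values), one block per class.
class_counts = [351, 351]
labels_from_counts = np.repeat(np.arange(len(class_counts)), class_counts)
```

The same `np.repeat` call extends to more than two classes, as long as the feature rows are ordered class by class in the same way.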

Once you have trained the model, you need to see how well it performs against a test set. Here you are using 260 records as the test set, with the first half labelled 0 and the later half labelled 1.

test_matrix = extract_features(test_dir)
test_labels = np.zeros(260)
test_labels[130:260] = 1

Finally, you run predictions against the test set and compare them with test_labels to see how accurate each of the two models (NB and SVC) is.
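The whole train-and-evaluate loop can be sketched as follows. This is not the original example's code: `fake_counts` is a hypothetical stand-in for `extract_features` that generates synthetic word counts, and the choice of scikit-learn's `MultinomialNB` and `LinearSVC` is an assumption about which Naive Bayes and SVC variants are meant.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def fake_counts(n_rows, label):
    # Hypothetical stand-in for extract_features: 10 "words" whose
    # expected counts depend on the class (synthetic data only).
    rates = np.array([5.0] * 5 + [1.0] * 5)
    if label == 1:
        rates = rates[::-1]
    return rng.poisson(rates, size=(n_rows, 10))

# Rows are ordered class 0 first, then class 1, matching the labels.
train_matrix = np.vstack([fake_counts(351, 0), fake_counts(351, 1)])
train_labels = np.zeros(702)
train_labels[351:] = 1          # second half, through the last row

test_matrix = np.vstack([fake_counts(130, 0), fake_counts(130, 1)])
test_labels = np.zeros(260)
test_labels[130:260] = 1

# Fit each model on the training set, then score it on the test set.
accuracies = {}
for name, model in {"NB": MultinomialNB(), "SVC": LinearSVC()}.items():
    model.fit(train_matrix, train_labels)
    predicted = model.predict(test_matrix)
    accuracies[name] = accuracy_score(test_labels, predicted)
    print(name, accuracies[name])
```

To apply this to another classification problem, replace `fake_counts` with your own feature extraction and make sure each label lines up with its row in the matrix.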