Question

I am trying to train and test a Bayesian classifier in Python.

These lines of code come from an example I found here, but I don't understand what they do.

train_labels = np.zeros(702)
train_labels[351:701] = 1
train_matrix = extract_features(train_dir)

A similar block of code appears later for the test set:

test_matrix = extract_features(test_dir)
test_labels = np.zeros(260)
test_labels[130:260] = 1

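I would like to know how this works and how to apply it to other classification examples. What do the numbers in [] mean? Thank you very much.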

Answer 1

The example code referenced in your post trains two binary classifiers, a Naive Bayes model and an SVC model.

train_labels = np.zeros(702)
train_labels[351:701] = 1
train_matrix = extract_features(train_dir)

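This creates a label array for 702 records, all initially 0, and then sets the second half to 1. The numbers in [] are a slice of the form start:stop, which selects the indices from start up to but not including stop. Strictly, 351:701 covers indices 351 to 700, so the last record (index 701) keeps label 0; 351:702 was probably intended. The labels are binary, for example spam or ham, true or false. Because labels are assigned by position, this presumably relies on the rows of train_matrix being ordered with all class-0 documents first. extract_features builds the feature matrix, a mapping of the form {(docid, wordid) -> wordcount, ...}, which is the input to these models.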
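A quick way to check which entries the slice actually touches, using only NumPy. The names num_ones, last_label, fixed_labels, and num_ones_fixed are introduced here for illustration, and the "fixed" version reflects an assumption about what the example intended.

```python
import numpy as np

train_labels = np.zeros(702)   # 702 labels, all 0 to start
train_labels[351:701] = 1      # indices 351..700 become 1 (stop index is excluded)

num_ones = int(train_labels.sum())   # how many records are labeled 1
last_label = train_labels[701]       # the final record, not covered by the slice

# Assumed intent: label the whole second half, so leave the stop index open.
fixed_labels = np.zeros(702)
fixed_labels[351:] = 1
num_ones_fixed = int(fixed_labels.sum())

print(num_ones, last_label, num_ones_fixed)
```

With an open-ended slice such as [351:], the array length no longer has to be repeated in the index.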

Once the model is trained, you need to see how well it performs on a separate test set. Here the test set has 260 records, with the first half (indices 0 to 129) labeled 0 and the second half (indices 130 to 259) labeled 1.

test_matrix = extract_features(test_dir)
test_labels = np.zeros(260)
test_labels[130:260] = 1
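To reuse the same idea in another classification problem, the labels can be generated from the number of records in each class instead of hand-written slices. This is a minimal sketch; make_labels is a hypothetical helper, not part of the original example, and it assumes the feature rows are grouped by class in the same order as the counts.

```python
import numpy as np

def make_labels(class_counts):
    """Return class_counts[i] copies of label i, in order (hypothetical helper)."""
    return np.repeat(np.arange(len(class_counts)), class_counts)

# Same layout as the test set above: 130 records of class 0, then 130 of class 1.
test_labels = make_labels([130, 130])

# The same call handles more than two classes, e.g. 2, 3, and 1 records.
three_class = make_labels([2, 3, 1])
print(three_class)
```

The result is an integer array rather than the float array produced by np.zeros, which is generally fine for classifier labels.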

Finally, you run each model's predictions on the test set and measure how closely they match test_labels, for both models (NB and SVC).
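The whole train-and-evaluate loop might look roughly like the sketch below. It makes several assumptions: scikit-learn is installed, MultinomialNB and LinearSVC stand in for the "NB" and "SVC" models, and fake_counts is a hypothetical replacement for extract_features that generates synthetic word counts, since the original data is not available here.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def fake_counts(n_per_class, n_words=20):
    """Synthetic stand-in for extract_features (hypothetical): class 0 favors
    the first half of the vocabulary, class 1 favors the second half."""
    rate0 = np.r_[np.full(n_words // 2, 3.0), np.full(n_words - n_words // 2, 0.5)]
    rate1 = rate0[::-1]
    return np.vstack([
        rng.poisson(rate0, size=(n_per_class, n_words)),  # class-0 rows first
        rng.poisson(rate1, size=(n_per_class, n_words)),  # then class-1 rows
    ])

train_matrix = fake_counts(351)
train_labels = np.zeros(702)
train_labels[351:] = 1

test_matrix = fake_counts(130)
test_labels = np.zeros(260)
test_labels[130:] = 1

results = {}
matrices = {}
for name, model in [("NB", MultinomialNB()), ("SVC", LinearSVC())]:
    model.fit(train_matrix, train_labels)       # learn from the labeled training rows
    predicted = model.predict(test_matrix)      # predict labels for unseen rows
    results[name] = accuracy_score(test_labels, predicted)
    matrices[name] = confusion_matrix(test_labels, predicted)
    print(name, results[name])
    print(matrices[name])
```

The confusion matrix shows, per true class, how many records were predicted as each class, which is more informative than accuracy alone when the classes are unbalanced.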
