Interpreting Train, Test (Dev) and Validation Scores in Machine Learning

Time: 2019-06-01 13:01:23

Tags: python validation machine-learning scikit-learn data-science

I have trained a machine learning model with sklearn and looked at its scores on the training, test (dev), and validation sets.

Here are the scores:

Metric      Train      Test (dev)   Validation
Accuracy    94.5468%   74.4646%     65.6548%
Precision   96.7002%   85.2289%     79.7178%
F1-Score    96.9761%   85.6203%     79.6747%
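For context, scores like these can be produced with a three-way split and scikit-learn's metric functions. The sketch below is an illustration under stated assumptions: the asker's data and model are unknown, so it uses synthetic data from make_classification and a LogisticRegression, and the 60/20/20 split sizes are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score
from sklearn.model_selection import train_test_split

# Assumption: synthetic binary data stands in for the asker's (unknown) dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Carve off 20% as the untouched "Validation" set, then split the remaining
# 80% into 60% "Train" and 20% "Test (dev)" of the original data.
X_rest, X_val, y_rest, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_dev, y_train, y_dev = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

# Assumption: a plain logistic regression; only the training split is used to fit it.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)


def report(name, X_part, y_part):
    """Score the fitted model on one split with the three metrics from the question."""
    pred = model.predict(X_part)
    return {
        "split": name,
        "accuracy": accuracy_score(y_part, pred),
        "precision": precision_score(y_part, pred),
        "f1": f1_score(y_part, pred),
    }


scores = [
    report("Train", X_train, y_train),
    report("Test (dev)", X_dev, y_dev),
    report("Validation", X_val, y_val),
]
for row in scores:
    print(f"{row['split']:>10}: acc={row['accuracy']:.4f} "
          f"prec={row['precision']:.4f} f1={row['f1']:.4f}")
```

The exact numbers depend on the data; the point is only that each split is scored by the same fitted model.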

I'm having some trouble interpreting these scores. Is it normal for the model to perform much worse on the validation set?

Do you have any thoughts on these results?

1 answer:

Answer 0 (score: 0)

As you explained in the comments, your test set is the set you used to tune your parameters, and your validation set is the set your model never used for training or tuning.
Given that, it's natural for your validation scores to be lower than the others.
When you train a machine learning model, you show it the training set; that's why it gets its best scores on the training set, i.e. on samples it has already seen and knows the answers for.
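A minimal sketch of why the training score is the most optimistic one, assuming synthetic data with label noise (make_classification's flip_y assigns a random class to about 20% of samples) and an unrestricted DecisionTreeClassifier. Neither is the asker's setup; they are chosen because a fully grown tree can memorize its training labels, noise included.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic data where ~20% of samples get a random class label,
# so no model can score perfectly on unseen data.
X, y = make_classification(n_samples=1000, n_features=20, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# With default settings (max_depth=None) the tree keeps splitting until every
# training leaf is pure, i.e. it effectively memorizes the training set.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = accuracy_score(y_train, tree.predict(X_train))  # samples it has seen
test_acc = accuracy_score(y_test, tree.predict(X_test))     # samples it has not seen
print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
```

With continuous features there are no duplicate training points, so the training accuracy should come out at (or extremely close to) 1.0, while the accuracy on unseen samples is noticeably lower.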
You use the validation set to tune your hyperparameters (e.g. the degree, or complexity, of a polynomial regression), so those hyperparameters end up fine-tuned for the validation set even though the model has not been trained on it. (You used the term "test set" for this, and to be fair, the terms are sometimes used that way.)
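The tuning step described above can be sketched as follows, assuming a noisy sine curve as toy data and scikit-learn's PolynomialFeatures plus LinearRegression as the model. The model weights are fit on the training split only, the polynomial degree (the hyperparameter) is chosen by its error on the dev split (the "validation set" in this answer's wording), and the held-out split is scored once at the end.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Assumption: toy 1-D regression data, y = sin(x) + Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=300)

# 20% held out for the final check, then the rest split into train/dev.
X_rest, X_hold, y_rest, y_hold = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_dev, y_train, y_dev = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

dev_mse = {}
for degree in range(1, 11):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)  # weights come from the training split only
    dev_mse[degree] = mean_squared_error(y_dev, model.predict(X_dev))  # degree judged on dev

# The hyperparameter is picked to minimize dev error, so the dev score is tuned for.
best_degree = min(dev_mse, key=dev_mse.get)

# The held-out split played no part in choosing the degree; score it once.
final = make_pipeline(PolynomialFeatures(best_degree), LinearRegression()).fit(X_train, y_train)
holdout_mse = mean_squared_error(y_hold, final.predict(X_hold))

print(f"best degree on dev: {best_degree}")
print(f"dev MSE: {dev_mse[best_degree]:.4f}  held-out MSE: {holdout_mse:.4f}")
```

Because the degree was selected to minimize the dev error, the dev score is slightly optimistic, and the held-out score is the fairer estimate of performance on new data.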
Finally, you get the lowest score on your test set (the set you call "validation"), which is natural: the hyperparameters were not tuned for it, and the model has never seen it before.
If there is a huge gap between your training and test results, your model may be overfitting; there are ways to avoid that, such as regularization, reducing model complexity, or collecting more training data.
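To put the question's own numbers in those terms, the gap between the training scores and the scores on the asker's validation set (the set the model never saw) can be computed directly. The values below are copied from the question; there is no universal threshold for what counts as a "huge" gap.

```python
# Scores copied from the question, in percent.
train = {"Accuracy": 94.5468, "Precision": 96.7002, "F1-Score": 96.9761}
validation = {"Accuracy": 65.6548, "Precision": 79.7178, "F1-Score": 79.6747}

# Gap in percentage points between seen (train) and never-seen (validation) data.
gaps = {metric: train[metric] - validation[metric] for metric in train}
for metric, gap in gaps.items():
    print(f"{metric}: {gap:.2f} percentage points")
# Accuracy: 28.89 percentage points
# Precision: 16.98 percentage points
# F1-Score: 17.30 percentage points
```

An accuracy drop of almost 29 points from train to unseen data is large enough to treat overfitting as a likely explanation.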

hope this helped ;)