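I'm using scikit-learn's SVC on a large 10000x1000 dataset (10000 objects with 1000 features). I have read in other sources that LIBSVM doesn't scale well beyond ~10000 objects, and I do indeed observe this: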
training time for 10000 objects: 18.9s
training time for 12000 objects: 44.2s
training time for 14000 objects: 92.7s
You can imagine what happens when I try 80000. What I find very surprising, however, is that the SVM's predict() takes even longer than the training fit():
prediction time for 10000 objects (model was also trained on those objects): 49.0s
prediction time for 12000 objects (model was also trained on those objects): 91.5s
prediction time for 14000 objects (model was also trained on those objects): 141.84s
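Getting prediction to run in linear time is trivial (although it may already be close to linear here), and prediction is usually much faster than training. So what is going on here?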
Answer 0: (score: 2)
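Are you sure you are not including the training time in your measurement of the prediction time? Do you have a code snippet showing how you took the timings?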
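
For comparison, here is a minimal sketch of how the two phases could be timed separately, so that training time cannot leak into the prediction figure. This is not the asker's code: the helper names `time_fit_and_predict` and `run_svc_benchmark`, the random data, and the default `SVC()` settings are all assumptions made for illustration.

```python
import time


def time_fit_and_predict(model, X, y):
    """Time model.fit and model.predict separately.

    Returns (fit_seconds, predict_seconds). Works with any object that
    exposes fit(X, y) and predict(X), such as sklearn.svm.SVC.
    """
    start = time.perf_counter()
    model.fit(X, y)
    fit_seconds = time.perf_counter() - start

    # Restart the clock so the prediction figure excludes training time.
    start = time.perf_counter()
    model.predict(X)
    predict_seconds = time.perf_counter() - start

    return fit_seconds, predict_seconds


def run_svc_benchmark(n_samples=2000, n_features=1000):
    """Hypothetical driver: random data stands in for the real dataset."""
    # Imported lazily so the timing helper above has no hard dependencies.
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.standard_normal((n_samples, n_features))
    y = rng.integers(0, 2, size=n_samples)

    fit_s, predict_s = time_fit_and_predict(SVC(), X, y)
    print(f"training time for {n_samples} objects: {fit_s:.1f}s")
    print(f"prediction time for {n_samples} objects: {predict_s:.1f}s")
```

Calling `run_svc_benchmark()` with increasing `n_samples` would show whether predict() is really slower than fit() once the two are measured independently. Random labels are probably a hard case for an SVM, so the absolute numbers are unlikely to match those on the real data.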