SVC size >> LinearSVC size when pickled

Time: 2017-03-07 20:41:11

Tags: python machine-learning scikit-learn

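I have tried several classifiers. I want to save all of them and access them easily at test time. Currently, with LinearSVC the trained model is 5 MB or less. With SVC the model is over 400 MB and takes about a minute to load into memory. I could just use LinearSVC, but I would also like to try an RBF kernel. I cannot understand the huge difference between the sizes described above. Can anyone explain why this happens (if it is explainable; otherwise please point out a possible mistake) and perhaps suggest a way to reduce the size of the SVC model, or a way to use an RBF kernel without SVC? Thanks, everyone.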

Example

Taken from the tutorial page, with pickling added.

import os
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets
import cPickle as pickle
# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, :2] 
y = iris.target
C = 1.0  # SVM regularization parameter
svc = svm.SVC(kernel='linear', C=C).fit(X, y)
lin_svc = svm.LinearSVC(C=C).fit(X, y)
rbf_svc = svm.SVC(kernel='rbf', gamma=0.7, C=C).fit(X, y)
# Pickle each fitted model to disk; binary mode is the safe choice for pickle files
with open('svcpick','wb') as out:
    pickle.dump(svc,out)
with open('rbfsvcpick','wb') as out:
    pickle.dump(rbf_svc,out)
with open('linsvcpick','wb') as out:
    pickle.dump(lin_svc,out)
print 'SVC(Linear):',os.path.getsize('./svcpick'),' B'
print 'SVC(RBF):',os.path.getsize('./rbfsvcpick'),' B'
print 'LinearSVC:',os.path.getsize('./linsvcpick'),' B'

Output:

SVC(Linear): 11481 B
SVC(RBF): 12087 B
LinearSVC: 1188 B
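
To see where the extra bytes may come from, it can help to compare the fitted arrays each model holds. The sketch below is not part of the original experiment: it assumes Python 3 and a recent scikit-learn (unlike the Python 2 code above), and the helper `fitted_array_bytes` is a hypothetical name introduced for illustration. It relies on the documented fitted attributes `support_vectors_`, `dual_coef_` and `intercept_` of `SVC`, and `coef_` and `intercept_` of `LinearSVC`.

```python
from sklearn import datasets, svm

# Same data as the example above: first two iris features.
iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target

svc = svm.SVC(kernel="linear", C=1.0).fit(X, y)
lin_svc = svm.LinearSVC(C=1.0).fit(X, y)


def fitted_array_bytes(model, names):
    """Hypothetical helper: sum the in-memory size of the named NumPy attributes."""
    return sum(getattr(model, name).nbytes for name in names)


# SVC keeps a copy of every support vector plus their dual coefficients.
svc_bytes = fitted_array_bytes(svc, ["support_vectors_", "dual_coef_", "intercept_"])
# LinearSVC keeps only one weight vector (and one intercept) per class.
lin_bytes = fitted_array_bytes(lin_svc, ["coef_", "intercept_"])

print("SVC support_vectors_ shape:", svc.support_vectors_.shape)
print("SVC fitted arrays:", svc_bytes, "B")
print("LinearSVC coef_ shape:", lin_svc.coef_.shape)
print("LinearSVC fitted arrays:", lin_bytes, "B")
```

The number of rows in `support_vectors_` grows with the training set, while `coef_` depends only on the number of classes and features, which would be consistent with the size gap shown in the output above.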

Another example, with multi-label classification

Again taken (in part) from the tutorial

import os
import numpy as np
from sklearn import svm, datasets
from sklearn.datasets import make_multilabel_classification
from sklearn.multiclass import OneVsRestClassifier
import cPickle as pickle
# import some data to play with
X, Y = make_multilabel_classification(n_classes=10, n_labels=1,
                                      allow_unlabeled=True,
                                      random_state=1)
msvc = OneVsRestClassifier(svm.SVC(kernel='linear')).fit(X, Y)        
mrbf_svc = OneVsRestClassifier(svm.SVC(kernel='rbf')).fit(X, Y)    
mlin_svc = OneVsRestClassifier(svm.LinearSVC()).fit(X, Y)   

# Pickle each fitted one-vs-rest model to disk in binary mode
with open('msvcpick','wb') as out:
    pickle.dump(msvc,out)
with open('mrbfsvcpick','wb') as out:
    pickle.dump(mrbf_svc,out)
with open('mlinsvcpick','wb') as out:
    pickle.dump(mlin_svc,out)
print 'mSVC(Linear):',os.path.getsize('./msvcpick'),' B'
print 'mSVC(RBF):',os.path.getsize('./mrbfsvcpick'),' B'
print 'mLinearSVC:',os.path.getsize('./mlinsvcpick'),' B'

Output:

mSVC(Linear): 126539 B
mSVC(RBF): 561532 B
mLinearSVC: 9782 B
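
In the multi-label case, `OneVsRestClassifier` fits one underlying estimator per label and exposes them as `estimators_`, so each SVC carries its own set of support vectors. The following is a minimal sketch for checking this, assuming Python 3 and a recent scikit-learn; `attr_bytes` is a hypothetical helper, and it returns 0 for any per-label estimator that lacks the attribute (for instance, if a label column happens to be constant).

```python
from sklearn import svm
from sklearn.datasets import make_multilabel_classification
from sklearn.multiclass import OneVsRestClassifier

# Same synthetic data as the example above.
X, Y = make_multilabel_classification(n_classes=10, n_labels=1,
                                      allow_unlabeled=True,
                                      random_state=1)

mrbf_svc = OneVsRestClassifier(svm.SVC(kernel="rbf")).fit(X, Y)
mlin_svc = OneVsRestClassifier(svm.LinearSVC()).fit(X, Y)


def attr_bytes(estimator, name):
    """Hypothetical helper: size in bytes of a fitted array, or 0 if it is absent."""
    arr = getattr(estimator, name, None)
    return 0 if arr is None else arr.nbytes


# One binary classifier per label; sum what each of them stores.
sv_bytes = sum(attr_bytes(est, "support_vectors_") for est in mrbf_svc.estimators_)
coef_bytes = sum(attr_bytes(est, "coef_") for est in mlin_svc.estimators_)

print("per-label estimators:", len(mrbf_svc.estimators_))
print("support vectors across all RBF SVCs:", sv_bytes, "B")
print("coefficients across all LinearSVCs:", coef_bytes, "B")
```

These are in-memory array sizes, so they will not match the pickle file sizes exactly, but the ratio between them should point in the same direction.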

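In my implementation I am doing multi-label classification with more than 2 classes, which is why I changed the default to 10. One can see the difference in size. In my implementation, the mLinearSVC model is over 1 MB rather than the 10 KB shown above, because of the high-dimensional data I have to work with (256 features per sample).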
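
A rough back-of-the-envelope estimate shows how 256 features can inflate a kernel model. The sketch below assumes dense float64 storage (8 bytes per value) and ignores pickle overhead and the smaller fitted arrays; both function names and the support-vector count of 20,000 are hypothetical, chosen only to show the order of magnitude, not measured from the real model.

```python
def estimate_support_vector_bytes(n_support_vectors, n_features,
                                  n_estimators=1, itemsize=8):
    """Bytes needed to store dense support vectors for n_estimators kernel SVMs."""
    return n_support_vectors * n_features * itemsize * n_estimators


def estimate_linear_coef_bytes(n_features, n_estimators=1, itemsize=8):
    """Bytes needed for one weight vector plus intercept per linear estimator."""
    return (n_features + 1) * itemsize * n_estimators


# Hypothetical numbers: 10 one-vs-rest estimators, 256 features,
# 20,000 support vectors retained by each kernel SVM.
kernel_bytes = estimate_support_vector_bytes(20000, 256, n_estimators=10)
linear_bytes = estimate_linear_coef_bytes(256, n_estimators=10)

print("kernel SVMs:", kernel_bytes / 1e6, "MB")
print("linear SVMs:", linear_bytes / 1e3, "KB")
```

Under these assumptions the kernel models land in the hundreds of megabytes while the linear ones stay in the kilobyte range, because the first figure scales with the number of support vectors and the second does not.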

0 answers:

No answers