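I am using PyML to build a multiclass linear support vector machine (SVM). After training the SVM, I would like to save the classifier so that subsequent runs can use it right away without retraining. Unfortunately, the .save() function is not implemented for that classifier, and attempting to pickle it (with both standard pickle and cPickle) yields the following error message: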
pickle.PicklingError: Can't pickle : it's not found as __builtin__.PySwigObject
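Does anyone know of a way around this, or of an alternative library that does not have this problem? Thanks.
Edit/Update
I am now training and attempting to save the classifier with the following code: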
mc = multi.OneAgainstRest(SVM())
mc.train(dataset_pyml, saveSpace=False)
for i, classifier in enumerate(mc.classifiers):
    filename = os.path.join(prefix, labels[i] + ".svm")
    classifier.save(filename)
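Note that I am now saving with the PyML save mechanism rather than with pickle, and that I have passed "saveSpace=False" to the training function. However, I am still getting an error: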
ValueError: in order to save a dataset you need to train as: s.train(data, saveSpace = False)
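However, I am passing saveSpace=False... so how do I save the classifier(s)?
P.S.
The project in which I am using this is pyimgattr, in case you would like a complete, testable example... the program is run with "./pyimgattr.py train"... that will give you this error. Also, a note on version information: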
[michaelsafyan@codemage /Volumes/Storage/classes/cse559/pyimgattr]$ python
Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import PyML
>>> print PyML.__version__
0.7.0
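Answer 0 (score: 2)
In multi.py, line 96 calls "self.classifiers[i].train(datai)" without passing "**args", so when you call "mc.train(data, saveSpace=False)" the saveSpace argument gets lost. That is why you get the error message when you try to save the classifiers inside your multi-class classifier individually. If you change this line to pass all arguments through, you can save each classifier individually: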
#!/usr/bin/python

import numpy

from PyML.utils import misc
from PyML.evaluators import assess
from PyML.classifiers.svm import SVM, loadSVM
from PyML.containers.labels import oneAgainstRest
from PyML.classifiers.baseClassifiers import Classifier
from PyML.containers.vectorDatasets import SparseDataSet
from PyML.classifiers.composite import CompositeClassifier

class OneAgainstRestFixed(CompositeClassifier) :

    '''A one-against-the-rest multi-class classifier'''

    def train(self, data, **args) :
        '''train k classifiers'''

        Classifier.train(self, data, **args)

        numClasses = self.labels.numClasses
        if numClasses <= 2:
            raise ValueError, 'Not a multi class problem'

        self.classifiers = [self.classifier.__class__(self.classifier)
                            for i in range(numClasses)]

        for i in range(numClasses) :
            # make a copy of the data; this is done in case the classifier modifies the data
            datai = data.__class__(data, deepcopy = self.classifier.deepcopy)
            datai = oneAgainstRest(datai, data.labels.classLabels[i])

            self.classifiers[i].train(datai, **args)

        self.log.trainingTime = self.getTrainingTime()

    def classify(self, data, i):

        r = numpy.zeros(self.labels.numClasses, numpy.float_)
        for j in range(self.labels.numClasses) :
            r[j] = self.classifiers[j].decisionFunc(data, i)

        return numpy.argmax(r), numpy.max(r)

    def preproject(self, data) :

        for i in range(self.labels.numClasses) :
            self.classifiers[i].preproject(data)

    test = assess.test

train_data = """
0 1:1.0 2:0.0 3:0.0 4:0.0
0 1:0.9 2:0.0 3:0.0 4:0.0
1 1:0.0 2:1.0 3:0.0 4:0.0
1 1:0.0 2:0.8 3:0.0 4:0.0
2 1:0.0 2:0.0 3:1.0 4:0.0
2 1:0.0 2:0.0 3:0.9 4:0.0
3 1:0.0 2:0.0 3:0.0 4:1.0
3 1:0.0 2:0.0 3:0.0 4:0.9
"""
file("foo_train.data", "w").write(train_data.lstrip())

test_data = """
0 1:1.1 2:0.0 3:0.0 4:0.0
1 1:0.0 2:1.2 3:0.0 4:0.0
2 1:0.0 2:0.0 3:0.6 4:0.0
3 1:0.0 2:0.0 3:0.0 4:1.4
"""
file("foo_test.data", "w").write(test_data.lstrip())

train = SparseDataSet("foo_train.data")
mc = OneAgainstRestFixed(SVM())
mc.train(train, saveSpace=False)

test = SparseDataSet("foo_test.data")
print [mc.classify(test, i) for i in range(4)]

for i, classifier in enumerate(mc.classifiers):
    classifier.save("foo.model.%d" % i)

classifiers = []
for i in range(4):
    classifiers.append(loadSVM("foo.model.%d" % i))

mcnew = OneAgainstRestFixed(SVM())
mcnew.labels = misc.Container()
mcnew.labels.addAttributes(test.labels, ['numClasses', 'classLabels'])
mcnew.classifiers = classifiers
print [mcnew.classify(test, i) for i in range(4)]
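The root cause described in this answer, a composite classifier that accepts keyword arguments but does not forward them, can be reproduced without PyML. The following is only a minimal sketch: ToyClassifier, DroppingWrapper, ForwardingWrapper and can_save are hypothetical names made up for illustration, and the assumption that saveSpace defaults to True is part of the toy model rather than a statement about PyML internals.

```python
class ToyClassifier(object):
    """Hypothetical stand-in for a binary classifier; not part of PyML."""

    def __init__(self):
        self.save_space = True  # assumed default, for illustration only

    def train(self, data, **args):
        # Remember whether the caller asked to keep the training data.
        self.save_space = args.get("saveSpace", True)

    def save(self, filename):
        # Mimics the complaint from the question; nothing is written to disk.
        if self.save_space:
            raise ValueError("train with saveSpace=False before saving")
        return filename


class DroppingWrapper(object):
    """Composite that forgets to forward keyword arguments."""

    def __init__(self, n):
        self.classifiers = [ToyClassifier() for _ in range(n)]

    def train(self, data, **args):
        for c in self.classifiers:
            c.train(data)  # bug: **args is silently dropped here


class ForwardingWrapper(DroppingWrapper):
    """Same composite, but keyword arguments reach every sub-classifier."""

    def train(self, data, **args):
        for c in self.classifiers:
            c.train(data, **args)  # fix: pass the keyword arguments through


def can_save(wrapper):
    """Train with saveSpace=False and report whether every part can be saved."""
    wrapper.train([1, 2, 3], saveSpace=False)
    try:
        for i, c in enumerate(wrapper.classifiers):
            c.save("toy.model.%d" % i)
    except ValueError:
        return False
    return True
```

In this toy model, can_save(DroppingWrapper(3)) fails in the same way as the question, while can_save(ForwardingWrapper(3)) succeeds, which is the same one-line change as the OneAgainstRestFixed class above.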
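Answer 1 (score: 0)
Get a newer version of PyML. Since version 0.7.4, it is possible to save the OneAgainstRest classifier (with .save() and .load()); prior to that version, saving and loading the classifier was non-trivial and error-prone.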