How do I format the IRIS dataset as input for the SVM-Light library?

Time: 2017-05-31 17:51:01

Tags: python machine-learning classification svmlight

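I am trying to use the SVM-Light library to train on and classify the IRIS dataset. Here is the Python wrapper I am using. I am currently following the example on that page, but I am not sure how to format the IRIS data correctly as input. A sample row in the IRIS dataset looks like 5.0,3.6,1.4,0.2,Iris-setosa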

1 answer:

Answer 0 (score: 0)

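I don't know your library, but I strongly recommend the powerful general-purpose ML library scikit-learn. I assume you have a good reason to use svmlight; otherwise sklearn's SVMs, which are based on libsvm or liblinear, are much easier to use (fully automated: not file-based, automatic multiclass handling, and so on).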
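For comparison, here is a minimal sketch of the sklearn route described above. The linear kernel, C=1.0, and the 70/30 stratified split are illustrative assumptions, not settings taken from the question or from svmlight.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load all three Iris classes; sklearn's SVC handles multiclass targets itself.
X, y = load_iris(return_X_y=True)

# Hold out 30% of the rows for evaluation (assumed split, fixed seed).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# SVC is backed by libsvm; kernel and C are assumed values for illustration.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

No intermediate file is written here, which is the main practical difference from the file-based svmlight workflow.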

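Here is a simple example for svmlight. Keep in mind that, as far as I know, only binary targets are supported; if you need multiclass learning, you can use sklearn's multiclass tools.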

Code to load and prepare Iris

from sklearn.datasets import load_iris
from sklearn.datasets import dump_svmlight_file

iris = load_iris()
X = iris.data
y = iris.target

# Keep only the first two classes (targets 0 and 1)
indices = y <= 1
X = X[indices]
y = y[indices]

# Transform to +1 / -1 targets (0 -> -1)
y[y == 0] = -1

# Write the file with 1-based feature indices, as svmlight expects
dump_svmlight_file(X, y, 'my_dataset', zero_based=False)
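To show what ends up in the file, here is a small sketch that converts the CSV row from the question by hand. Each SVM-Light line is a target followed by `index:value` pairs, with indices starting at 1 and in increasing order. The helper name `csv_row_to_svmlight` and its `positive_label` parameter are made up for this illustration.

```python
def csv_row_to_svmlight(row, positive_label):
    """Convert one 'f1,f2,...,label' CSV row to an SVM-Light line.

    Hypothetical helper: rows whose label equals positive_label get +1,
    every other row gets -1.
    """
    *features, label = row.strip().split(",")
    target = "+1" if label == positive_label else "-1"
    # Feature indices are 1-based and ascending.
    pairs = " ".join(f"{i}:{v}" for i, v in enumerate(features, start=1))
    return f"{target} {pairs}"


line = csv_row_to_svmlight("5.0,3.6,1.4,0.2,Iris-setosa", "Iris-setosa")
print(line)  # +1 1:5.0 2:3.6 3:1.4 4:0.2
```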

svmlight call

./svm_learn my_dataset my_output -v3
Scanning examples...done
Reading examples into memory...100..OK. (100 examples read)
Setting default regularization parameter C=0.0199
Optimizing...............done. (16 iterations)
Optimization finished (0 misclassified, maxdiff=0.00057).
Runtime in cpu-seconds: 0.00
Number of SV: 32 (including 28 at upper bound)
L1 loss: loss=4.89469
Norm of weight vector: |w|=0.69732
Norm of longest example vector: |x|=9.13674
Estimated VCdim of classifier: VCdim<=31.50739
Computing XiAlpha-estimates...done
Runtime for XiAlpha-estimates in cpu-seconds: 0.00
XiAlpha-estimate of the error: error<=30.00% (rho=1.00,depth=0)
XiAlpha-estimate of the recall: recall=>70.00% (rho=1.00,depth=0)
XiAlpha-estimate of the precision: precision=>70.00% (rho=1.00,depth=0)
Number of kernel evaluations: 1291
Writing model file...done
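
The run above covers only two of the three Iris classes. One common way to handle all three with a binary-only learner is one-vs-rest: write one training file per class, with that class as +1 and everything else as -1, then train one model per file. The sketch below only builds the lines for each file; the function name `one_vs_rest_lines` is an assumption for illustration, and this is not something the svmlight tools do for you.

```python
def one_vs_rest_lines(rows):
    """Return {class_name: [SVM-Light lines]} for a one-vs-rest setup.

    Hypothetical helper: for each class, rows of that class are labelled +1
    and all other rows -1.
    """
    parsed = []
    for row in rows:
        *features, label = row.strip().split(",")
        parsed.append((features, label))

    result = {}
    for positive in sorted({label for _, label in parsed}):
        lines = []
        for features, label in parsed:
            target = "+1" if label == positive else "-1"
            pairs = " ".join(f"{i}:{v}" for i, v in enumerate(features, start=1))
            lines.append(f"{target} {pairs}")
        result[positive] = lines
    return result


# Three sample rows in the CSV layout shown in the question.
rows = [
    "5.0,3.6,1.4,0.2,Iris-setosa",
    "7.0,3.2,4.7,1.4,Iris-versicolor",
    "6.3,3.3,6.0,2.5,Iris-virginica",
]
files = one_vs_rest_lines(rows)
for name, lines in files.items():
    print(name)
    for text in lines:
        print("  " + text)
```

At prediction time, the class whose model gives the largest decision value would be chosen.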