Accord.NET multi-class SVM classification kernel: how to resolve an out-of-memory exception

Asked: 2015-05-04 21:01:45

Tags: c# svm accord.net

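I want to train an SVM on the nursery data (8 attributes and 5 classes), using the same logic as the C45 learning class shown in the example:

For example, load the data from the nursery dataset, which has 8 attributes: "parents", "has_nurs", "form", "children", "housing", "finance", "social", "health". Each combination of these attributes yields one of 5 classes: "not_recom", "recommend", "very_recom", "priority", "spec_prior".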

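But I don't know which kernel is best suited to this data with an SVM. By definition, the polynomial kernel is a kernel function that represents the similarity of vectors (training samples) in a feature space over polynomials of the original variables, which allows learning non-linear models. I tried this kernel but ran into a problem when training the machine on the data.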
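As a rough illustration of the definition above, here is a minimal sketch of a polynomial kernel, assuming the common form k(x, y) = (x · y + c)^d. The class name `PolynomialKernelSketch` and the sample vectors are hypothetical; this is not Accord.NET's implementation, only the formula written out in plain C#.

```csharp
using System;

// Hypothetical helper for illustration only; not part of Accord.NET.
static class PolynomialKernelSketch
{
    // Assumed form: k(x, y) = (x . y + constant)^degree
    public static double Evaluate(double[] x, double[] y, int degree, double constant)
    {
        if (x.Length != y.Length)
            throw new ArgumentException("Vectors must have the same length.");

        // Plain dot product of the two input vectors.
        double dot = 0.0;
        for (int i = 0; i < x.Length; i++)
            dot += x[i] * y[i];

        return Math.Pow(dot + constant, degree);
    }

    static void Main()
    {
        double[] a = { 1, 0, 2 };
        double[] b = { 0, 1, 3 };

        // dot = 1*0 + 0*1 + 2*3 = 6, so k = (6 + 5)^2 = 121
        Console.WriteLine(Evaluate(a, b, 2, 5.0));
    }
}
```

With degree 1 and constant 0 the function reduces to a plain dot product, which is the linear kernel.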

So far I have used the code shown in the example to prepare the data, followed by SVM code like this:

// Same code as the C45 example to get the input and output data

string nurseryData = Resources.nursery;
string[] inputColumns =
{
    "parents", "has_nurs", "form", "children",
    "housing", "finance", "social", "health"
};
string outputColumn = "output";
DataTable table = new DataTable("Nursery");
table.Columns.Add(inputColumns);
table.Columns.Add(outputColumn);
string[] lines = nurseryData.Split(
    new[] { Environment.NewLine }, StringSplitOptions.None);
foreach (var line in lines)
    table.Rows.Add(line.Split(','));
Codification codebook = new Codification(table);
DataTable symbols = codebook.Apply(table);
double[][] inputs = symbols.ToArray(inputColumns);
int[] outputs = symbols.ToArray<int>(outputColumn);
int inputDimension = 8;
int outputClasses = 5;

// SVM

IKernel kernel = new Polynomial(2, 5);
// Create the Multi-class Support Vector Machine using the selected Kernel
var ksvm = new MulticlassSupportVectorMachine(inputDimension, kernel, outputClasses);
// Create the learning algorithm using the machine and the training data
var ml = new MulticlassSupportVectorLearning(ksvm, inputs, outputs);
ml.Algorithm = (svm, classInputs, classOutputs, i, j) =>
        new SequentialMinimalOptimization(svm, classInputs, classOutputs);
double SVMerror = ml.Run();

But I get an error when training the machine. What am I missing?


Edit

I now have a different problem when trying Cesar's code.


1 answer:

Answer 0 (score: 4)

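The framework automatically builds a kernel function cache to help speed up computations during SVM learning. However, in some cases this cache can take up too much memory and lead to OutOfMemoryExceptions.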

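To strike a balance between memory consumption and CPU speed, set the CacheSize property to a lower value. The default is to store all input vectors in the cache; setting it to a lower value (such as 1/20 of the number of training samples) should be sufficient.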
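To get a feel for the trade-off, here is a back-of-the-envelope sketch of how much memory a kernel cache could need. It assumes, as a simplification, that each cached entry is one row of kernel values stored as doubles, one value per training sample; the actual layout inside Accord.NET may differ. The class name `KernelCacheEstimate` is hypothetical, and the figure of 12,960 samples is an assumption about the size of the nursery dataset.

```csharp
using System;

// Hypothetical helper for illustration only; not part of Accord.NET.
static class KernelCacheEstimate
{
    // Bytes needed for `rows` cached kernel rows of `samples` doubles each
    // (assumed layout; the real cache may be organized differently).
    public static long CacheBytes(int samples, int rows)
    {
        return (long)samples * rows * sizeof(double);
    }

    static void Main()
    {
        int samples = 12960; // assumed size of the training set

        long full = CacheBytes(samples, samples);         // every row cached
        long reduced = CacheBytes(samples, samples / 20); // 1/20 of the rows

        Console.WriteLine("Full cache:   {0} bytes", full);
        Console.WriteLine("1/20 of rows: {0} bytes", reduced);
    }
}
```

Under these assumptions the full cache grows quadratically with the number of samples, while a reduced cache grows only linearly for a fixed fraction of rows. In a one-vs-one multi-class setup each binary sub-problem sees only the samples of two classes, so real figures per machine would be smaller than this estimate.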

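If you set CacheSize to zero, the cache is disabled entirely. Training may be a bit slower, but you should not run into any memory problems. Please see the code below. The error I obtained was about 0.09.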

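// Namespaces this snippet relies on
using System;
using System.Data;
using Accord.Math;
using Accord.MachineLearning.VectorMachines;
using Accord.MachineLearning.VectorMachines.Learning;
using Accord.Statistics.Filters;
using Accord.Statistics.Kernels;

// same code to get input and output data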
string nurseryData = Properties.Resources.nursery;

string[] inputColumns =
{
    "parents", "has_nurs", "form", "children",
    "housing", "finance", "social", "health"
};

string outputColumn = "output";

DataTable table = new DataTable("Nursery");
table.Columns.Add(inputColumns);
table.Columns.Add(outputColumn);

string[] lines = nurseryData.Split(
    new[] { Environment.NewLine }, StringSplitOptions.None);

foreach (var line in lines)
    table.Rows.Add(line.Split(','));


Codification codebook = new Codification(table);

DataTable symbols = codebook.Apply(table);

double[][] inputs = symbols.ToArray(inputColumns);
int[] outputs = Matrix.ToArray<int>(symbols, outputColumn);

//SVM
IKernel kernel = new Linear();

// Create the Multi-class Support Vector Machine using the selected Kernel
int inputDimension = inputs[0].Length;
int outputClasses = codebook[outputColumn].Symbols;
var ksvm = new MulticlassSupportVectorMachine(inputDimension, kernel, outputClasses);

// Create the learning algorithm using the machine and the training data
var ml = new MulticlassSupportVectorLearning(ksvm, inputs, outputs)
{
    Algorithm = (svm, classInputs, classOutputs, i, j) =>
    {
        return new SequentialMinimalOptimization(svm, classInputs, classOutputs)
        {
            CacheSize = 0
        };
    }
};

double SVMerror = ml.Run(); // should be around 0.09

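However, I agree that this may not be very obvious. I will add a better way to handle this case in a fix release. Thanks for posting your question!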