How to train on a dataset with inputs of different lengths in Accord.NET

Time: 2016-06-18 09:14:47

Tags: c# machine-learning accord.net

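I want to classify some datasets using Accord.NET's ANN and SVM. The problem is that the input arrays in my dataset do not all have the same length: each array can have anywhere from 10 to 64 elements. Is there a way to handle a dataset like this, or do I need to make all the arrays the same size?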

1 answer:

Answer 0: (score: 1)

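Does your dataset consist of sequences of numbers? If so, you can use hidden Markov models, which accept sequences of any length. If you have a classification problem, you can use a hidden Markov classifier trained with Baum-Welch learning to create a sequence classifier.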
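To see why differing lengths are not an obstacle for this kind of classifier, here is a minimal sketch, independent of Accord.NET, of how a sequence can be scored: each class gets its own discrete HMM, the forward algorithm computes the likelihood of the sequence under each model, and the class with the highest likelihood wins. The names `DiscreteHmm`, `ForwardSketch` and `ExampleModels` are hypothetical, and the model parameters are made up for illustration rather than learned. The sketch also ignores class priors and uses plain probabilities, whereas a real implementation would typically work with log-probabilities to avoid underflow on long sequences.

```csharp
using System;

// Hypothetical container for the parameters of a discrete HMM.
public sealed class DiscreteHmm
{
    public double[] Initial;      // Initial[i]       = P(first state is i)
    public double[,] Transition;  // Transition[i, j] = P(next state is j | current state is i)
    public double[,] Emission;    // Emission[i, s]   = P(symbol s | state i)
}

public static class ForwardSketch
{
    // Likelihood of a non-empty observation sequence under one model,
    // computed with the forward algorithm. The loop simply runs once per
    // observation, so the sequence length is not fixed in advance.
    public static double Likelihood(DiscreteHmm model, int[] sequence)
    {
        int states = model.Initial.Length;
        double[] alpha = new double[states];

        // Initialisation: start in state i and emit the first symbol.
        for (int i = 0; i < states; i++)
            alpha[i] = model.Initial[i] * model.Emission[i, sequence[0]];

        // Recursion: move to state j, then emit the symbol at time t.
        for (int t = 1; t < sequence.Length; t++)
        {
            double[] next = new double[states];
            for (int j = 0; j < states; j++)
            {
                double sum = 0.0;
                for (int i = 0; i < states; i++)
                    sum += alpha[i] * model.Transition[i, j];
                next[j] = sum * model.Emission[j, sequence[t]];
            }
            alpha = next;
        }

        // Termination: sum over all possible final states.
        double total = 0.0;
        for (int i = 0; i < states; i++)
            total += alpha[i];
        return total;
    }

    // Returns the index of the model under which the sequence is most likely.
    public static int Classify(DiscreteHmm[] models, int[] sequence)
    {
        int best = 0;
        double bestLikelihood = double.NegativeInfinity;
        for (int c = 0; c < models.Length; c++)
        {
            double likelihood = Likelihood(models[c], sequence);
            if (likelihood > bestLikelihood)
            {
                bestLikelihood = likelihood;
                best = c;
            }
        }
        return best;
    }

    // Made-up parameters (NOT learned): model 0 favours sequences that start
    // with the symbol 0, model 1 favours sequences that start with the symbol 1.
    public static DiscreteHmm[] ExampleModels()
    {
        var transition = new double[,] { { 0.5, 0.5 }, { 0.5, 0.5 } };
        var emission = new double[,] { { 0.9, 0.1 }, { 0.1, 0.9 } };
        return new[]
        {
            new DiscreteHmm { Initial = new[] { 0.9, 0.1 }, Transition = transition, Emission = emission },
            new DiscreteHmm { Initial = new[] { 0.1, 0.9 }, Transition = transition, Emission = emission },
        };
    }

    public static void Main()
    {
        DiscreteHmm[] models = ExampleModels();

        // Sequences of different lengths go through the same code path.
        Console.WriteLine(Classify(models, new[] { 1, 0, 0, 1 }));
        Console.WriteLine(Classify(models, new[] { 0, 1, 0 }));
    }
}
```

In practice the parameters would come from training, for instance with Baum-Welch, rather than being written by hand.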

For example, with Accord.NET, consider the following data, in which the samples have different lengths:

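// Namespaces that contain the classes used below
using Accord.Statistics.Models.Markov;
using Accord.Statistics.Models.Markov.Learning;

// Declare some sample sequences of different lengths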
int[][] inputs = new int[][]
{
    new int[] { 0,1,1,0 },   // Class 0
    new int[] { 0,0,1,0 },   // Class 0
    new int[] { 0,1,1,1,0 }, // Class 0
    new int[] { 0,1,0 },     // Class 0

    new int[] { 1,0,0,1 },   // Class 1
    new int[] { 1,1,0,1 },   // Class 1
    new int[] { 1,0,0,0,1 }, // Class 1
    new int[] { 1,0,1 },     // Class 1
};

int[] outputs = new int[]
{
    0,0,0,0, // First four sequences are of class 0
    1,1,1,1, // Last four sequences are of class 1
};


// We are trying to predict two different classes
int classes = 2;

// Each sequence may have up to two symbols (0 or 1)
int symbols = 2;

Now you can create hidden Markov models to classify them:

// Nested models will have two states each
int[] states = new int[] { 2, 2 };

// Creates a new Hidden Markov Model Sequence Classifier with the given parameters
HiddenMarkovClassifier classifier = new HiddenMarkovClassifier(classes, states, symbols);

// Create a new learning algorithm to train the sequence classifier
var teacher = new HiddenMarkovClassifierLearning(classifier,

    // Train each model until the log-likelihood changes by less than 0.001
    modelIndex => new BaumWelchLearning(classifier.Models[modelIndex])
    {
        Tolerance = 0.001,
        Iterations = 0
    }
);

// Train the sequence classifier using the algorithm
double likelihood = teacher.Run(inputs, outputs);

// Classify the sequences as belonging to one of the classes:
int output = classifier.Decide(new int[] { 1,0,0,1 }); // output should be 1
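
If you would rather stay with ANN or SVM, which take fixed-length input vectors, the other option raised in the question is to bring every array to the same size. Below is a minimal sketch of that idea using only the .NET base library. It assumes that right-padding with zeros is acceptable for your data, which is not guaranteed since padding changes what the model sees. `PaddingSketch`, `PadRight` and `PadAll` are hypothetical helper names, not part of Accord.NET.

```csharp
using System;
using System.Linq;

public static class PaddingSketch
{
    // Returns a copy of 'sequence' extended on the right with 'padValue'
    // until it has exactly 'length' elements.
    public static double[] PadRight(double[] sequence, int length, double padValue = 0.0)
    {
        if (sequence.Length > length)
            throw new ArgumentException("Sequence is longer than the target length.");

        double[] padded = Enumerable.Repeat(padValue, length).ToArray();
        Array.Copy(sequence, padded, sequence.Length);
        return padded;
    }

    // Pads every sequence in the dataset to the same length.
    public static double[][] PadAll(double[][] sequences, int length, double padValue = 0.0)
    {
        return sequences.Select(s => PadRight(s, length, padValue)).ToArray();
    }

    public static void Main()
    {
        // Illustrative data only; real inputs would have 10 to 64 elements.
        double[][] inputs =
        {
            new double[] { 1, 2, 3 },
            new double[] { 4, 5 },
        };

        // Pad to the longest sequence present (64 for the data in the question).
        int targetLength = inputs.Max(s => s.Length);
        double[][] fixedInputs = PadAll(inputs, targetLength);

        foreach (double[] row in fixedInputs)
            Console.WriteLine(string.Join(", ", row));
    }
}
```

The padded arrays all have the same length and can then be passed to a learner that expects fixed-size inputs.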