Question

Conceptual question here.

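I'm building a decision tree recursively. Each call of the function takes a subset of the training examples, loops over all features and all possible splits within each feature, finds the best possible split, divides the subset into two smaller subsets, and calls itself twice, once for each subset.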
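A minimal C sketch of the procedure just described, under assumptions that are not in the question: labels are binary (0/1), Gini impurity is the split criterion, and every observed feature value is tried as a threshold (x[f] <= t goes left). The names find_best_split, partition and grow are hypothetical. To keep the focus on the shape of the recursion, grow does not store the tree; it only counts the leaves it would create.

```c
#include <stddef.h>

/* Assumed layout: n examples, d features, row-major X[i*d + f];
 * y[i] is 0 or 1; idx lists the example numbers in the current subset. */

/* Gini impurity of a group with `pos` positives out of `total`. */
static double gini(size_t pos, size_t total) {
    if (total == 0) return 0.0;
    double p = (double) pos / (double) total;
    return 2.0 * p * (1.0 - p);
}

/* Try every feature and every observed value as a threshold. Returns the
 * weighted Gini impurity of the best split, or -1.0 if no threshold
 * separates the subset into two non-empty halves. */
double find_best_split(const double *X, const int *y, const size_t *idx,
                       size_t n, size_t d, size_t *best_f, double *best_t) {
    double best = -1.0;
    for (size_t f = 0; f < d; f++) {
        for (size_t a = 0; a < n; a++) {
            double t = X[idx[a] * d + f];
            size_t nl = 0, pl = 0, nr = 0, pr = 0;
            for (size_t b = 0; b < n; b++) {
                size_t i = idx[b];
                if (X[i * d + f] <= t) { nl++; pl += (size_t) y[i]; }
                else                   { nr++; pr += (size_t) y[i]; }
            }
            if (nl == 0 || nr == 0) continue;      /* not a real split */
            double score = ((double) nl * gini(pl, nl) +
                            (double) nr * gini(pr, nr)) / (double) n;
            if (best < 0.0 || score < best) {
                best = score; *best_f = f; *best_t = t;
            }
        }
    }
    return best;
}

/* Reorder idx[0..n) so examples with x[f] <= t come first;
 * returns how many went to the left half. */
static size_t partition(const double *X, size_t *idx, size_t n, size_t d,
                        size_t f, double t) {
    size_t k = 0;
    for (size_t b = 0; b < n; b++) {
        if (X[idx[b] * d + f] <= t) {
            size_t tmp = idx[k]; idx[k] = idx[b]; idx[b] = tmp;
            k++;
        }
    }
    return k;
}

/* Find the best split, divide the subset in two, and call itself once
 * per half. Returns the number of leaves created. */
size_t grow(const double *X, const int *y, size_t *idx, size_t n, size_t d) {
    size_t pos = 0, f = 0;
    double t = 0.0;
    for (size_t b = 0; b < n; b++) pos += (size_t) y[idx[b]];
    if (pos == 0 || pos == n) return 1;          /* pure subset: leaf */
    if (find_best_split(X, y, idx, n, d, &f, &t) < 0.0)
        return 1;                                /* no usable split: leaf */
    size_t nl = partition(X, idx, n, d, f, t);   /* left half comes first */
    return grow(X, y, idx, nl, d) + grow(X, y, idx + nl, n - nl, d);
}
```

Because each split leaves at least one example on each side, every recursive call works on a strictly smaller subset, so the recursion terminates.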

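I coded this in MatLab before, but it ran too slowly, so now I'm trying it in C (which I'm not very familiar with). In MatLab I used a global "splits" matrix to hold the information for each split (which feature, the value within that feature, the classification if this is a leaf node, and the row # of each child), and I could traverse that matrix with a new test data point to find its classification.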
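For reference, a "splits" matrix like this maps naturally onto an array of structs in C. This is a sketch only: the field names, the "go left if x[feature] <= value" convention, and the root sitting at row 0 are assumptions (C rows are 0-based, unlike MatLab's).

```c
#include <stddef.h>

/* One row of a hypothetical C version of the "splits" matrix. */
struct split_row {
    int feature;     /* feature tested at this row (internal rows only) */
    double value;    /* go left if x[feature] <= value */
    int is_leaf;     /* nonzero if this row is a leaf */
    int label;       /* classification, used only for leaf rows */
    int left, right; /* row numbers of the two children */
};

/* Walk the table from the root (row 0) until a leaf is reached,
 * then return that leaf's classification. */
int classify_table(const struct split_row *rows, const double *x) {
    int r = 0;
    while (!rows[r].is_leaf) {
        r = (x[rows[r].feature] <= rows[r].value) ? rows[r].left
                                                   : rows[r].right;
    }
    return rows[r].label;
}
```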

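It looks like a global 2D array in C is possible using header files, but I'd rather not get into header files if there's another way. The problem is that, because the function is called recursively, it's hard for me to know which row of "splits" is the next available one. I could do something like putting the children of row i at rows 2*i and 2*i+1, but for a large array with many splits this would require a lot of storage up front.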
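For comparison, the 2*i / 2*i+1 layout must reserve 2^(d+1) - 1 rows for a tree of depth d, whether or not those nodes exist. One common alternative is to keep an explicit "next available row" counter and grow the table with realloc. This is a minimal sketch; struct split_table, table_new_row and build are hypothetical names, and build's fixed depth is a placeholder for the real "find the best split and recurse" logic. (A single-file program can keep such a table at file scope without any header; headers are only needed to share declarations between .c files.)

```c
#include <stdlib.h>

struct split_row {
    int feature;
    double value;
    int is_leaf, label;
    int left, right;          /* child row numbers, -1 for none */
};

/* Hypothetical growable table: count is the next available row. */
struct split_table {
    struct split_row *rows;
    int count, capacity;
};

/* Reserve the next free row, doubling the storage when it is full.
 * Returns the row number, or -1 if realloc fails. */
int table_new_row(struct split_table *t) {
    if (t->count == t->capacity) {
        int cap = t->capacity ? 2 * t->capacity : 16;
        struct split_row *p = realloc(t->rows, (size_t) cap * sizeof *p);
        if (p == NULL) return -1;
        t->rows = p;
        t->capacity = cap;
    }
    return t->count++;
}

/* Recursive skeleton: reserve this node's row first, then recurse and
 * record the children's row numbers. Only row numbers are kept across
 * calls, because realloc may move the whole array. */
int build(struct split_table *t, int depth) {
    int r = table_new_row(t);
    if (r < 0) return -1;
    if (depth == 0) {
        t->rows[r] = (struct split_row){ .is_leaf = 1, .label = 0,
                                         .left = -1, .right = -1 };
        return r;
    }
    int l = build(t, depth - 1);
    int rt = build(t, depth - 1);
    if (l < 0 || rt < 0) return -1;
    t->rows[r] = (struct split_row){ .feature = 0, .value = 0.0,
                                     .left = l, .right = rt };
    return r;
}
```

Starting from a zeroed table (`{ NULL, 0, 0 }`), the root ends up in row 0 and every node's children are recorded by row number, just like the MatLab matrix; the caller frees `rows` when done.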

Any ideas?

Answer 1

It sounds like you'll have to give up the 2D array for representing your tree. A tree of arbitrary degree in C usually looks like:

struct node
{
    struct node **children;
    int num_children;
    /* Values in the node/leaves */
};
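A minimal sketch of how such nodes can be created, linked and freed, assuming a single int stands in for the node's data. node_new, node_add_child and node_free are hypothetical helper names. Growing the pointer array by one with realloc keeps the example short; a real builder might grow it geometrically.

```c
#include <stdlib.h>

struct node {
    struct node **children;
    int num_children;
    int value;               /* placeholder for the node/leaf data */
};

/* Allocate a childless node; returns NULL on failure. */
struct node *node_new(int value) {
    struct node *n = malloc(sizeof *n);
    if (n != NULL) {
        n->children = NULL;
        n->num_children = 0;
        n->value = value;
    }
    return n;
}

/* Append child to parent, growing the pointer array by one slot.
 * Returns 0 on success, -1 if realloc fails (parent is unchanged). */
int node_add_child(struct node *parent, struct node *child) {
    struct node **p = realloc(parent->children,
                              (size_t) (parent->num_children + 1) * sizeof *p);
    if (p == NULL) return -1;
    p[parent->num_children++] = child;
    parent->children = p;
    return 0;
}

/* Free a node and everything below it. */
void node_free(struct node *n) {
    if (n == NULL) return;
    for (int i = 0; i < n->num_children; i++)
        node_free(n->children[i]);
    free(n->children);
    free(n);
}
```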

If the degree of the tree is fixed, or every node has a maximum degree, then the following will do:

struct node
{
    struct node *children;
    int num_children; /* If degree has only a maximum */
    /* Values in the node/leaves */
};
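For a decision tree the degree is fixed at two, so children can point to a single block holding both child nodes. A sketch under that assumption; the split fields and the helpers node_split, node_classify and node_free_children are hypothetical. With this layout, classifying a test point is a plain loop down the tree.

```c
#include <stdlib.h>

/* Binary decision-tree node: children points to a block of two nodes
 * (children[0] = left, children[1] = right), or is NULL for a leaf. */
struct node {
    struct node *children;
    int num_children;        /* 0 for a leaf, 2 otherwise */
    int feature;             /* feature tested at this node */
    double value;            /* go left if x[feature] <= value */
    int label;               /* classification, used at leaves */
};

/* Turn leaf n into an internal node with two fresh leaf children,
 * allocated together in one malloc. Returns 0, or -1 on failure. */
int node_split(struct node *n, int feature, double value,
               int left_label, int right_label) {
    struct node *c = malloc(2 * sizeof *c);
    if (c == NULL) return -1;
    c[0] = (struct node){ .label = left_label };   /* other fields zero */
    c[1] = (struct node){ .label = right_label };
    n->children = c;
    n->num_children = 2;
    n->feature = feature;
    n->value = value;
    return 0;
}

/* Follow the splits down to a leaf and return its classification. */
int node_classify(const struct node *n, const double *x) {
    while (n->num_children > 0)
        n = &n->children[x[n->feature] <= n->value ? 0 : 1];
    return n->label;
}

/* Free every child block below n (n itself belongs to the caller). */
void node_free_children(struct node *n) {
    if (n->children == NULL) return;
    node_free_children(&n->children[0]);
    node_free_children(&n->children[1]);
    free(n->children);
    n->children = NULL;
    n->num_children = 0;
}
```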

You'll have to allocate memory for the nodes and their children with malloc and friends.

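About header files: header files are a blessing (in C), not a curse, but if you insist on avoiding them, you can always replace each #include with the contents of the file it includes.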

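If you're moving from MatLab to another language to speed up your implementation, you might want to consider languages other than C first. Languages like Java, Python or even Haskell might give you a similar speedup while being much less of a hassle than all those pointers.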

Answer 2

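This kind of functional design isn't pretty in C, because there's no guarantee that recursive calls will be optimized into loops, and there are no anonymous functions. At least C++ has lambdas; I'd suggest C++ is better suited to this, although AFAIK C++ doesn't guarantee that optimization either.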

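To avoid recursion that could grow the stack, each branch returns the next branch it has chosen. The caller (main) then loops on the return values and ends the loop when the returned value is the terminal branch. A C function type cannot directly return a pointer to its own type, so we define the branch type as a struct wrapping a function pointer:

#include <assert.h>
#include <stdlib.h>
#include <time.h>

typedef struct branch branch;
struct branch {
    branch (*fn)(void);   /* the next step to run */
};

/* Forward declarations, so each branch can name the others. */
static branch initial(void), terminal(void), left(void), right(void),
              right_left(void), right_right(void);

Then each actual branch returns a branch:

static branch initial(void) {
    /* do initial processing */
    int x = rand() % 2;
    return (branch){ x == 0 ? left : right };
}

static branch terminal(void) {
    /* This should never be called: main stops when it sees terminal */
    assert(0);
    return (branch){ terminal };
}

static branch left(void) {
    /* do left processing */
    return (branch){ terminal }; /* return the terminal branch to indicate
                                  * no further processing */
}

static branch right(void) {
    /* do right processing, storing either 0 in x to select right_left
     * or 1 in x to select right_right; rand() is only a placeholder */
    int x = rand() % 2;
    return (branch){ x == 0 ? right_left : right_right };
}

static branch right_left(void) {
    /* do right_left processing */
    return (branch){ initial }; /* return initial to repeat that branch */
}

static branch right_right(void) {
    /* do right_right processing */
    return (branch){ right }; /* return right to repeat that branch */
}

...and the loop over the return values looks like this:

int main(void) {
    srand((unsigned) time(NULL)); /* seed once, not on every pass */
    branch b = { initial };
    while (b.fn != terminal) {
        b = b.fn();
    }
    return 0;
}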
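The same worry about stack growth applies when walking the finished tree. A different technique from the branch loop, sketched here for a pointer-based binary tree (struct tnode and count_leaves are hypothetical names), is to keep the pending nodes on an explicit, heap-allocated stack, so even a very deep tree cannot overflow the call stack.

```c
#include <stdlib.h>

/* Hypothetical binary node; a leaf has both children NULL. */
struct tnode {
    struct tnode *left, *right;
};

/* Count leaves without recursion: pending nodes live on a stack that
 * grows with realloc. Returns -1 if allocation fails. */
long count_leaves(struct tnode *root) {
    size_t cap = 64, top = 0;
    struct tnode **stack = malloc(cap * sizeof *stack);
    long leaves = 0;
    if (stack == NULL) return -1;
    if (root != NULL) stack[top++] = root;
    while (top > 0) {
        struct tnode *n = stack[--top];
        if (n->left == NULL && n->right == NULL) { leaves++; continue; }
        if (top + 2 > cap) {                    /* room for two pushes */
            struct tnode **p = realloc(stack, 2 * cap * sizeof *p);
            if (p == NULL) { free(stack); return -1; }
            stack = p;
            cap *= 2;
        }
        if (n->left != NULL) stack[top++] = n->left;
        if (n->right != NULL) stack[top++] = n->right;
    }
    free(stack);
    return leaves;
}
```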

How do I store a decision tree in C while I'm building it?
