Matlab: recursion to build a decision tree

Time: 2012-10-29 01:04:32

Tags: matlab recursion machine-learning classification decision-tree

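I am trying to implement a decision tree using recursion. So far I have written the following:

  1. From the given data set, find the best split and return its branches. In more detail: the data is a matrix whose columns are features, and the last column holds the class label, 1 or -1.
  2. Based on 1, I have the best feature to split on, together with the branches under that split. For example, if information gain says feature 9 is the best split and the unique values of feature 9 are {1,3,5}, then those values are the branches of feature 9.
  3. I have figured out how to get the data associated with each branch. I then need to iterate over each branch's data to get the next set of splits, and I cannot work out this recursion.
  4. Here is the code I have so far. The recursion I am doing now does not look right. How can I fix it?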

    function [indeces_of_node, best_split] = split_node(X_train, Y_train)
    
        %cell to save split information
        feature_to_split_cell = cell(size(X_train,2)-1,4);
    
        %iterate over features
        for feature_idx=1:(size(X_train,2) - 1)
            %get current feature
            curr_X_feature = X_train(:,feature_idx);
    
            %identify the unique values
            unique_values_in_feature = unique(curr_X_feature);
    
            H = get_entropy(Y_train); %This is actually H(X) in slides
            %temp entropy holder
    
            %Storage for feature element's class
            element_class = zeros(size(unique_values_in_feature,1),2);
    
            %conditional entropy H(X|y), one weighted term per unique value
            H_cond = zeros(size(unique_values_in_feature,1),1); 
    
            for aUnique=1:size(unique_values_in_feature,1)
                match = curr_X_feature(:,1)==unique_values_in_feature(aUnique);
                mat = Y_train(match);
                majority_class = mode(mat);
                element_class(aUnique,1) = unique_values_in_feature(aUnique);
                element_class(aUnique,2) = majority_class;
                H_cond(aUnique,1) = (length(mat)/size((curr_X_feature),1)) * get_entropy(mat);
            end
    
            %Getting the information gain
            IG = H - sum(H_cond);
    
            %Storing the IG of features
            feature_to_split_cell{feature_idx, 1} = feature_idx;
            feature_to_split_cell{feature_idx, 2} = max(IG);
            feature_to_split_cell{feature_idx, 3} = unique_values_in_feature;
            feature_to_split_cell{feature_idx, 4} = element_class;
        end
        %set feature to split zero for every fold
        feature_to_split = 0;
    
        %getting the max IG of the fold
        max_IG_of_fold = max([feature_to_split_cell{:,2:2}]);
    
        %vector to store values in the best feature
        values_of_best_feature = zeros(size(15,1));
    
        %Iterating over the cell to get the index of the best split
        %feature and the values under it.
        for i=1:length(feature_to_split_cell)
            if (max_IG_of_fold == feature_to_split_cell{i,2});
                feature_to_split = i;
                values_of_best_feature = feature_to_split_cell{i,4};
            end
        end
        display(feature_to_split)
        display(values_of_best_feature(:,1)')
    
        curr_X_feature = X_train(:,feature_to_split);
    
        best_split = feature_to_split
        indeces_of_node = unique(curr_X_feature)
    
        %testing
        for k = 1 : length(values_of_best_feature)
            % Condition to stop the recursion: if the classes are pure we are
            % done splitting; if both classes have the same number of
            % attributes we are also done splitting.
            if (sum(values_of_best_feature(:,2) == -1) ~= sum(values_of_best_feature(:,2) == 1))
                if((sum(values_of_best_feature(:,2) == -1) ~= 0) || (sum(values_of_best_feature(:,2) == 1) ~= 0))
                    mat1 = X_train(X_train(:,5)== values_of_best_feature(k),:);
                    [indeces_of_node, best_split] = split_node(mat1, Y_train);
                end
            end
        end
    end
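
The function above calls `get_entropy`, which is not shown in the question. For readers who want to run the code, here is a minimal sketch of what such a helper could look like, assuming it takes a vector of class labels and returns the Shannon entropy in bits. The body is an assumption for illustration, not the author's implementation.

```matlab
function H = get_entropy(labels)
% Shannon entropy (in bits) of a vector of class labels.
% Hypothetical stand-in for the helper the question calls but does not show.
    if isempty(labels)
        H = 0;                        % convention: an empty set has zero entropy
        return;
    end
    [~, ~, idx] = unique(labels);     % map every label to an index 1..K
    counts = accumarray(idx(:), 1);   % number of occurrences of each class
    p = counts / sum(counts);         % class probabilities, all strictly > 0
    H = -sum(p .* log2(p));
end
```

With this definition a balanced two-class vector such as `[1; 1; -1; -1]` has entropy 1 bit and a pure vector such as `[1; 1; 1]` has entropy 0, which is what the information gain `IG = H - sum(H_cond)` in the question relies on.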
    

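Here is the output of my code. It looks like somewhere in my recursion I only go down one branch, and after that I never come back to the rest of the branches: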

    feature_to_split =
    
         5
    
    
    ans =
    
         1     2     3     4     5     6     7     8     9
    
    
    feature_to_split =
    
         9
    
    
    ans =
    
         3     5     7     8    11
    
    
    feature_to_split =
    
        21
    
    
    feature_to_split =
    
        21
    
    
    feature_to_split =
    
        21
    
    
    feature_to_split =
    
        21
    

If you are interested in running this code: git

1 Answer:

Answer 0: (score: 0)

After several rounds of debugging I worked out the answer. I hope someone can benefit from it:

    for k = 1 : length(values_of_best_feature)
        % Condition to stop the recursion: if the classes are pure we are
        % done splitting; if both classes have the same number of
        % attributes we are also done splitting.
        if((sum(values_of_best_feature(:,2) == -1) ~= 0) || (sum(values_of_best_feature(:,2) == 1) ~= 0))
            X_train(:,feature_to_split) = [];
            mat1 = X_train(X_train(:,5)== values_of_best_feature(k),:);
            %if(level >= curr_level)
            split_node(mat1, Y_train, 1, 2, level-1);
            %end
        end

    end
    return;
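
For comparison, the following is a self-contained sketch of the same recursive idea, written under stated assumptions rather than taken from the author's repository. It assumes `X` holds only categorical feature columns and `Y` holds the labels separately; `build_tree`, `info_gain`, `label_entropy` and the struct fields are hypothetical names. Three things differ from the snippet above, which appears to delete a column of `X_train` on every loop iteration, to select rows using the hard-coded column 5, and to pass `Y_train` unfiltered: here the row mask uses the selected feature, the labels are filtered with the same mask, and the used column is dropped only on the per-branch copy.

```matlab
function node = build_tree(X, Y)
% Recursively build a multiway decision tree on categorical features.
% X: n-by-d feature matrix, Y: n-by-1 labels (for example 1 / -1).
% Hypothetical sketch; all names and the struct layout are assumptions.
    node = struct('is_leaf', true, 'label', mode(Y), ...
                  'feature', 0, 'values', [], 'children', {{}});
    % Stop when the node is pure or no feature columns are left.
    if numel(unique(Y)) <= 1 || isempty(X)
        return;
    end
    % Pick the feature with the largest information gain.
    gains = zeros(1, size(X, 2));
    for f = 1:size(X, 2)
        gains(f) = info_gain(X(:, f), Y);
    end
    [best_gain, best] = max(gains);
    if best_gain <= 1e-12
        return;                       % no split improves purity: stay a leaf
    end
    node.is_leaf = false;
    node.feature = best;              % index into the X passed to THIS call
    node.values = unique(X(:, best));
    % Visit every branch. Each recursive call returns here, so the loop
    % continues with the next branch value.
    for k = 1:numel(node.values)
        rows = X(:, best) == node.values(k);
        X_k = X(rows, :);             % rows belonging to this branch only
        Y_k = Y(rows);                % labels filtered with the same mask
        X_k(:, best) = [];            % drop the used column on the copy
        node.children{k} = build_tree(X_k, Y_k);
    end
end

function g = info_gain(x, Y)
% Information gain H(Y) - H(Y | x) for one categorical feature column.
    vals = unique(x);
    g = label_entropy(Y);
    for v = 1:numel(vals)
        m = x == vals(v);
        g = g - (sum(m) / numel(Y)) * label_entropy(Y(m));
    end
end

function H = label_entropy(Y)
% Shannon entropy in bits of a non-empty label vector.
    [~, ~, idx] = unique(Y);
    p = accumarray(idx(:), 1) / numel(Y);
    H = -sum(p .* log2(p));
end
```

Because each call works on its own copies `X_k` and `Y_k`, sibling branches never see each other's deletions. One consequence of dropping the used column is that `node.feature` in a child refers to the reduced matrix, so a caller that needs original column numbers would have to carry a list of remaining feature indices alongside `X`.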