OpenCV - RTrees algorithm: adding weights to classes

Date: 2016-06-16 13:56:59

Tags: c++ opencv random-forest

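I am using OpenCV's implementation of the random forest algorithm (RTrees) and am running into a small problem when setting the parameters. I have 5 classes and 3 variables, and I want to add weights to the classes because the sample size varies greatly from class to class. I looked at the documentation, and the priors array seems to be the solution, but when I pass it 5 weights (one per class) I get the following error: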

OpenCV Error: One of arguments' values is out of range (Every class weight should be positive) in CvDTreeTrainData::set_data, file /home/sguinard/dev/opencv-2.4.13/modules/ml/src/tree.cpp, line 644
terminate called after throwing an instance of 'cv::Exception'
  what():  /home/sguinard/dev/opencv-2.4.13/modules/ml/src/tree.cpp:644: error: (-211) Every class weight should be positive in function CvDTreeTrainData::set_data

If I understand correctly, this is because the priors array has 5 elements. When I give it only 3 elements (my number of variables), everything works.
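The error text itself only says that every class weight should be positive, so one thing that can be ruled out before training is a zero, negative, or uninitialized entry in the array, or a mismatch between the array length and the number of distinct labels in the responses. The sketch below is a plain C++ pre-check under those assumptions; `checkPriors` is a hypothetical helper, not an OpenCV function, and it assumes the labels have already been copied from `response_data` into a `std::vector<int>`. Whether either condition is the actual cause of the error above is not established.

```cpp
#include <cmath>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper (not part of OpenCV): returns an empty string when the
// priors look usable, otherwise a short description of the first problem found.
std::string checkPriors(const std::vector<float>& priors,
                        const std::vector<int>& labels)
{
    // Count the distinct class labels that actually occur in the responses.
    const std::set<int> classes(labels.begin(), labels.end());
    if (priors.size() != classes.size())
        return "expected " + std::to_string(classes.size()) +
               " priors, got " + std::to_string(priors.size());

    for (std::size_t i = 0; i < priors.size(); ++i)
    {
        // Reject zero, negative, NaN and infinite entries.
        if (!std::isfinite(priors[i]) || priors[i] <= 0.0f)
            return "prior " + std::to_string(i) +
                   " is not a positive finite number";
    }
    return std::string();
}
```

Printing the returned message just before the call to `train` would show whether the weights and the labels agree on the number of classes.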

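According to the documentation, this array should be used to weight classes, but it actually seems to be used to weight variables...

So, does anyone know how to add class weights with OpenCV's RTrees algorithm? (I am using OpenCV 2.4.13 in C++.)

Thanks in advance!

Here is my code: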

cv::Mat RandomForest(cv::Mat train_data, cv::Mat response_data, cv::Mat sample_data, int size, int size_predict, float weights[5])
{

#undef CV_TERMCRIT_ITER
#define CV_TERMCRIT_ITER 10
#define ATTRIBUTES_PER_SAMPLE 3

cv::RandomTrees RFTree;
float priors[] = {1,1,1};


CvRTParams RFParams = CvRTParams(25, // max depth
                 500, // min sample count
                 0, // regression accuracy: N/A here
                 false, // compute surrogate split, no missing data
                 5, // max number of categories (use sub-optimal algorithm for larger numbers)
                 //priors
                 weights, // the array of priors (use weights or priors)
                 true,//false,  // calculate variable importance
                 2,       // number of variables randomly selected at node and used to find the best split(s).
                 100,     // max number of trees in the forest
                 0.01f,                // forest accuracy
                 CV_TERMCRIT_ITER |    CV_TERMCRIT_EPS // termination criteria
                 );

cv::Mat varIdx = cv::Mat();
cv::Mat vartype( train_data.cols + 1, 1, CV_8U );
vartype.setTo(cv::Scalar::all(CV_VAR_NUMERICAL));
vartype.at<uchar>(ATTRIBUTES_PER_SAMPLE, 0) = CV_VAR_CATEGORICAL;
cv::Mat sampleIdx = cv::Mat();
cv::Mat missingdatamask = cv::Mat();

for (int i=0; i!=train_data.rows; ++i)
{
    for (int j=0; j!=train_data.cols; ++j)
    {
        if(train_data.at<float>(i,j)<0
            || train_data.at<float>(i,j)>10000
            || !float(train_data.at<float>(i,j)))
            {train_data.at<float>(i,j)=0;}
    }
}

// Training
std::cout << "Training ....." << std::flush;
bool train = RFTree.train(train_data,
             CV_ROW_SAMPLE,//tflag,
             response_data,//responses,
             varIdx,
             sampleIdx,
             vartype,
             missingdatamask,
             RFParams);
if (train){std::cout << " Done" << std::endl;}
else{std::cout << " Failed" << std::endl;return cv::Mat();}

std::cout << "Variable Importance : " << std::endl;
cv::Mat VI = RFTree.getVarImportance();
for (int i=0; i!=VI.cols; ++i){std::cout << VI.at<float>(i) << " - " << std::flush;}
std::cout << std::endl;

std::cout << "Predicting ....." << std::flush;
cv::Mat predict(1,sample_data.rows,CV_32F);
float max = 0;
for (int i=0; i!=sample_data.rows; ++i)
{
    predict.at<float>(i) = RFTree.predict(sample_data.row(i));
    if (predict.at<float>(i)>max){max=predict.at<float>(i);/*std::cout << predict.at<float>(i) << "-"<< std::flush;*/}
}
// Personal test due to an error I got (every sample predicted as 0)
if (max==0){std::cout << " Failed ... Max value = 0" << std::endl;return cv::Mat();}
std::cout << " Done ... Max value = " << max << std::endl;

return predict;
}

0 answers:

No answers