我正在使用OpenCV的随机森林算法(即RTree)的实现,并且在设置参数时面临一些小问题。 我有5个类和3个变量,我想为类添加权重,因为每个类的样本大小变化很大。 我查看了文档here和here,似乎 priors 数组是解决方案,但是当我尝试给它5个权重时(对于我的5个类) )它给了我以下错误:
OpenCV错误:CvDTreeTrainData :: set_data,文件/home/sguinard/dev/opencv-2.4.13/modules/ml/src/tree中的一个参数'值超出范围(每个类权重应为正) .cpp,第644行 在抛出'cv :: Exception'的实例后终止调用 what():/ home /sguinard / dev / opencv-2.4.13 / modules / ml / src / tree.cpp:644:error:( - 211)每个类的权重都应该是函数CvDTreeTrainData :: set_data
如果我理解得很好,那是因为 priors 数组有5个元素。当我尝试只给它3个元素(作为我的变量数量)时,一切正常。
根据文档,这个数组应该用于增加类的权重,但它实际上似乎用于增加变量的权重......
那么,有没有人知道如何在OpenCV的RTree算法上为类添加权重? (我在c ++中使用OpenCV 2.4.13)
提前致谢!
这是我的代码:
cv::Mat RandomForest(cv::Mat train_data, cv::Mat response_data, cv::Mat sample_data, int size, int size_predict, float weights[5])
{
#undef CV_TERMCRIT_ITER
#define CV_TERMCRIT_ITER 10
#define ATTRIBUTES_PER_SAMPLE 3
cv::RandomTrees RFTree;
float priors[] = {1,1,1};
CvRTParams RFParams = CvRTParams(25, // max depth
500, // min sample count
0, // regression accuracy: N/A here
false, // compute surrogate split, no missing data
5, // max number of categories (use sub-optimal algorithm for larger numbers)
//priors
weights, // the array of priors (use weights or priors)
true,//false, // calculate variable importance
2, // number of variables randomly selected at node and used to find the best split(s).
100, // max number of trees in the forest
0.01f, // forrest accuracy
CV_TERMCRIT_ITER | CV_TERMCRIT_EPS // termination cirteria
);
cv::Mat varIdx = cv::Mat();
cv::Mat vartype( train_data.cols + 1, 1, CV_8U );
vartype.setTo(cv::Scalar::all(CV_VAR_NUMERICAL));
vartype.at<uchar>(ATTRIBUTES_PER_SAMPLE, 0) = CV_VAR_CATEGORICAL;
cv::Mat sampleIdx = cv::Mat();
cv::Mat missingdatamask = cv::Mat();
for (int i=0; i!=train_data.rows; ++i)
{
for (int j=0; j!=train_data.cols; ++j)
{
if(train_data.at<float>(i,j)<0
|| train_data.at<float>(i,j)>10000
|| !float(train_data.at<float>(i,j)))
{train_data.at<float>(i,j)=0;}
}
}
// Training
std::cout << "Training ....." << std::flush;
bool train = RFTree.train(train_data,
CV_ROW_SAMPLE,//tflag,
response_data,//responses,
varIdx,
sampleIdx,
vartype,
missingdatamask,
RFParams);
if (train){std::cout << " Done" << std::endl;}
else{std::cout << " Failed" << std::endl;return cv::Mat();}
std::cout << "Variable Importance : " << std::endl;
cv::Mat VI = RFTree.getVarImportance();
for (int i=0; i!=VI.cols; ++i){std::cout << VI.at<float>(i) << " - " << std::flush;}
std::cout << std::endl;
std::cout << "Predicting ....." << std::flush;
cv::Mat predict(1,sample_data.rows,CV_32F);
float max = 0;
for (int i=0; i!=sample_data.rows; ++i)
{
predict.at<float>(i) = RFTree.predict(sample_data.row(i));
if (predict.at<float>(i)>max){max=predict.at<float>(i);/*std::cout << predict.at<float>(i) << "-"<< std::flush;*/}
}
// Personnal test due to an error I got (everyone sent to 0)
if (max==0){std::cout << " Failed ... Max value = 0" << std::endl;return cv::Mat();}
std::cout << " Done ... Max value = " << max << std::endl;
return predict;
}