Mahout's OnlineLogisticRegression classifier consistently produces 100% classification probabilities

Time: 2015-06-05 01:27:15

Tags: java classification regression mahout

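The classification probability vector always looks like (0, 0, 1, 0) or (1, 0, 0, 0), meaning one category is 100% likely and all the others are zero. That is fine in the sense that the entries should sum to 1, but I am curious why it consistently spits out 100% probabilities across thousands of test examples (I have never seen it produce anything else).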

Here is the gist of my implementation:

OnlineLogisticRegression model = new OnlineLogisticRegression(4, 3, new L2());
Vector v = new RandomAccessSparseVector(3);  //size representing # features
FeatureVectorEncoder feature1Encoder = new ContinuousValueEncoder("feature1Encoder"); 
//so on for the remaining two encoders (feature 2 and 3)

for(int i: predictors) {  //where predictors is some array of continuous values representing the first feature
    feature1Encoder.addToVector(null, i, v); //passing null for originalForm string (not necessary for continuous encoding)
}
//so on for the remaining two features

// targets is an int array of category IDs, produced by interning the labels via Dictionary.
// In the current application a vector can have multiple classifications. I thought this
// might be where the problem was, so I also tried training on only one category instead of
// looping through all the categories the vector could belong to, but to no avail.
for (int i : targets) {
    model.train(i, v);  // train the model by passing the vector's actual classification
}

Vector probabilities = model.classifyFull(someNewTestVector); //this vector will consistently look like (0, 1, 0, 0)
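
One possible explanation, offered as an assumption and not a diagnosis: if the model turns per-category linear scores into probabilities with a multinomial logistic (softmax-style) link, then large scores, for example from unscaled continuous features, push the output to almost exactly 0 or 1. The sketch below is plain Java and does not call Mahout. The class `LinkSaturationSketch` and its `classifyFull` helper are hypothetical names used only for illustration, and the score values are made up.

```java
import java.util.Arrays;

public class LinkSaturationSketch {

    // Hypothetical helper, NOT Mahout code. Assumes a multinomial logistic link
    // in which category 0 is the reference category with a fixed score of 0 and
    // scores[i] is the linear score of category i + 1.
    static double[] classifyFull(double[] scores) {
        // Subtract the largest score before exponentiating to avoid overflow.
        double max = 0.0; // score of the reference category
        for (double s : scores) {
            max = Math.max(max, s);
        }

        double[] p = new double[scores.length + 1];
        p[0] = Math.exp(0.0 - max);
        double sum = p[0];
        for (int i = 0; i < scores.length; i++) {
            p[i + 1] = Math.exp(scores[i] - max);
            sum += p[i + 1];
        }

        // Normalize so the entries sum to 1.
        for (int i = 0; i < p.length; i++) {
            p[i] /= sum;
        }
        return p;
    }

    public static void main(String[] args) {
        // Small scores: probability mass is spread over all four categories.
        System.out.println(Arrays.toString(classifyFull(new double[] {0.5, -0.2, 0.1})));

        // Large scores: one entry is numerically indistinguishable from 1.
        System.out.println(Arrays.toString(classifyFull(new double[] {50.0, -10.0, 3.0})));
    }
}
```

If this is what is happening, printing the raw feature values in `v` before training would show whether they are large enough to saturate the link; whether that applies to the code above is untested.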

0 answers:

No answers yet