Question

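The classification probability vector always looks like (0, 0, 1, 0) or (1, 0, 0, 0), meaning one category is 100% likely and the others are zero. That is fine in the sense that the probabilities should sum to 1, but I'm curious why it consistently outputs 100% probabilities across thousands of test examples (I have never seen it do anything else).

Here is the gist of my implementation: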

import org.apache.mahout.classifier.sgd.L2;
import org.apache.mahout.classifier.sgd.OnlineLogisticRegression;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.vectorizer.encoders.ContinuousValueEncoder;
import org.apache.mahout.vectorizer.encoders.FeatureVectorEncoder;

OnlineLogisticRegression model = new OnlineLogisticRegression(4, 3, new L2());  // 4 categories, 3 features, L2 prior
Vector v = new RandomAccessSparseVector(3);  // cardinality = number of features
FeatureVectorEncoder feature1Encoder = new ContinuousValueEncoder("feature1Encoder");
// and so on for the remaining two encoders (features 2 and 3)

for (int i : predictors) {  // predictors is an array of continuous values for the first feature
    // originalForm is null (not needed for continuous encoding); the cast selects the String overload
    feature1Encoder.addToVector((String) null, i, v);
}
// and so on for the remaining two features

for (int i : targets) {  // targets is an int array of category ids, interned via Dictionary
    // Train the model with the vector's actual classification. In the current application a vector
    // can have multiple classifications. I suspected that was the problem, so I tried training on
    // only one category instead of looping over all of them, but to no avail.
    model.train(i, v);
}

Vector probabilities = model.classifyFull(someNewTestVector);  // consistently looks like (0, 1, 0, 0)
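
For context on how outputs like (0, 1, 0, 0) can arise in general: in a multinomial logistic model the class probabilities come from exponentiated linear scores, so feature values with a large magnitude can push one probability to essentially 1. The sketch below is plain Java with no Mahout dependency and does not reproduce Mahout's internals. The class name SoftmaxSaturation, the helpers softmax and scores, the weights, and the feature values are all made up for illustration, and whether unscaled features are the cause in the setup above is an assumption that the post does not confirm.

```java
import java.util.Arrays;

public class SoftmaxSaturation {

    // Hypothetical weights: one row per category, one column per feature.
    public static final double[][] WEIGHTS = {
        {0.0, 0.0, 0.0},
        {0.5, -0.2, 0.1},
        {-0.3, 0.4, 0.2},
        {0.1, 0.1, -0.4}
    };

    // Linear score per category: dot product of its weight row with the features.
    public static double[] scores(double[][] weights, double[] features) {
        double[] out = new double[weights.length];
        for (int c = 0; c < weights.length; c++) {
            for (int f = 0; f < features.length; f++) {
                out[c] += weights[c][f] * features[f];
            }
        }
        return out;
    }

    // Numerically stable softmax: subtract the largest score before exponentiating.
    public static double[] softmax(double[] scores) {
        double max = Arrays.stream(scores).max().orElse(0.0);
        double[] out = new double[scores.length];
        double sum = 0.0;
        for (int i = 0; i < scores.length; i++) {
            out[i] = Math.exp(scores[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= sum;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] raw = {1200.0, 850.0, 40.0};   // made-up unscaled feature values
        double[] scaled = {1.2, 0.85, 0.04};    // the same values divided by 1000

        // With raw values one category takes essentially all of the probability mass.
        System.out.println(Arrays.toString(softmax(scores(WEIGHTS, raw))));
        // With scaled values the mass is spread across the categories.
        System.out.println(Arrays.toString(softmax(scores(WEIGHTS, scaled))));
    }
}
```

If this is what is happening, printing the magnitude of the encoded feature values and comparing results before and after rescaling them to a small range would be a cheap check.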

Mahout's OnlineLogisticRegression classifier consistently produces 100% classification probabilities

0 answers: