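The classification probability vector always looks like (0, 0, 1, 0) or (1, 0, 0, 0), meaning one category is 100% likely and the others are zero. That is fine insofar as the values should sum to 1, but I'm curious why it consistently spits out 100% probabilities across thousands of test examples (I have never seen it do anything else).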
Here is the gist of my implementation:
OnlineLogisticRegression model = new OnlineLogisticRegression(4, 3, new L2());
Vector v = new RandomAccessSparseVector(3); //size representing # features
FeatureVectorEncoder feature1Encoder = new ContinuousValueEncoder("feature1Encoder");
//so on for the remaining two encoders (feature 2 and 3)
for (int i : predictors) { //where predictors is some array of continuous values representing the first feature
    feature1Encoder.addToVector(null, i, v); //passing null for originalForm string (not necessary for continuous encoding)
}
//so on for the remaining two features
for (int i : targets) { //where targets is an int array that represents the different categories, achieved through interning via Dictionary
    //Train the model by passing the vector's actual classification. In the current application, a vector can have
    //multiple classifications. I thought this might be where the problem was, so I tried training on only one category
    //instead of looping through every category the vector could be classified into, but to no avail.
    model.train(i, v);
}
Vector probabilities = model.classifyFull(someNewTestVector); //this vector will consistently look like (0, 1, 0, 0)
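
To illustrate how a probability vector can sum to 1 and still look like (0, 1, 0, 0), here is a minimal standalone sketch. It is a generic softmax written from scratch, not Mahout's internal code, and the class name, helper names, and score values are all hypothetical. It rests on an assumption rather than a confirmed diagnosis: that the per-category scores become large in magnitude, for example because unscaled feature values are accumulated into the vector.

```java
// Illustrative sketch only: a generic softmax, not Mahout's internal code.
class SoftmaxSaturationDemo {

    // Converts raw per-category scores into probabilities that sum to 1.
    static double[] softmax(double[] scores) {
        // Subtract the maximum score first so Math.exp cannot overflow.
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) {
            max = Math.max(max, s);
        }
        double[] probs = new double[scores.length];
        double sum = 0.0;
        for (int k = 0; k < scores.length; k++) {
            probs[k] = Math.exp(scores[k] - max);
            sum += probs[k];
        }
        for (int k = 0; k < probs.length; k++) {
            probs[k] /= sum;
        }
        return probs;
    }

    // Formats a probability vector with four decimal places for printing.
    static String format(double[] probs) {
        StringBuilder sb = new StringBuilder("(");
        for (int k = 0; k < probs.length; k++) {
            if (k > 0) {
                sb.append(", ");
            }
            sb.append(String.format("%.4f", probs[k]));
        }
        return sb.append(")").toString();
    }

    public static void main(String[] args) {
        // Hypothetical scores of small magnitude: probabilities stay spread out.
        double[] small = {0.1, 0.3, 0.2, 0.0};
        // The same scores multiplied by 1000 (hypothetical): one category dominates.
        double[] large = {100.0, 300.0, 200.0, 0.0};
        System.out.println(format(softmax(small)));
        System.out.println(format(softmax(large)));
    }
}
```

With the small scores the four probabilities stay close to one another. With the large scores, the gap between the top score and the rest is so wide that every other term underflows toward zero, and the output becomes effectively one-hot. If something similar is happening here, scaling the feature values before encoding them would be one thing to try.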