WEKA: understanding SimpleLogistic predictions

Time: 2017-03-09 06:38:58

Tags: machine-learning statistics weka logistic-regression

I am confused by the prediction WEKA makes for a trained model:

ARFF schema

@RELATION sportsArffWithEmpty

@ATTRIBUTE "annotation" {"DIFFERENT","SAME"}
@ATTRIBUTE "item_name" REAL
@ATTRIBUTE "brand" {"EMPTY","IGNORE","MATCHED","NOT_MATCH"}
@ATTRIBUTE "manufacturer" {"EMPTY","IGNORE","MATCHED","NOT_MATCH"}
@ATTRIBUTE "part_number" {"EMPTY","IGNORE","MATCHED","NOT_MATCH"}
@ATTRIBUTE "color" {"EMPTY","IGNORE","MATCHED","NOT_MATCH"}
@ATTRIBUTE "size" {"EMPTY","IGNORE","MATCHED","NOT_MATCH"}

Trained model

=== Run information ===

Scheme:       weka.classifiers.functions.SimpleLogistic -I 0 -M 500 -H 50 -W 0.0
Relation:     sportsArffWithEmpty
Instances:    263
Attributes:   7
              annotation
              item_name
              brand
              manufacturer
              part_number
              color
              size
Test mode:    user supplied test set:  size unknown (reading incrementally)

=== Classifier model (full training set) ===
SimpleLogistic:

Class 0 :
-2 +
[color=MATCHED] * 1.15 +
[size=IGNORE] * 1.03 +
[size=MATCHED] * -0.56 +
[size=NOT_MATCH] * 1.12

Class 1 :
2    +
[color=MATCHED] * -1.15 +
[size=IGNORE] * -1.03 +
[size=MATCHED] * 0.56 +
[size=NOT_MATCH] * -1.12

Instance

@DATA
"SAME","0.632","MATCHED","NOT_MATCH","MATCHED","MATCHED","MATCHED"

Prediction given by WEKA

=== Predictions on test set ===

inst#,actual,predicted,error,prediction
1,2:SAME,2:SAME,,0.945

Personal calculation

The score for class 1 (SAME) should be 2 + [color=MATCHED] * -1.15 + [size=IGNORE] * -1.03 + [size=MATCHED] * 0.56 + [size=NOT_MATCH] * -1.12 = 2 + 1*-1.15 + 0*-1.03 + 1*0.56 + 0*-1.12 = 1.41, so the output should be 1/(1 + e^-1.41) = 0.8037, but WEKA gives a prediction of 0.945.
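The hand calculation can be reproduced with a minimal Python sketch. The names `CLASS1_WEIGHTS`, `instance`, and `linear_score` are hypothetical and introduced only for illustration; the coefficients are copied from the "Class 1" function in the model output, and the instance holds the two attribute values of the test row that the model actually uses.

```python
import math

# Coefficients of the "Class 1" function, copied from the printed model.
CLASS1_INTERCEPT = 2.0
CLASS1_WEIGHTS = {
    ("color", "MATCHED"): -1.15,
    ("size", "IGNORE"): -1.03,
    ("size", "MATCHED"): 0.56,
    ("size", "NOT_MATCH"): -1.12,
}

# The test row, reduced to the attributes that appear in the model.
instance = {"color": "MATCHED", "size": "MATCHED"}


def linear_score(row, intercept, weights):
    """Add the intercept and every weight whose indicator [attr=value] is 1."""
    return intercept + sum(
        weight
        for (attr, value), weight in weights.items()
        if row.get(attr) == value
    )


score = linear_score(instance, CLASS1_INTERCEPT, CLASS1_WEIGHTS)

# Ordinary logistic function, as used in the hand calculation.
plain_sigmoid = 1.0 / (1.0 + math.exp(-score))

print(f"score = {score:.2f}")
print(f"1 / (1 + e^-score) = {plain_sigmoid:.4f}")
```

The score comes out at 1.41 and the ordinary logistic function gives roughly 0.80, so the arithmetic itself is consistent; the mismatch with WEKA's 0.945 has to come from how the score is turned into a probability.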

What is wrong here?
Why does the trained model not consider item_name, brand, etc.?

Answer to question 1

According to the WEKA forum:

SimpleLogistic uses what’s called a symmetric model by Friedman et al. (2000), “Additive logistic regression: A statistical view of boosting”, Annals of Statistics 28(2). See page 354 in that paper.

So the answer to question 1 is computed as e^1.41/(e^1.41 + e^(-1.41)) = 0.94.
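A small Python sketch of the symmetric two-class form may make the difference concrete. It assumes the two class scores are F and -F, which matches the printed model (the Class 0 function is the negation of the Class 1 function). The function name `symmetric_probability` is hypothetical and not part of WEKA.

```python
import math


def symmetric_probability(score):
    """P(class 1) when the two class scores are +score and -score.

    Algebraically this equals 1 / (1 + exp(-2 * score)).
    """
    pos = math.exp(score)
    neg = math.exp(-score)
    return pos / (pos + neg)


# Class 1 score for the test instance: 2 - 1.15 + 0.56.
score_class1 = 1.41

p_symmetric = symmetric_probability(score_class1)
p_standard = 1.0 / (1.0 + math.exp(-score_class1))

print(f"symmetric model:   {p_symmetric:.3f}")
print(f"standard logistic: {p_standard:.3f}")
```

With the rounded coefficients the symmetric form gives about 0.944, close to the 0.945 that WEKA reports; the remaining gap is presumably due to rounding in the printed coefficients.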

Question

Why does the trained model not consider item_name, brand, etc.? Is it related to bias in my data set?

0 answers:

No answers