I am confused by the prediction WEKA gives for a trained model. This is the header of my ARFF file:
@RELATION sportsArffWithEmpty
@ATTRIBUTE "annotation" {"DIFFERENT","SAME"}
@ATTRIBUTE "item_name" REAL
@ATTRIBUTE "brand" {"EMPTY","IGNORE","MATCHED","NOT_MATCH"}
@ATTRIBUTE "manufacturer" {"EMPTY","IGNORE","MATCHED","NOT_MATCH"}
@ATTRIBUTE "part_number" {"EMPTY","IGNORE","MATCHED","NOT_MATCH"}
@ATTRIBUTE "color" {"EMPTY","IGNORE","MATCHED","NOT_MATCH"}
@ATTRIBUTE "size" {"EMPTY","IGNORE","MATCHED","NOT_MATCH"}
=== Run information ===
Scheme: weka.classifiers.functions.SimpleLogistic -I 0 -M 500 -H 50 -W 0.0
Relation: sportsArffWithEmpty
Instances: 263
Attributes: 7
annotation
item_name
brand
manufacturer
part_number
color
size
Test mode: user supplied test set: size unknown (reading incrementally)
=== Classifier model (full training set) ===
SimpleLogistic:
Class 0 :
-2 +
[color=MATCHED] * 1.15 +
[size=IGNORE] * 1.03 +
[size=MATCHED] * -0.56 +
[size=NOT_MATCH] * 1.12
Class 1 :
2 +
[color=MATCHED] * -1.15 +
[size=IGNORE] * -1.03 +
[size=MATCHED] * 0.56 +
[size=NOT_MATCH] * -1.12
The test instance, and the prediction WEKA gives for it:
@DATA
"SAME","0.632","MATCHED","NOT_MATCH","MATCHED","MATCHED","MATCHED"
=== Predictions on test set ===
inst#,actual,predicted,error,prediction
1,2:SAME,2:SAME,,0.945
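So, using the Class 1 coefficients, the output should be
2 +
[color=MATCHED] * -1.15 +
[size=IGNORE] * -1.03 +
[size=MATCHED] * 0.56 +
[size=NOT_MATCH] * -1.12
= 2 + 1*-1.15 + 0*-1.03 + 1*0.56 + 0*-1.12
= 1.41
and the probability should be 1/(1 + e^-1.41) = 0.8037, but WEKA gives a prediction of 0.945. Question 1: why do the two values differ?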
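The hand calculation above can be reproduced with a short Python sketch. The coefficients are copied from the Class 1 block of the printed model; the dictionary layout and the names `linear_score` and `sigmoid` are illustrative assumptions, not part of WEKA.

```python
import math

# Class 1 (SAME) terms copied from the printed SimpleLogistic model.
CLASS1_INTERCEPT = 2.0
CLASS1_WEIGHTS = {
    ("color", "MATCHED"): -1.15,
    ("size", "IGNORE"): -1.03,
    ("size", "MATCHED"): 0.56,
    ("size", "NOT_MATCH"): -1.12,
}


def linear_score(instance):
    """Intercept plus the weight of every indicator term that matches the instance."""
    return CLASS1_INTERCEPT + sum(
        weight
        for (attribute, value), weight in CLASS1_WEIGHTS.items()
        if instance.get(attribute) == value
    )


def sigmoid(z):
    """Standard logistic function 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


# The single test instance from the @DATA section (item_name omitted: the model has no term for it).
test_instance = {
    "brand": "MATCHED",
    "manufacturer": "NOT_MATCH",
    "part_number": "MATCHED",
    "color": "MATCHED",
    "size": "MATCHED",
}

score = linear_score(test_instance)
probability = sigmoid(score)
print(f"score = {score:.2f}, sigmoid(score) = {probability:.3f}")
```

Only the color=MATCHED and size=MATCHED terms fire for this instance, so the score is 1.41 and the plain logistic function gives roughly 0.80, not the 0.945 that WEKA reports.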
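Question 2: why does the trained model not consider item_name, brand, etc.? Is it related to bias in my dataset?

According to the WEKA forum:
SimpleLogistic uses what’s called a symmetric model by Friedman et al.
(2000), “Additive logistic regression: A statistical view of boosting”,
Annals of Statistics 28(2). See page 354 in that paper.
So the answer to question 1 (why WEKA reports 0.945 rather than 0.8037) is that the class probability is normalized over the scores of both classes. For this instance the Class 0 score is -1.41 and the Class 1 score is 1.41, so the probability of SAME is computed as e^1.41/(e^1.41 + e^(-1.41)) = 0.94.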
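As a rough check of the symmetric formula e^1.41/(e^1.41 + e^(-1.41)), the following Python sketch normalizes the two class scores against each other. The function name `symmetric_probabilities` is an illustrative assumption and this is not WEKA's own code; the scores -1.41 and 1.41 are what the printed Class 0 and Class 1 models give for the test instance.

```python
import math


def symmetric_probabilities(scores):
    """p_j = exp(F_j) / sum_k exp(F_k), computed with the usual max-shift for stability."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Per-class scores for the test instance (color=MATCHED, size=MATCHED).
f_different = -2 + 1.15 - 0.56   # Class 0: -1.41
f_same = 2 - 1.15 + 0.56         # Class 1:  1.41

p_different, p_same = symmetric_probabilities([f_different, f_same])
print(f"P(DIFFERENT) = {p_different:.3f}, P(SAME) = {p_same:.3f}")

# With two classes whose scores are F and -F, this equals a logistic function of 2F.
logistic_of_2f = 1.0 / (1.0 + math.exp(-2 * f_same))
```

With two classes the scores are mirror images, so the result is the same as applying the logistic function to twice the Class 1 score, which is why it is larger than the plain 1/(1 + e^-1.41). The remaining small gap to the reported 0.945 is presumably because the printed coefficients are rounded to two decimals, though that is an assumption rather than something the output confirms.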