Question

Consider the following made-up ARFF file:

@relation referents
@attribute feature1      NUMERIC
@attribute feature2      NUMERIC
@attribute feature3      NUMERIC
@attribute feature4      NUMERIC
@attribute class{WIN,LOSS}
@data
1, 7, 1, 0, WIN
1, 5, 1, 0, WIN
-1, 1, 1, 0, LOSS
1, 1, 1, 1, WIN
-1, 1, 1, 1, WIN
1, 7, 1, 0, WIN
1, 5, 1, 0, WIN
-1, 1, 1, 0, LOSS
1, 1, 1, 1, WIN
-1, 1, 1, 1, WIN

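Using WEKA 3-8, open the ARFF file above in the Explorer. Click "Classify". Choose the J48 classifier and leave all settings at their defaults. Under "Test options", select "Percentage split" and set it to 50%. Click "More options" and set "Output predictions" to CSV.

Click Start.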

You will see the following output:

=== Run information ===

Scheme:       weka.classifiers.trees.J48 -C 0.25 -M 2
Relation:     referents
Instances:    10
Attributes:   5
              feature1
              feature2
              feature3
              feature4
              class
Test mode:    split 50.0% train, remainder test

=== Classifier model (full training set) ===

J48 pruned tree
------------------

feature1 <= -1
|   feature4 <= 0: LOSS (2.0)
|   feature4 > 0: WIN (2.0)
feature1 > -1: WIN (6.0)

Number of Leaves  :     3

Size of the tree :  5


Time taken to build model: 0 seconds

=== Predictions on test split ===

inst#,actual,predicted,error,prediction
1,2:LOSS,1:WIN,+,0.8
2,1:WIN,1:WIN,,0.8
3,1:WIN,1:WIN,,0.8
4,1:WIN,1:WIN,,0.8
5,1:WIN,1:WIN,,0.8

// rest of the report omitted ...

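Observe that the last five instances in the input ARFF file appear in the order

WIN WIN LOSS WIN WIN

However, the actual values in "Predictions on test split" appear in this order: LOSS WIN WIN WIN WIN

Why do these orders differ, and how can I link the inst# values in "Predictions on test split" to the @data instances in the ARFF file?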

Answer 1

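When Weka splits your data into training and test sets, it does so randomly: the instances from your ARFF file are shuffled before the split is made (you can also specify the random seed that is used). That is why the order of the test predictions differs from the order of the last five instances in the file.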
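To make the effect concrete, here is a minimal Python sketch of a shuffle-then-split procedure. It is not Weka's code and does not use Weka's random number generator, so the order it prints will not match the output in the question. The function name shuffled_percentage_split, the seed value of 1, and the idea of tagging each row with its position in @data are all assumptions made for illustration.

```python
import random

# The ten @data rows from the question, in file order.
ROWS = [
    (1, 7, 1, 0, "WIN"),
    (1, 5, 1, 0, "WIN"),
    (-1, 1, 1, 0, "LOSS"),
    (1, 1, 1, 1, "WIN"),
    (-1, 1, 1, 1, "WIN"),
    (1, 7, 1, 0, "WIN"),
    (1, 5, 1, 0, "WIN"),
    (-1, 1, 1, 0, "LOSS"),
    (1, 1, 1, 1, "WIN"),
    (-1, 1, 1, 1, "WIN"),
]


def shuffled_percentage_split(rows, train_percent=50.0, seed=1):
    """Hypothetical helper: tag rows with their 1-based @data position,
    shuffle them with a seeded generator, then cut into train and test."""
    tagged = [(position, row) for position, row in enumerate(rows, start=1)]
    rng = random.Random(seed)  # Python's generator, NOT Weka's
    rng.shuffle(tagged)
    train_size = round(len(tagged) * train_percent / 100)
    return tagged[:train_size], tagged[train_size:]


train, test = shuffled_percentage_split(ROWS)

# inst# counts positions within the shuffled test split, so it only maps
# back to the file if each row carries its original position along.
for inst_no, (data_row, row) in enumerate(test, start=1):
    print(f"inst# {inst_no} -> @data row {data_row}: {row}")
```

Because the position travels with each row through the shuffle, every inst# in the test split can be traced back to a line of @data. In Weka itself the analogous idea would be to add an ID attribute to the data (for example with the AddID filter) and include it among the additional attributes in the prediction output, while keeping it away from J48, for instance through a FilteredClassifier that removes it. The "More options" dialog also offers a "Preserve order for % Split" setting that skips the shuffle. Treat these as pointers to check against your Weka version rather than a verified recipe.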
