Classification with J48 in Weka

Time: 2013-10-24 19:29:51

Tags: machine-learning classification weka

I am using the following data as the training set, with the attribute PlayTennis as the target.

@relation Weka

@attribute Day {D1,D2,D3,D4,D5,D6,D7,D8,D9,D10,D11,D12,D13,D14}
@attribute Outlook {Sunny,Overcast,Rain}
@attribute Temperature {Hot,Mild,Cool}
@attribute Humidity {High,Normal}
@attribute Wind {Weak,Strong}
@attribute PlayTennis {No,Yes}

@data
D1,Sunny,Hot,High,Weak,No
D2,Sunny,Hot,High,Strong,No
D3,Overcast,Hot,High,Weak,Yes
D4,Rain,Mild,High,Weak,Yes
D5,Rain,Cool,Normal,Weak,Yes
D6,Rain,Cool,Normal,Strong,No
D7,Overcast,Cool,Normal,Strong,Yes
D8,Sunny,Mild,High,Weak,No
D9,Sunny,Cool,Normal,Weak,Yes
D10,Rain,Mild,Normal,Weak,Yes
D11,Sunny,Mild,Normal,Strong,Yes
D12,Overcast,Mild,High,Strong,Yes
D13,Overcast,Hot,Normal,Weak,Yes
D14,Rain,Mild,High,Strong,No

I also supplied Weka with a test set that contains the same data, except that the target values [Yes, No] are replaced with '?'. Like this:

@relation Weka2

@attribute Day {D1,D2,D3,D4,D5,D6,D7,D8,D9,D10,D11,D12,D13,D14}
@attribute Outlook {Sunny,Overcast,Rain}
@attribute Temperature {Hot,Mild,Cool}
@attribute Humidity {High,Normal}
@attribute Wind {Weak,Strong}
@attribute PlayTennis {No,Yes}

@data
D1,Sunny,Hot,High,Weak,?
D2,Sunny,Hot,High,Strong,?
D3,Overcast,Hot,High,Weak,?
D4,Rain,Mild,High,Weak,?
D5,Rain,Cool,Normal,Weak,?
D6,Rain,Cool,Normal,Strong,?
D7,Overcast,Cool,Normal,Strong,?
D8,Sunny,Mild,High,Weak,?
D9,Sunny,Cool,Normal,Weak,?
D10,Rain,Mild,Normal,Weak,?
D11,Sunny,Mild,Normal,Strong,?
D12,Overcast,Mild,High,Strong,?
D13,Overcast,Hot,Normal,Weak,?
D14,Rain,Mild,High,Strong,?

I clicked Start, but this is the result:

=== Run information ===

Scheme:       weka.classifiers.trees.J48 -C 0.25 -M 2
Relation:     Weka
Instances:    14
Attributes:   6
              Day
              Outlook
              Temperature
              Humidity
              Wind
              PlayTennis
Test mode:    user supplied test set:  size unknown     (reading incrementally)

=== Classifier model (full training set) ===

J48 pruned tree
------------------

Outlook = Sunny
|   Humidity = High: No (3.0)
|   Humidity = Normal: Yes (2.0)
Outlook = Overcast: Yes (4.0)
Outlook = Rain
|   Wind = Weak: Yes (3.0)
|   Wind = Strong: No (2.0)

Number of Leaves  :     5

Size of the tree :  8


Time taken to build model: 0 seconds

=== Evaluation on test set ===

Time taken to test model on supplied test set: 0 seconds

=== Summary ===

Total Number of Instances                0     
Ignored Class Unknown Instances                  7     

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall   F-Measure  MCC      ROC Area  PRC Area  Class
                 0.000    0.000    0.000      0.000    0.000      0.000    ?         ?         No
                 0.000    0.000    0.000      0.000    0.000      0.000    ?         ?         Yes
Weighted Avg.    NaN      NaN      NaN        NaN      NaN        NaN      NaN       NaN       

=== Confusion Matrix ===

 a b   <-- classified as
 0 0 | a = No
 0 0 | b = Yes

It reports "Ignored Class Unknown Instances = 14" and "Total Number of Instances = 0".

I don't understand what I should do.

Can anyone help?

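1 answer:

Answer 0 (score: 1)

The test dataset should keep its target variable labelled "Yes" or "No".

That allows Weka to evaluate the quality of its predictions. Without target labels, Weka has no way of knowing whether a prediction is correct, so it leaves those instances out of the evaluation.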
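To illustrate why the summary shows zero evaluated instances, here is a minimal Python sketch of an evaluation loop that skips instances whose true class is unknown. It is not Weka's actual implementation; the `evaluate` function and its return format are hypothetical, and the predicted labels are simply what the pruned tree in the question yields for D1 to D14.

```python
UNKNOWN = "?"  # ARFF marker for a missing value


def evaluate(actual, predicted):
    """Count correct predictions, skipping instances whose true class is unknown.

    Hypothetical helper for illustration only; it mimics the behaviour
    described above, not Weka's internal code.
    """
    total = correct = ignored = 0
    for truth, guess in zip(actual, predicted):
        if truth == UNKNOWN:
            ignored += 1  # nothing to compare the prediction against
            continue
        total += 1
        if truth == guess:
            correct += 1
    return {"total": total, "correct": correct, "ignored": ignored}


# Labels the tree from the question assigns to D1..D14.
predicted = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
             "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]

# True labels from the training set, and the same column replaced with "?".
labelled = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
            "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]
unlabelled = [UNKNOWN] * 14

print(evaluate(labelled, predicted))    # all 14 instances are scored
print(evaluate(unlabelled, predicted))  # all 14 instances are ignored
```

With the labelled column every instance is scored; with the `?` column the total drops to zero, which is the situation shown in the question's summary.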

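If you are only interested in the predictions, you can still use the unlabelled data.

For example, if you are using the GUI:

  1. Load the training data and select the "Classify" tab.
  2. Press the "More options" button in the "Test options" box.
  3. Place a check mark next to "Output predictions".
  4. Supply the unlabelled test data and press the "Start" button.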

This produces output that includes predictions for the seemingly ignored instances (a sample of the relevant output is shown below).

=== Predictions on test split ===  
inst#,    actual, predicted, error, probability distribution
     1          ?       2:no      +   0     *1    
     2          ?       2:no      +   0     *1    
     3          ?      1:yes      +  *1      0    
     4          ?      1:yes      +  *1      0    
     5          ?      1:yes      +  *1      0    
     6          ?       2:no      +   0     *1    
     7          ?      1:yes      +  *1      0
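
As a cross-check on predictions like these, the pruned tree printed in the question can be applied by hand to the unlabelled rows. The Python sketch below is only an illustration: `predict` is a hypothetical hand-coded copy of that tree, not anything generated by Weka, and the rows are copied from the question's test set.

```python
# Rows copied from the unlabelled test set in the question (class value is "?").
TEST_DATA = """\
D1,Sunny,Hot,High,Weak,?
D2,Sunny,Hot,High,Strong,?
D3,Overcast,Hot,High,Weak,?
D4,Rain,Mild,High,Weak,?
D5,Rain,Cool,Normal,Weak,?
D6,Rain,Cool,Normal,Strong,?
D7,Overcast,Cool,Normal,Strong,?
D8,Sunny,Mild,High,Weak,?
D9,Sunny,Cool,Normal,Weak,?
D10,Rain,Mild,Normal,Weak,?
D11,Sunny,Mild,Normal,Strong,?
D12,Overcast,Mild,High,Strong,?
D13,Overcast,Hot,Normal,Weak,?
D14,Rain,Mild,High,Strong,?
"""


def predict(outlook, humidity, wind):
    """Hand-coded copy of the pruned J48 tree printed in the question."""
    if outlook == "Sunny":
        return "No" if humidity == "High" else "Yes"
    if outlook == "Overcast":
        return "Yes"
    # Remaining branch: Outlook = Rain
    return "Yes" if wind == "Weak" else "No"


predictions = []
for line in TEST_DATA.strip().splitlines():
    # Day and Temperature are not used by the tree; the label is "?".
    day, outlook, _temperature, humidity, wind, _label = line.split(",")
    predictions.append((day, predict(outlook, humidity, wind)))

for day, label in predictions:
    print(day, label)
```

The first seven labels agree with the sample predictions above, which suggests the instances are classified even though they are left out of the evaluation statistics.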