I am using this data as the training set, with the attribute PlayTennis as the target.
@relation Weka
@attribute Day {D1,D2,D3,D4,D5,D6,D7,D8,D9,D10,D11,D12,D13,D14}
@attribute Outlook {Sunny,Overcast,Rain}
@attribute Temperature {Hot,Mild,Cool}
@attribute Humidity {High,Normal}
@attribute Wind {Weak,Strong}
@attribute PlayTennis {No,Yes}
@data
D1,Sunny,Hot,High,Weak,No
D2,Sunny,Hot,High,Strong,No
D3,Overcast,Hot,High,Weak,Yes
D4,Rain,Mild,High,Weak,Yes
D5,Rain,Cool,Normal,Weak,Yes
D6,Rain,Cool,Normal,Strong,No
D7,Overcast,Cool,Normal,Strong,Yes
D8,Sunny,Mild,High,Weak,No
D9,Sunny,Cool,Normal,Weak,Yes
D10,Rain,Mild,Normal,Weak,Yes
D11,Sunny,Mild,Normal,Strong,Yes
D12,Overcast,Mild,High,Strong,Yes
D13,Overcast,Hot,Normal,Weak,Yes
D14,Rain,Mild,High,Strong,No
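I also gave Weka the same data as the supplied test set, with the only change being that the target values [Yes, No] are replaced by '?'. Like this: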
@relation Weka2
@attribute Day {D1,D2,D3,D4,D5,D6,D7,D8,D9,D10,D11,D12,D13,D14}
@attribute Outlook {Sunny,Overcast,Rain}
@attribute Temperature {Hot,Mild,Cool}
@attribute Humidity {High,Normal}
@attribute Wind {Weak,Strong}
@attribute PlayTennis {No,Yes}
@data
D1,Sunny,Hot,High,Weak,?
D2,Sunny,Hot,High,Strong,?
D3,Overcast,Hot,High,Weak,?
D4,Rain,Mild,High,Weak,?
D5,Rain,Cool,Normal,Weak,?
D6,Rain,Cool,Normal,Strong,?
D7,Overcast,Cool,Normal,Strong,?
D8,Sunny,Mild,High,Weak,?
D9,Sunny,Cool,Normal,Weak,?
D10,Rain,Mild,Normal,Weak,?
D11,Sunny,Mild,Normal,Strong,?
D12,Overcast,Mild,High,Strong,?
D13,Overcast,Hot,Normal,Weak,?
D14,Rain,Mild,High,Strong,?
I clicked Start, but this is the result:
=== Run information ===
Scheme: weka.classifiers.trees.J48 -C 0.25 -M 2
Relation: Weka
Instances: 14
Attributes: 6
Day
Outlook
Temperature
Humidity
Wind
PlayTennis
Test mode: user supplied test set: size unknown (reading incrementally)
=== Classifier model (full training set) ===
J48 pruned tree
------------------
Outlook = Sunny
| Humidity = High: No (3.0)
| Humidity = Normal: Yes (2.0)
Outlook = Overcast: Yes (4.0)
Outlook = Rain
| Wind = Weak: Yes (3.0)
| Wind = Strong: No (2.0)
Number of Leaves : 5
Size of the tree : 8
Time taken to build model: 0 seconds
=== Evaluation on test set ===
Time taken to test model on supplied test set: 0 seconds
=== Summary ===
Total Number of Instances 0
Ignored Class Unknown Instances 7
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.000 0.000 0.000 0.000 0.000 0.000 ? ? No
0.000 0.000 0.000 0.000 0.000 0.000 ? ? Yes
Weighted Avg. NaN NaN NaN NaN NaN NaN NaN NaN
=== Confusion Matrix ===
a b <-- classified as
0 0 | a = No
0 0 | b = Yes
It says "Ignored Class Unknown Instances = 14" and "Total Number of Instances = 0".
I don't understand what I should do.
Can anyone help?
Answer 0 (score: 1)
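The test dataset should keep its target variable labeled as "Yes" or "No".
That lets Weka evaluate the quality of its predictions. Without the target labels, Weka cannot tell whether a prediction is correct, so it ignores those instances during evaluation.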
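The effect described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Weka's actual evaluation code; the function name `evaluate` and the returned dictionary keys are assumptions made for this example.

```python
def evaluate(actual, predicted, missing="?"):
    """Score predictions, skipping instances whose true class is unknown."""
    evaluated = 0
    correct = 0
    ignored = 0
    for a, p in zip(actual, predicted):
        if a == missing:
            # No true label, so the prediction cannot be judged right or wrong.
            ignored += 1
            continue
        evaluated += 1
        if a == p:
            correct += 1
    # With nothing to evaluate, accuracy is undefined (NaN).
    accuracy = correct / evaluated if evaluated else float("nan")
    return {"evaluated": evaluated, "ignored": ignored, "accuracy": accuracy}


predicted = ["No", "No", "Yes", "Yes"]

# Every class value is '?': nothing is evaluated and accuracy is NaN.
unlabeled = evaluate(["?", "?", "?", "?"], predicted)

# With real labels, the same predictions can be scored.
labeled = evaluate(["No", "No", "Yes", "No"], predicted)

print(unlabeled)
print(labeled)
```

With every label set to '?', all instances land in the ignored count and the metrics are undefined, which corresponds to the zero totals and NaN averages in the question's output.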
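If you are only interested in the predictions, you can still use the unlabeled data.
For example, if you use the GUI:
- Load the training data and select the "Classify" tab.
- Press the "More options" button in the "Test options" box.
- Check the box next to "Output predictions".
- Supply the unlabeled test data and press the "Start" button.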
This produces output that contains predictions for the seemingly ignored instances (an example of the relevant output is shown below).
=== Predictions on test split ===
inst#, actual, predicted, error, probability distribution
1 ? 2:no + 0 *1
2 ? 2:no + 0 *1
3 ? 1:yes + *1 0
4 ? 1:yes + *1 0
5 ? 1:yes + *1 0
6 ? 2:no + 0 *1
7 ? 1:yes + *1 0
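To make concrete what "predicting on unlabeled rows" means, the pruned J48 tree printed in the question can be transcribed by hand and applied to the rows whose class is '?'. This is a plain Python sketch, not Weka's API; the function name `predict` and the tuple layout of `rows` are assumptions made for illustration. The labels it produces come from the tree in the question and need not match the sample Weka output above.

```python
def predict(outlook, humidity, wind):
    """Hand transcription of the pruned J48 tree shown in the question."""
    if outlook == "Sunny":
        return "No" if humidity == "High" else "Yes"
    if outlook == "Overcast":
        return "Yes"
    # Remaining branch: Outlook = Rain
    return "Yes" if wind == "Weak" else "No"


# Unlabeled test rows from the question: Day, Outlook, Temperature, Humidity, Wind
rows = [
    ("D1", "Sunny", "Hot", "High", "Weak"),
    ("D2", "Sunny", "Hot", "High", "Strong"),
    ("D3", "Overcast", "Hot", "High", "Weak"),
    ("D4", "Rain", "Mild", "High", "Weak"),
    ("D5", "Rain", "Cool", "Normal", "Weak"),
    ("D6", "Rain", "Cool", "Normal", "Strong"),
    ("D7", "Overcast", "Cool", "Normal", "Strong"),
    ("D8", "Sunny", "Mild", "High", "Weak"),
    ("D9", "Sunny", "Cool", "Normal", "Weak"),
    ("D10", "Rain", "Mild", "Normal", "Weak"),
    ("D11", "Sunny", "Mild", "Normal", "Strong"),
    ("D12", "Overcast", "Mild", "High", "Strong"),
    ("D13", "Overcast", "Hot", "Normal", "Weak"),
    ("D14", "Rain", "Mild", "High", "Strong"),
]

# The tree never tests Day or Temperature, so those fields are not passed on.
predictions = {
    day: predict(outlook, humidity, wind)
    for day, outlook, _temperature, humidity, wind in rows
}

for day, label in predictions.items():
    print(day, label)
```

No true label is needed to produce these predictions; a label is only needed to score them, which is why the evaluation summary stays empty while the prediction list is still filled in.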