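I ran into a very challenging problem while working with libsvm. When I test my data in libsvm, the accuracy is absurd (1%). I don't know whether that accuracy is normal or whether I am doing something wrong, but when I run the easy.py script, the following warning appears several times while svm-scale is running: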
WARNING: feature index i appeared in test.libsvm was not seen in the scaling factor file train.libsvm.range.
How can I fix this warning? Will fixing it improve my accuracy?
Edit: the contents of train.libsvm.range are as follows:
x
-1 1
2 -1 0
3 -1 0
4 1 2
5 -1 0
6 -1 0
7 -1 0
8 0 1
9 0 1
10 -1 0
11 0 1
12 2 3
13 -1 0
14 -1 0
15 -1 0
16 0 2
17 -1 0
18 -2 0
19 -2 0
20 0 1
21 0 2
23 0 1
24 2 3
25 0 1
26 -1 0
27 -1 0
28 1 2
29 -1 0
30 -1 0
31 -1 0
32 0 2
36 0 1
Edit: you can see the training file and the testing file here.
Answer 0 (score: 1)
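This happens because your test data contains features that are not present in the training data used to generate the scaling file. Check that your training and test data sets match. If your test data (or training data) comes from the wrong data set, getting the correct data will probably solve the problem.
For example, in your training data file, feature number 3 is always zero, so it is not included in the scaling file. It does, however, have a value on line 5626 of the test file: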
-1 1:0 2:-1 3:-1 4:2 5:-1 6:-1 7:-1 8:0 9:0 10:0 11:0 12:2 13:0 14:-1
Because feature 3 has a value in the test file but is not in the scaling factor file, you get the warning.
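The mismatch described above can be checked directly. The following is a minimal Python sketch, not part of libsvm: the function names `range_indices` and `unseen_features` are made up for illustration, and it assumes the range file has the layout shown in the question (an `x` line, a line with the target bounds, then one `index min max` line per feature). It only looks for indices that carry a nonzero value in the data but have no entry in the range file, so svm-scale's own check may differ in edge cases.

```python
def range_indices(range_lines):
    """Collect the feature indices listed in the 'x' section of a range file.

    Assumed layout (as in the question): 'x', then 'lower upper',
    then one 'index min max' line per feature.
    """
    indices = set()
    section = None
    skip_bounds = False
    for line in range_lines:
        parts = line.split()
        if not parts:
            continue
        if parts[0] in ("x", "y"):
            section = parts[0]
            skip_bounds = True  # the next line holds the target bounds
            continue
        if skip_bounds:
            skip_bounds = False
            continue
        if section == "x":
            indices.add(int(parts[0]))
    return indices


def unseen_features(range_lines, data_lines):
    """Map each index missing from the range file to the first data line using it."""
    known = range_indices(range_lines)
    unseen = {}
    for line_no, line in enumerate(data_lines, start=1):
        tokens = line.split()
        for token in tokens[1:]:  # tokens[0] is the class label
            index_text, _, value_text = token.partition(":")
            index, value = int(index_text), float(value_text)
            # A zero value is equivalent to an absent feature in sparse format.
            if value != 0 and index not in known and index not in unseen:
                unseen[index] = line_no
    return unseen


if __name__ == "__main__":
    # Shortened, illustrative inputs; real use would read the two files instead.
    sample_range = ["x", "-1 1", "2 -1 0", "4 0 2", "5 -1 0"]
    sample_data = ["-1 1:0 2:-1 3:-1 4:2 5:-1"]
    print(unseen_features(sample_range, sample_data))
```

To run it on real files, pass `open("train.libsvm.range")` and `open("test.libsvm")` as the two arguments, since both functions accept any iterable of lines.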
I'm not sure where the train.libsvm.range contents you posted came from, because when I generate the file from the training data I get:
x
-1 1
2 -1 0
4 0 2 ** note 3 is missing **
5 -1 0
6 -1 0
7 -1 0
8 0 1
9 0 1
12 0 3 ** note 10, 11 are missing **
etc.
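To see which features would end up in such a listing, the per-feature minimum and maximum can be computed by hand. This is a rough sketch under stated assumptions, not svm-scale itself: `feature_ranges` is a hypothetical helper, absent features are treated as 0, and features whose value never changes are left out, which matches the observation above that an always-zero feature is not in the scaling file.

```python
def feature_ranges(data_lines):
    """Return {index: (min, max)} for every feature whose value varies.

    Lines are in libsvm sparse format: 'label index:value index:value ...'.
    Features missing from a line are counted as 0 for that line.
    """
    rows = []
    max_index = 0
    for line in data_lines:
        tokens = line.split()
        if not tokens:
            continue
        row = {}
        for token in tokens[1:]:  # tokens[0] is the class label
            index_text, _, value_text = token.partition(":")
            row[int(index_text)] = float(value_text)
        rows.append(row)
        if row:
            max_index = max(max_index, max(row))

    ranges = {}
    for index in range(1, max_index + 1):
        values = [row.get(index, 0.0) for row in rows]
        low, high = min(values), max(values)
        if low != high:  # constant features are skipped
            ranges[index] = (low, high)
    return ranges


if __name__ == "__main__":
    # Two made-up lines: features 1 and 3 are always zero and are skipped.
    sample = ["1 1:0 2:-1 3:0 4:2", "-1 2:0 4:1"]
    for index, (low, high) in sorted(feature_ranges(sample).items()):
        print(index, low, high)
```

Comparing the indices this prints for the training file against the posted train.libsvm.range is a quick way to tell whether the range file was really produced from that training file.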
Check that you are using the correct test and training data.
One more thing: running easy.py myself, I get 65% accuracy rather than 1%:
$ ./easy.py train_libsvm.mht test_sdx.mht
Scaling training data...
WARNING: original #nonzeros 7560
new #nonzeros 15748
Use -l 0 if many original feature values are zeros
Cross validation...
Best c=512.0, g=0.0001220703125 CV rate=70.0
Training...
Output model: train_libsvm.mht.model
Scaling testing data...
WARNING: feature index 3 appeared in file test_sdx.mht was not seen in the scaling factor file train_libsvm.mht.range.
WARNING: feature index 10 appeared in file test_sdx.mht was not seen in the scaling factor file train_libsvm.mht.range.
WARNING: feature index 11 appeared in file test_sdx.mht was not seen in the scaling factor file train_libsvm.mht.range.
WARNING: feature index 13 appeared in file test_sdx.mht was not seen in the scaling factor file train_libsvm.mht.range.
WARNING: feature index 22 appeared in file test_sdx.mht was not seen in the scaling factor file train_libsvm.mht.range.
WARNING: feature index 25 appeared in file test_sdx.mht was not seen in the scaling factor file train_libsvm.mht.range.
WARNING: original #nonzeros 67740
new #nonzeros 169332
Use -l 0 if many original feature values are zeros
Testing...
Accuracy = 65.651% (3706/5645) (classification)