Question

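I am trying to build a one-class SVM classifier in Weka using the LibSVM classifier.

My training file contains a list of nouns. My test file contains many words of different kinds. My goal is to use the classifier to predict which words in the test file are nouns.

My input ARFF file (ip.arff), the training file, looks like this: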

@relation test1

@attribute name string
@attribute class {yes}

@data
'building',yes
'car',yes
..... and so on

My test file (test.arff) looks like this:

@relation test2

@attribute name string
@attribute class {yes}

@data
'car',?
'window',?
'running',?
..... and so on

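This is what I did:

Since the data type is string, I applied batch filtering to both input files to generate ipstd.arff and teststd.arff, as described at http://weka.wikispaces.com/Batch+filtering
Next, I loaded ipstd.arff and ran the classifier. (Note: all words are classified as yes.)
Next, I loaded the test set teststd.arff and re-evaluated the model.
But all the words are classified as nouns ('yes'):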

```
=== Predictions on user test set ===

inst#     actual  predicted  error  prediction
  1        1:?      1:yes       1
  2        1:?      1:yes       1
  3        1:?      1:yes       1
```
and so on
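
As a rough illustration of the batch filtering step above, here is a minimal Python sketch of what that step is understood to do, assuming the filter is a bag-of-words style one such as StringToWordVector (the question does not say which filter was used). The function names `build_vocabulary` and `encode` are hypothetical and are not part of Weka; the point is only that the attribute set is derived from the training file and then reused unchanged for the test file.

```python
# Hypothetical stand-in for a bag-of-words batch filter; this is not Weka code.

def build_vocabulary(training_words):
    """Collect the distinct training words in sorted order (one attribute per word)."""
    return sorted(set(training_words))


def encode(word, vocabulary):
    """Return a 0/1 vector with a 1 at the position of `word`, if it is in the vocabulary."""
    return [1 if word == known else 0 for known in vocabulary]


train_words = ["building", "car"]          # words listed in ip.arff
test_words = ["car", "window", "running"]  # words listed in test.arff

# Batch mode: the vocabulary is built from the training file only ...
vocabulary = build_vocabulary(train_words)

# ... and the same vocabulary is then applied to both files.
train_vectors = [encode(w, vocabulary) for w in train_words]
test_vectors = [encode(w, vocabulary) for w in test_words]

for word, vector in zip(test_words, test_vectors):
    print(word, vector)
```

Under that assumption, test words that never occur in the training file ('window' and 'running' here) come out of the filter as all-zero vectors, while 'car' keeps the same attribute position it had in the training data.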

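My problem is that all the words in the test file (teststd.arff) are classified as nouns.

Can anyone tell me where I am going wrong? What should I do so that the nouns in the test set are classified as 'yes' and all other words are classified as outliers? Thanks.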