Question

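I have a dataset of words and their frequencies.

I want to filter out all instances that have one or more attributes with a value > 200 (for example).

I need something like the RemoveWithValues filter, but I want to apply it to all attributes rather than just one.

How can I do this?

Note: I am using the Weka Explorer; I am not writing code.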

Answer 1

The RemoveWithValues filter can be used as follows:

import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.instance.RemoveWithValues;

Instances data; // the dataset, loaded elsewhere
RemoveWithValues filter = new RemoveWithValues();

String[] options = new String[5];
options[0] = "-C";   // Choose attribute to be used for selection
options[1] = "1";    // Attribute number (1-based)
options[2] = "-S";   // Split point for a numeric attribute. Instances with values smaller than it are selected, i.e. removed. (default 0)
options[3] = "200";  // Say you want to keep the instances whose values for this attribute are less than 200
options[4] = "-V";   // Invert the selection so that values at or above the split point are removed instead
filter.setOptions(options);

filter.setInputFormat(data);
Instances newData = Filter.useFilter(data, filter);

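So that handles a single attribute. Put it in a loop and change options[1] on each iteration, so that it runs over the indices of all attributes. Inside the loop, replace data with newData so that each pass filters the output of the previous one.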
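The loop described above could look roughly like the following. This is a minimal sketch rather than a tested recipe: the class name FilterAllAttributes and the method keepRowsBelow are made up for illustration, it uses the filter's setter methods instead of an options array, and it assumes that inverting the selection keeps the instances below the split point, which is worth checking against your Weka version. Non-numeric attributes (for example a string attribute holding the word itself) are skipped.

```java
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.instance.RemoveWithValues;

class FilterAllAttributes {

    /**
     * Applies RemoveWithValues once per numeric attribute, feeding each
     * pass the output of the previous one. Hypothetical helper, not part of Weka.
     */
    static Instances keepRowsBelow(Instances data, double threshold) throws Exception {
        Instances current = data;
        for (int i = 0; i < current.numAttributes(); i++) {
            if (!current.attribute(i).isNumeric()) {
                continue; // leave string/nominal attributes alone
            }
            RemoveWithValues filter = new RemoveWithValues();
            filter.setAttributeIndex(String.valueOf(i + 1)); // filter indices are 1-based
            filter.setSplitPoint(threshold);
            filter.setInvertSelection(true); // assumption: keeps values below the split point
            filter.setInputFormat(current);  // call after the options are set
            current = Filter.useFilter(current, filter);
        }
        return current;
    }
}
```

Note that an instance whose value equals the split point exactly is expected to be removed under this setup, so choose a split point slightly above 200 if only values strictly greater than 200 should go.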

Answer 2

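In the Weka Explorer, the RemoveWithValues filter can be used as follows:

Enter the attribute index of the first attribute that needs filtering.
If you want to keep only the records below 200, invert the selection.
Enter the split point 200.0.
Apply these changes, then adjust the attribute index as required.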

[Screenshot: RemoveWithValues filter settings]

If this approach does not work for you in Weka, you could preprocess the data in the spreadsheet or database tool where it is stored.
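If a small standalone program is an acceptable substitute for a spreadsheet or database, the same preprocessing can be sketched in plain Java with no Weka dependency. The layout here is an assumption, not something stated in the question: comma-separated text with a header row, the word in the first column, and numeric frequencies in every other column. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CsvThresholdFilter {

    /** True when every numeric field after the first column is at most the threshold. */
    static boolean allAtMost(String line, double threshold) {
        String[] fields = line.split(",");
        for (int i = 1; i < fields.length; i++) {
            if (Double.parseDouble(fields[i].trim()) > threshold) {
                return false;
            }
        }
        return true;
    }

    /** Keeps the header plus every data row with no value above the threshold. */
    static List<String> filterRows(List<String> lines, double threshold) {
        List<String> kept = new ArrayList<>();
        if (lines.isEmpty()) {
            return kept;
        }
        kept.add(lines.get(0)); // assumed header row, copied through unchanged
        for (String line : lines.subList(1, lines.size())) {
            if (allAtMost(line, threshold)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up sample rows, for illustration only.
        List<String> rows = Arrays.asList(
                "word,doc1,doc2",
                "apple,12,40",
                "the,950,870",
                "pear,3,201");
        for (String row : filterRows(rows, 200.0)) {
            System.out.println(row);
        }
    }
}
```

With the sample rows, only the header and the apple row survive, since the other two rows each contain a value above 200. The filtered file can then be loaded into the Explorer as usual.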

Filter all instances with high attribute values in Weka
