How do I use the SubsetByExpression filter in the Weka Java API?

Posted: 2017-05-06 08:51:32

Tags: weka data-mining

My Java code selects instances with the RemoveWithValues filter, but it does not select the specific instances I need, for example:

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.unsupervised.instance.RemoveWithValues;

RemoveWithValues filter = new RemoveWithValues();
String[] options = new String[4];
options[0] = "-C";     // Choose attribute to be used for selection
options[1] = "7";      // Attribute number (1-based)
options[2] = "-S";     // Numeric value to be used for selection on numeric attribute. Instances with values smaller than given value will be selected. (default 0)
options[3] = "17908";
// i.e. the instances whose value for attribute 7 is less than 17908
// get customer id
try {
    DataSource source = new DataSource("data/customer_data.csv");
    Instances data = source.getDataSet();

    filter.setOptions(options);
    filter.setInputFormat(data);
    filter.setDontFilterAfterFirstBatch(false);
    Instances newData = RemoveWithValues.useFilter(data, filter);
    System.out.println("new data");
    System.out.println(newData);
} catch (Exception e) {
    e.printStackTrace();
}

But this code does not select the instances whose value for attribute 7 is 17908. How can I use the SubsetByExpression class instead?

Thanks in advance.

1 Answer:

Answer 0 (score: 0)

Try this:

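import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.unsupervised.instance.SubsetByExpression;

DataSource source = new DataSource("data/customer_data.csv"); // same file as in the question
Instances dataset = source.getDataSet();
int year = 2017, day = 100; // hypothetical example values
String[] options = new String[2];
options[0] = "-E"; // expression an instance must satisfy to be kept; ATTn is the n-th attribute (1-based)
options[1] = "(((ATT12 = " + year + ") and (ATT13 < " + day + ")) or (ATT12 < " + year + "))";
SubsetByExpression filter = new SubsetByExpression();
filter.setOptions(options);
filter.setInputFormat(dataset);
Instances testset = SubsetByExpression.useFilter(dataset, filter);
testset.setClassIndex(testset.numAttributes() - 1);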
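Applied to the question's own case: the -S option of RemoveWithValues is a threshold (per the option comment in the question's code, instances with values smaller than the given value are selected), so it cannot express "attribute 7 equals 17908". SubsetByExpression can, with an equality comparison in its -E expression, where ATTn is the n-th attribute counting from 1. The sketch below is a hypothetical helper (`SubsetExpressions.equalsOptions` is not part of Weka) that only builds the option array; it does not call Weka, so it runs with the JDK alone.

```java
import java.util.Arrays;

class SubsetExpressions {

    // Hypothetical helper (not a Weka API): builds the option array for
    // Weka's SubsetByExpression filter so that only instances whose
    // attribute at the given 1-based index equals `value` are kept.
    static String[] equalsOptions(int attIndex, long value) {
        if (attIndex < 1) {
            throw new IllegalArgumentException("ATTn references are 1-based");
        }
        // "-E" carries the boolean expression an instance must satisfy to be kept
        return new String[] {"-E", "ATT" + attIndex + " = " + value};
    }

    public static void main(String[] args) {
        // Attribute 7 equal to 17908, as asked in the question
        System.out.println(Arrays.toString(equalsOptions(7, 17908)));
    }
}
```

Assuming attribute 7 is loaded as numeric from the CSV, the returned array would be passed to `filter.setOptions(...)` before `filter.setInputFormat(dataset)`, exactly as in the answer's code; calling `filter.setExpression("ATT7 = 17908")` should be an equivalent way to set the same expression.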