Java and Weka: Creating a String Attribute

Time: 2018-01-31 22:38:51

Tags: java string weka arff

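I'm trying to write a Java program that converts a text file into a Weka ARFF file. For some reason my name attribute is being set as numeric, but it should be a string. I've tried everything I can think of. I tried to fix it by changing

attr.add(new Attribute("name"));

to

attr.add(new Attribute("name",true));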

But when I run it, the name is still printed as a number (in the second column):

1,0,?,?,?
1000,1,?,?,?
1002,2,?,?,?
2,3,?,?,?
3000,4,?,?,?

What am I doing wrong?

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ArffSaver;

public class WekaCreateARFF {

    private static final String FILENAME = "Some File";

    public static void main(String[] args) throws IOException {
        ArrayList<String> input = new ArrayList<String>();
        ArrayList<Attribute> attr = new ArrayList<Attribute>();
        Instances dataset;
        double[] values;
        BufferedReader br = null;
        FileReader fr = null;
        String date = null;
        double id;
        String n = null;

        Instance inst = new DenseInstance(5);

        List<String> nominal_state = new ArrayList<String>(5);
        nominal_state.add("CA");
        nominal_state.add("NC");
        nominal_state.add("TX");
        nominal_state.add("SC");
        nominal_state.add("NY");

        List<String> nominal_party = new ArrayList<String>(2);
        nominal_party.add("republican");
        nominal_party.add("democrat");

        attr.add(new Attribute("id"));
        attr.add(new Attribute("name", true));
        attr.add(new Attribute("political party", nominal_party));
        attr.add(new Attribute("state", nominal_state));
        attr.add(new Attribute("birth date", date));

        try {
            fr = new FileReader(FILENAME);
            br = new BufferedReader(fr);

            String entry;

            dataset = new Instances("SimpleARFF", attr, 0);
            values = new double[dataset.numAttributes()];
            while ((entry = br.readLine()) != null) {
                //System.out.println(entry);
                input.add(entry);
                for (int i = 0; i < 5; i++) {
                    String[] parts = entry.split(",");
                    String part1 = parts[0];
                    String name = parts[1];
                    id = Double.parseDouble(part1);

                    inst.setValue(attr.get(0), id);
                    inst.setValue(attr.get(1), name);
                }
                System.out.println(inst);

                dataset.add(new DenseInstance(1.0, values));
            }

            //System.out.println(dataset);
            //ArffSaver arff = new ArffSaver();
            //arff.setInstances(dataset);
            //arff.setFile(new File("Simple.arff"));
            //arff.writeBatch();

        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                if (br != null)
                    br.close();

                if (fr != null)
                    fr.close();
            } catch (IOException ex) {
                ex.printStackTrace();
            }
        }
    }
}
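
For comparison with whatever the program prints, here is a rough sketch in plain Java (no Weka) of the ARFF text a file like this is aiming for, with name declared as a string attribute. It assumes each input line has the form id,name and leaves the other three columns missing. The class ArffSketch, its helper methods, and the sample names are hypothetical, and the quoting is simplified compared with what Weka itself writes.

```java
import java.util.Arrays;
import java.util.List;

public class ArffSketch {

    // Quote a string value ARFF-style. Simplified: escapes only \ and '.
    static String quote(String s) {
        return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    // Build ARFF text from "id,name" lines; the last three columns stay missing.
    static String toArff(List<String> lines) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation SimpleARFF\n\n");
        sb.append("@attribute id numeric\n");
        sb.append("@attribute name string\n");
        // Attribute names that contain spaces have to be quoted.
        sb.append("@attribute 'political party' {republican,democrat}\n");
        sb.append("@attribute state {CA,NC,TX,SC,NY}\n");
        // No explicit format given, so the default date format applies.
        sb.append("@attribute 'birth date' date\n\n");
        sb.append("@data\n");
        for (String line : lines) {
            String[] parts = line.split(",", 2);
            double id = Double.parseDouble(parts[0].trim());
            sb.append(id).append(',')
              .append(quote(parts[1].trim()))
              .append(",?,?,?\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample rows, not taken from the question's input file.
        System.out.print(toArff(Arrays.asList("1,Alice", "1000,Bob")));
    }
}
```

If the file saved by ArffSaver shows `@attribute name string` in its header and quoted names in the data section, the attribute type is correct and any numbers seen on the console come from how the instance was printed.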

1 Answer:

Answer 0: (score: 1)

You probably want this constructor:

http://weka.sourceforge.net/doc.dev/weka/core/Attribute.html#Attribute-java.lang.String-boolean-

That is, you basically have to add a boolean flag to tell Weka that you want a String attribute rather than a numeric attribute (the default):

new Attribute("blah", true)

should give you a String attribute.
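
One more thing may be worth checking, since the code in the question already passes `true`: the Instance being printed is never attached to a dataset. As far as I can tell, an Instance with no dataset reference prints every value as a raw number, and for a string attribute that number is the index of the string inside the attribute, which would match the 0, 1, 2, ... in the second column. A minimal sketch, assuming Weka 3.7 or later on the classpath; the class name StringAttributeSketch and the sample value "Alice" are made up for illustration:

```java
import java.util.ArrayList;

import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;

public class StringAttributeSketch {

    public static void main(String[] args) {
        ArrayList<Attribute> attrs = new ArrayList<Attribute>();
        attrs.add(new Attribute("id"));          // numeric attribute
        attrs.add(new Attribute("name", true));  // string attribute

        // Creating the dataset assigns each attribute its column index.
        Instances dataset = new Instances("SimpleARFF", attrs, 0);

        Instance inst = new DenseInstance(dataset.numAttributes());
        inst.setValue(attrs.get(0), 1000.0);
        inst.setValue(attrs.get(1), "Alice");    // hypothetical sample value

        // No dataset reference yet, so the instance cannot look up attribute
        // types and prints the string's internal index instead of its text.
        System.out.println(inst);

        // add() stores a copy of the instance that is linked to the dataset;
        // that copy can resolve the index back to the string.
        dataset.add(inst);
        System.out.println(dataset.instance(0));

        // Printing the whole dataset gives the ARFF header plus data rows.
        System.out.println(dataset);
    }
}
```

If the second and third prints show the name instead of an index, the attribute is a string and only the first print was misleading. Calling `inst.setDataset(dataset)` before printing should have the same effect on the original instance.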