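I have a training data set in .txt format, and I am converting the .txt file to an .arff file. In doing so I lose the first record, because the converter takes the first record in the .txt file as the attributes for the whole file. (The .txt file is tab-delimited.)
Here is the code I use to convert the file to .arff. Is there a way to keep the first record as data while it is also used as the attributes for the whole file?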
http://weka.wikispaces.com/Converting+CSV+to+ARFF
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.CSVLoader;

import java.io.File;

public class CSV2Arff {
  /**
   * takes 2 arguments:
   * - CSV input file
   * - ARFF output file
   */
  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      System.out.println("\nUsage: CSV2Arff <input.csv> <output.arff>\n");
      System.exit(1);
    }

    // load CSV
    CSVLoader loader = new CSVLoader();
    loader.setSource(new File(args[0]));
    Instances data = loader.getDataSet();

    // save ARFF
    ArffSaver saver = new ArffSaver();
    saver.setInstances(data);
    saver.setFile(new File(args[1]));
    saver.setDestination(new File(args[1]));
    saver.writeBatch();
  }
}
These are the first two records of a very large training data set.
39 State-gov 77516 Bachelors 13 Never-married Adm-clerical Not-in-family White Male 2174 0 40 United-States <=50K
50 Self-emp-not-inc 83311 Bachelors 13 Married-civ-spouse Exec-managerial Husband White Male 0 0 13 United-States <=50K
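After I run the code to generate the .arff file, the first record is treated as the attributes, so the data set ends up one record short. I want that record to be in the attributes and also in the training data. The generated file is shown below.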
@relation training.txt
@attribute 39 numeric
@attribute ' State-gov' {' Self-emp-not-inc',' Private',' State-gov',' Federal-gov',' Local-gov',' ?',' Self-emp-inc',' Without-pay',' Never-worked'}
@attribute 77516 numeric
@attribute ' Bachelors' {' Bachelors',' HS-grad',' 11th',' Masters',' 9th',' Some-college',' Assoc-acdm',' Assoc-voc',' 7th-8th',' Doctorate',' Prof-school',' 5th-6th',' 10th',' 1st-4th',' Preschool',' 12th'}
@attribute 13 numeric
@attribute ' Never-married' {' Married-civ-spouse',' Divorced',' Married-spouse-absent',' Never-married',' Separated',' Married-AF-spouse',' Widowed'}
@attribute ' Adm-clerical' {' Exec-managerial',' Handlers-cleaners',' Prof-specialty',' Other-service',' Adm-clerical',' Sales',' Craft-repair',' Transport-moving',' Farming-fishing',' Machine-op-inspct',' Tech-support',' ?',' Protective-serv',' Armed-Forces',' Priv-house-serv'}
@attribute ' Not-in-family' {' Husband',' Not-in-family',' Wife',' Own-child',' Unmarried',' Other-relative'}
@attribute ' White' {' White',' Black',' Asian-Pac-Islander',' Amer-Indian-Eskimo',' Other'}
@attribute ' Male' {' Male',' Female'}
@attribute 2174 numeric
@attribute 0 numeric
@attribute 40 numeric
@attribute ' United-States' {' United-States',' Cuba',' Jamaica',' India',' ?',' Mexico',' South',' Puerto-Rico',' Honduras',' England',' Canada',' Germany',' Iran',' Philippines',' Italy',' Poland',' Columbia',' Cambodia',' Thailand',' Ecuador',' Laos',' Taiwan',' Haiti',' Portugal',' Dominican-Republic',' El-Salvador',' France',' Guatemala',' China',' Japan',' Yugoslavia',' Peru',' Outlying-US(Guam-USVI-etc)',' Scotland',' Trinadad&Tobago',' Greece',' Nicaragua',' Vietnam',' Hong',' Ireland',' Hungary',' Holand-Netherlands'}
@attribute ' <=50K' {' <=50K',' >50K'}
@data
50,' Self-emp-not-inc',83311,' Bachelors',13,' Married-civ-spouse',' Exec-managerial',' Husband',' White',' Male',0,0,13,' United-States',' <=50K'
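One workaround that might help, sketched here in plain Java without Weka: write a copy of the tab-delimited file with a generated header line in front, so that the loader consumes that line as the attribute names and the real first record stays in the data. The class name `PrependHeader`, the method `withSyntheticHeader`, and the `att1`, `att2`, ... column names are all made up for this illustration. It also assumes the file really is tab-separated and that every row has the same number of fields as the first one. (If the installed Weka version has `CSVLoader.setNoHeaderRowPresent(true)`, that may be the simpler route; I have not checked it against this data.)

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Weka): copies a tab-separated file
// and puts a generated header line in front of the data rows.
class PrependHeader {

    // Returns the input lines preceded by a header "att1<TAB>att2<TAB>...".
    // The column count is taken from the first line (assumption: all rows
    // have the same number of fields).
    static List<String> withSyntheticHeader(List<String> lines) {
        List<String> out = new ArrayList<>();
        if (lines.isEmpty()) {
            return out;
        }
        // limit -1 keeps trailing empty fields so the count is not too small
        int columns = lines.get(0).split("\t", -1).length;
        StringBuilder header = new StringBuilder();
        for (int i = 1; i <= columns; i++) {
            if (i > 1) {
                header.append('\t');
            }
            header.append("att").append(i);
        }
        out.add(header.toString());
        out.addAll(lines);
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("Usage: PrependHeader <input.txt> <output.txt>");
            System.exit(1);
        }
        // Reads the whole file into memory; a very large file would be
        // better handled line by line with a BufferedReader/BufferedWriter.
        List<String> lines =
            Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        Files.write(Paths.get(args[1]), withSyntheticHeader(lines),
            StandardCharsets.UTF_8);
    }
}
```

The output of this step would then be passed to the converter above in place of the original .txt file. The attributes would be named `att1`, `att2`, and so on instead of `39`, ` State-gov`, ..., and the record starting with 39 should then show up as the first row under `@data`.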
Thanks for your help.