Converting .txt to .arff for use in Weka

Asked: 2016-03-16 20:12:39

Tags: weka data-mining arff

I am trying to use this dataset in Weka:

@relation adult

@attribute age: continuous

@attribute workclass: {Private,Self-emp-not-inc,Self-emp-inc,Federal-gov,Local-gov,State-gov,Without-pay,Never-worked}

@attribute fnlwgt: continuous.

@attribute education: {Bachelors,Some-college,11th,HS-grad,Prof-school,Assoc-acdm,Assoc-voc,9th,7th-8th,12th,Masters,1st-4th,10th,Doctorate,5th-6th,Preschool}

@attribute education-num: continuous

@attribute marital-status: {Married-civ-spouse,Divorced,Never-married,Separated,Widowed,Married-spouse-absent,Married-AF-spouse}

@attribute occupation: {Tech-support,Craft-repair,Other-service,Sales,Exec-managerial,Prof-specialty,Handlers-cleaners,Machine-op-inspct,Adm-clerical,Farming-fishing,Transport-moving,Priv-house-serv,Protective-serv,Armed-Forces.

@attribute relationship: {Wife,Own-child,Husband,Not-in-family,Other-relative,Unmarried}

@attribute race: {White,Asian-Pac-Islander,Amer-Indian-Eskimo,Other,Black}

@attribute sex: {Female,Male}

@attribute capital-gain: continuous

@attribute capital-loss: continuous

@attribute hours-per-week: continuous

@attribute native-country: {United-States,Cambodia,England,Puerto-Rico,Canada,Germany,Outlying-US(Guam-USVI-etc),India,Japan,Greece,South,China,Cuba,Iran,Honduras,Philippines,Italy,Poland,Jamaica,Vietnam,Mexico,Portugal,Ireland,France,Dominican-Republic,Laos,Ecuador,Taiwan,Haiti,Columbia,Hungary,Guatemala,Nicaragua,Scotland,Thailand,Yugoslavia,El-Salvador,Trinadad&Tobago,Peru,Hong,Holand-Netherlands}

@data

39,State-gov,77516,Bachelors,13,Never-married,Adm-clerical,Not-in-family,White,Male,2174,0,40,United-States,<=50K

50,Self-emp-not-inc,83311,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,13,United-States,<=50K

38,Private,215646,HS-grad,9,Divorced,Handlers-cleaners,Not-in-family,White,Male,0,0,40,United-States,<=50K

53,Private,234721,11th,7,Married-civ-spouse,Handlers-cleaners,Husband,Black,Male,0,0,40,United-States,<=50K

28,Private,338409,Bachelors,13,Married-civ-spouse,Prof-specialty,Wife,Black,Female,0,0,40,Cuba,<=50K

I keep getting this error:

Unable to determine structure as arff (Reason: java.io.IOException: keyword @relation expected, read Token ['{'], line 1).

This makes no sense, because there is no "{" on line 1.

1 answer:

Answer 0: (score: 1)

Several things could be causing this problem. Here is the specification for the ARFF file format:

arff file format specifications

In the dataset below, attributes are declared in the following format:

@attribute 'fnlwgt' real

That is, with no colon after the attribute name, and with real or integer as the type instead of continuous.
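A minimal sketch of that rewrite in Python, assuming the header lines look like the ones in the question (`@attribute name: type`). The function name `fix_attribute_line` is hypothetical, and the handling of a trailing period and an unclosed brace is based only on the `fnlwgt` and `occupation` lines shown in the question, so treat it as a starting point rather than a general converter.

```python
import re

def fix_attribute_line(line):
    """Rewrite a '@attribute name: type' line as an ARFF-style declaration."""
    match = re.match(r"@attribute\s+([^:\s]+):\s*(.+)$", line.strip(), re.IGNORECASE)
    if match is None:
        return line.strip()  # not in the colon form; leave it untouched
    name = match.group(1)
    spec = match.group(2).strip().rstrip(".")  # drop a stray trailing period
    if spec.lower() == "continuous":
        spec = "real"  # 'continuous' is not an ARFF type
    elif spec.startswith("{") and not spec.endswith("}"):
        spec += "}"  # close a nominal list that was left open
    return f"@attribute '{name}' {spec}"

print(fix_attribute_line("@attribute fnlwgt: continuous."))
```

Lines that are already in ARFF form, such as `@attribute 'fnlwgt' real`, pass through unchanged, so the function can be applied to every header line.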

Also, you have

@attribute hours-per-week: continuous

@attribute native-country: {United-States,Cambodia,England,Puerto-Rico,Canada,Germany,Outlying-US(Guam-USVI-etc),India,Japan,Greece,South,China,Cuba,Iran,Honduras,Philippines,Italy,Poland,Jamaica,Vietnam,Mexico,Portugal,Ireland,France,Dominican-Republic,Laos,Ecuador,Taiwan,Haiti,Columbia,Hungary,Guatemala,Nicaragua,Scotland,Thailand,Yugoslavia,El-Salvador,Trinadad&Tobago,Peru,Hong,Holand-Netherlands}

in the reverse order in your dataset:

39,State-gov,77516,Bachelors,13,Never-married,Adm-clerical,Not-in-family,White,Male,2174,0,40,United-States,<=50K

And you have no class attribute declaration such as

 @attribute 'Class' {something, something2, something3}

For comparison, here is vehicle.arff from the seasr arff datasets:
@attribute 'COMPACTNESS' real
@attribute 'CIRCULARITY' real
@attribute 'DISTANCE CIRCULARITY' real
@attribute 'RADIUS RATIO' real
@attribute 'PR.AXIS ASPECT RATIO' real
@attribute 'MAX.LENGTH ASPECT RATIO' real
@attribute 'SCATTER RATIO' real
@attribute 'ELONGATEDNESS' real
@attribute 'PR.AXIS RECTANGULARITY' real
@attribute 'MAX.LENGTH RECTANGULARITY' real
@attribute 'SCALED VARIANCE_MAJOR' real
@attribute 'SCALED VARIANCE_MINOR' real
@attribute 'SCALED RADIUS OF GYRATION'  real
@attribute 'SKEWNESS ABOUT_MAJOR' real
@attribute 'SKEWNESS ABOUT_MINOR' real
@attribute 'KURTOSIS ABOUT_MAJOR' real
@attribute 'KURTOSIS ABOUT_MINOR' real
@attribute 'HOLLOWS RATIO' real
@attribute 'Class' {opel,saab,bus,van}

@data
95,48,83,178,72,10,162,42,20,159,176,379,184,70,6,16,187,197,van
91,41,84,141,57,9,149,45,19,143,170,330,158,72,9,14,189,199,van
104,50,106,209,66,10,207,32,23,158,223,635,220,73,14,9,188,196,saab
93,41,82,159,63,9,144,46,19,143,160,309,127,63,6,10,199,207,van
85,44,70,205,103,52,149,45,19,144,241,325,188,127,9,11,180,183,bus
107,57,106,172,50,6,255,26,28,169,280,957,264,85,5,9,181,183,bus
97,43,73,173,65,6,153,42,19,143,176,361,172,66,13,1,200,204,bus
90,43,66,157,65,9,137,48,18,146,162,281,164,67,3,3,193,202,van
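
The missing class attribute can be spotted mechanically: in the question, each data row carries 15 values while the header declares only 14 attributes. The sketch below, in Python, counts both and builds a class declaration from the last column. The helper names `check_rows` and `class_attribute` are hypothetical, and the code assumes plain comma-separated rows with no quoted values containing commas.

```python
def check_rows(header_lines, data_lines):
    """Return the declared attribute count and the rows whose field count differs."""
    declared = sum(
        1 for line in header_lines
        if line.strip().lower().startswith("@attribute")
    )
    mismatched = []
    for number, row in enumerate(data_lines, start=1):
        if not row.strip():
            continue  # skip blank lines between rows
        fields = len(row.strip().split(","))
        if fields != declared:
            mismatched.append((number, fields))
    return declared, mismatched

def class_attribute(data_lines, name="Class"):
    """Build a nominal declaration from the labels seen in the last column."""
    labels = []
    for row in data_lines:
        if row.strip():
            label = row.strip().split(",")[-1]
            if label not in labels:
                labels.append(label)
    return "@attribute '%s' {%s}" % (name, ",".join(labels))
```

The declaration is only complete if `class_attribute` sees every row of the file, since a label that never appears in the scanned rows will be missing from the nominal list.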