I'm working on a data-mining project for school. We got the data from Kaggle as a CSV file; this is what it looks like (2 of 6970 rows):
4,1970,Female,150,DomesticPartnersKids,Bachelor's Degree,Democrat,,Yes,No,No,No,Yes,Public,No,Yes,No,Yes,No,No,Yes,Science,Study first,Yes,Yes,No,No,Receiving,No,No,Pragmatist,No,No,Cool headed,Standard hours,No,Happy,Yes,Yes,Yes,No,A.M.,No,End,Yes,No,Me,Yes,Yes,No,Yes,No,Mysterious,No,No,,,,,,,,,,Mac,Yes,Cautious,No,Umm...,No,Space,Yes,In-person,No,Yes,Yes,No,Yay people!,Yes,Yes,Yes,Yes,Yes,No,Yes,,,,,,,,,,,,,,,,,No,No,No,Only-child,Yes,No,No
5,1997,Male,75,Single,High School Diploma,Republican,,Yes,Yes,No,,Yes,Private,No,No,No,Yes,No,No,Yes,Science,Study first,,Yes,No,Yes,Receiving,No,Yes,Pragmatist,No,Yes,Cool headed,Odd hours,No,Right,Yes,No,No,Yes,A.M.,Yes,Start,Yes,Yes,Circumstances,No,Yes,No,Yes,Yes,Mysterious,No,No,Tunes,Technology,Yes,Yes,Yes,Yes,No,Supportive,No,PC,No,Cautious,No,Umm...,No,Space,No,In-person,No,No,Yes,Yes,Grrr people,Yes,No,No,No,No,No,No,Yes,No,No,Yes,No,Own,Pessimist,Mom,No,No,No,No,Nope,Yes,No,No,No,Yes,No,Yes,No,Yes,No
We have to convert it to .arff format to use it in Weka. I typed the header manually (107 attributes):
@ATTRIBUTE user_id NUMERIC
@ATTRIBUTE yob NUMERIC
@ATTRIBUTE gender {Male,Female}
@ATTRIBUTE income {150,100,75,50,25,10}
@ATTRIBUTE householdstatus {MarriedKids,Married,DomesticPartnersKids,DomesticPartners,Single,SingleKids}
@ATTRIBUTE educationlevel {Bachelor's Degree,High School Diploma,Current K-12,Current Undergraduate,Master's Degree,Associate's Degree,Doctoral Degree}
@ATTRIBUTE party {Democrat,Republican}
@ATTRIBUTE Q124742 {Yes,No}
@ATTRIBUTE Q124122 {Yes,No}
I get this error:
} expected at end of enumeration, read Token[EOL]
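The enumeration error most likely comes from the `educationlevel` line. In ARFF, a nominal value that contains spaces must be quoted, and the apostrophe in `Bachelor's Degree` is read as the start of a quoted string, so the reader can reach the end of the line before it finds the closing `}`. The sketch below is a minimal illustration, assuming Weka's convention of wrapping such values in single quotes and escaping inner apostrophes with a backslash (double quotes should also work); the helper names `arff_quote` and `nominal_attribute` are hypothetical.

```python
import re

# Characters that force quoting in this sketch: whitespace, quotes, commas,
# braces, '%' (ARFF comment marker) and the backslash itself.
_NEEDS_QUOTES = re.compile(r"[\s'\",{}%\\]")


def arff_quote(value: str) -> str:
    """Quote a nominal value for ARFF (assumed Weka convention: '...' with \\ escapes)."""
    if value in ("", "?") or _NEEDS_QUOTES.search(value):
        # Escape backslashes first, then apostrophes, and wrap in single quotes.
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"
    return value


def nominal_attribute(name: str, values: list[str]) -> str:
    """Build one @ATTRIBUTE line for a nominal attribute with quoted values."""
    return f"@ATTRIBUTE {name} {{{','.join(arff_quote(v) for v in values)}}}"


print(nominal_attribute("educationlevel", ["Bachelor's Degree", "High School Diploma", "Current K-12"]))
# prints: @ATTRIBUTE educationlevel {'Bachelor\'s Degree','High School Diploma','Current K-12'}
```

Values without special characters, such as `Male` or `Democrat`, are left unquoted, so lines like the `gender` declaration come out unchanged.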
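Then I tried using the Weka converter, but it gave me an error:
Wrong number of values. Read 2, expected 1, read Token[EOL], line 4. Problem encountered on line: 3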
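The converter's message means it found a row whose number of comma-separated fields differs from what it expected (here 2 values where it expected 1). A minimal Python sketch for locating such rows before loading the file, assuming no quoted field spans several lines so that row numbers equal line numbers; `find_ragged_rows` is a hypothetical helper name:

```python
import csv
import io


def find_ragged_rows(text: str) -> list[tuple[int, int]]:
    """Return (line_number, field_count) for every row whose field count
    differs from the first row. Line numbers are 1-based.
    Assumes no quoted field spans several lines."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []
    expected = len(rows[0])
    return [(i, len(row)) for i, row in enumerate(rows, start=1) if len(row) != expected]


sample = "user_id\n4,1970,Female\n5,1997,Male\n"
print(find_ragged_rows(sample))  # prints [(2, 3), (3, 3)]
```

For a real file, pass the text read from `open(path, newline="")`; every reported line is a candidate for the mismatch Weka complains about.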
Answer (score: 1)
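Here is what I did. From Kaggle, I downloaded train.csv (5568 instances, highest ID number 6960).
I didn't use the converter; I just loaded the file into the Weka Explorer as a CSV file. Some problems and their solutions: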
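Save this as train.arff.
Reload it, and it seems to work fine. The OneR classifier gave me 51% accuracy, but you wouldn't expect OneR to perform well here. I'm sure you can do better.
Note that I didn't type the header manually. That would certainly take a while!
Good luck!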
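If you do want a standalone .arff file without typing 107 `@ATTRIBUTE` lines, the header can also be generated from the CSV itself. This is a minimal Python sketch, not Weka's own converter, and it rests on several assumptions: the CSV has a header row of column names, every row has as many fields as the header, a column is NUMERIC when every non-empty value looks like a plain number (an all-empty column is also declared NUMERIC), and empty fields become ARFF's missing-value marker `?`. The names `csv_to_arff` and `arff_quote` are hypothetical.

```python
import csv
import io
import re

# Characters that force quoting (assumed to match Weka's ARFF reader rules).
_NEEDS_QUOTES = re.compile(r"[\s'\",{}%\\]")
# Plain decimal numbers only, so strings like "nan" or "inf" stay nominal.
_NUMBER = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")


def arff_quote(value: str) -> str:
    """Wrap a value in single quotes, escaping backslashes and apostrophes."""
    if value in ("", "?") or _NEEDS_QUOTES.search(value):
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"
    return value


def csv_to_arff(csv_text: str, relation: str) -> str:
    """Convert CSV text with a header row into ARFF text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    names, data = rows[0], rows[1:]
    lines = [f"@RELATION {arff_quote(relation)}", ""]
    numeric = []
    for col, name in enumerate(names):
        present = [row[col] for row in data if row[col] != ""]
        # all() of an empty list is True: an all-missing column becomes NUMERIC.
        is_num = all(_NUMBER.fullmatch(v) for v in present)
        numeric.append(is_num)
        if is_num:
            lines.append(f"@ATTRIBUTE {arff_quote(name)} NUMERIC")
        else:
            # Distinct nominal values, in first-seen order.
            values = list(dict.fromkeys(present))
            lines.append(f"@ATTRIBUTE {arff_quote(name)} {{{','.join(arff_quote(v) for v in values)}}}")
    lines += ["", "@DATA"]
    for row in data:
        cells = ["?" if v == "" else (v if numeric[i] else arff_quote(v)) for i, v in enumerate(row)]
        lines.append(",".join(cells))
    return "\n".join(lines) + "\n"


sample = (
    "user_id,gender,educationlevel,party\n"
    "4,Female,Bachelor's Degree,Democrat\n"
    "5,Male,High School Diploma,\n"
)
print(csv_to_arff(sample, "survey"))
```

With these rules a column such as `income` (values like 150 and 75) would be declared NUMERIC; if it should stay nominal, as in the hand-typed header, that column needs to be special-cased.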