Question

I want to create an ARFF file from this Kaggle CSV:

https://www.kaggle.com/c/titanic/download/train.csv

Here is part of the ARFF file I made:

@relation titanic

@attribute PassengerId numeric
@attribute Survived {0,1}
@attribute Pclass {1,2,3}
@attribute Name string
@attribute Sex {male,female}
@attribute Age numeric
@attribute SibSp numeric
@attribute Parch numeric
@attribute Ticket string
@attribute Fare numeric
@attribute Cabin string
@attribute Embarked {C,Q,S}

@data
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S

But when I load it in Weka, it returns this error:

nominal value not declared in header, read Token[C85], line 18 % the second line of my data

What is wrong with my declaration?

Answer 1

The problem is that the name "Cumings, Mrs. John Bradley (Florence Briggs Thayer)" contains a comma. Despite the double quotes, Weka parses it as two fields.

You could try removing such commas (that is, commas inside double quotes) with the help of a regular expression.
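
As a rough sketch of that regex approach (not verified against Weka), something like the following Python could strip commas that sit inside double-quoted fields. The names strip_commas_in_quotes and clean_file are placeholders of mine, and it assumes each record fits on a single line. It only does the comma removal: ARFF also expects ? for missing values and quotes around values that contain spaces (such as the Ticket values here), and this sketch leaves those untouched.

```python
import re

# One double-quoted field; a doubled quote ("") inside counts as an escaped quote.
QUOTED_FIELD = re.compile(r'"(?:[^"]|"")*"')


def strip_commas_in_quotes(line: str) -> str:
    """Remove commas inside double-quoted fields; leave separators alone."""
    return QUOTED_FIELD.sub(lambda m: m.group(0).replace(",", ""), line)


def clean_file(src_path: str, dst_path: str) -> None:
    """Apply the fix line by line (hypothetical helper; paths are up to you)."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(strip_commas_in_quotes(line))


row = ('2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",'
       'female,38,1,0,PC 17599,71.2833,C85,C')
print(strip_commas_in_quotes(row))
# 2,1,1,"Cumings Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
```

The commas that separate fields are kept because the pattern only rewrites text between a pair of double quotes.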
