Question

I am trying to load the breast-cancer-wisconsin data into the Weka Explorer as a C4.5 data file. Whether I select the C4.5 .data file or the C4.5 .names file, loading fails with an error.

Any ideas?

Answer 1

The C4.5 names file looks incorrect. Try replacing the contents of breast-cancer-wisconsin.names with this:

2, 4.
clump: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
size: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
shape: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
adhesion: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
epithelial: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
nuclei: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
chromatin: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
nucleoli: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
mitoses: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.

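Note that the class comes first, given only as its list of labels (here 2 and 4), followed by one line per attribute.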

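Here I removed the first column (the subject ID) from the original dataset using

$ cut -d, -f2-11 breast-cancer-wisconsin.data > tmp.data && mv tmp.data breast-cancer-wisconsin.data

(the output goes to a temporary file first, because redirecting cut straight onto its own input file would truncate that file before it is read), but it would not be difficult to adapt the names file above if you prefer to keep that column.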
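Before pointing Weka at the files, it can help to confirm that every row of the .data file has the shape the names file above describes. This is a minimal sketch, assuming the ID column has already been removed so that each row holds nine attribute values followed by the class label; `check_c45_data` is a hypothetical helper name, not a Weka tool:

```shell
# check_c45_data FILE  (hypothetical helper, not part of Weka)
# Reports rows that do not consist of 9 attribute values plus a
# class label of 2 or 4. Returns non-zero if any such row is found.
check_c45_data() {
    awk -F, '
        NF != 10 {
            printf "line %d: expected 10 fields, got %d\n", NR, NF
            bad = 1
            next
        }
        $10 != "2" && $10 != "4" {
            printf "line %d: unexpected class label %s\n", NR, $10
            bad = 1
        }
        END { exit bad }
    ' "$1"
}
```

Run it as `check_c45_data breast-cancer-wisconsin.data`. It prints one message per suspicious row. A file with Windows line endings would be flagged on every row, since the trailing carriage return becomes part of the class label.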

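Alternative solutions:

Generate a CSV file: you only need to add a header line to the *.data file and rename it to *.csv. For example, replace breast-cancer-wisconsin.data with breast-cancer-wisconsin.csv, which would look like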
```
clump,size,shape,adhesion,epithelial,nuclei,chromatin,nucleoli,mitoses,class
5,1,1,1,2,1,3,1,1,2
5,4,4,5,7,10,3,2,1,2
3,1,1,1,2,2,3,1,1,2
6,8,8,1,3,4,3,7,1,2
...
```
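The CSV route can be scripted rather than done by hand. This is a rough sketch, assuming the input file already has the ID column stripped (ten comma-separated columns, class last); `data_to_csv` is a hypothetical helper name, and the column names are simply the ones used in the sample above:

```shell
# data_to_csv IN OUT  (hypothetical helper)
# Writes the header line to OUT, followed by the rows of IN unchanged.
data_to_csv() {
    {
        echo 'clump,size,shape,adhesion,epithelial,nuclei,chromatin,nucleoli,mitoses,class'
        cat "$1"
    } > "$2"
}
```

For example, `data_to_csv breast-cancer-wisconsin.data breast-cancer-wisconsin.csv` leaves the original .data file untouched and writes the new file next to it.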
Build an *.arff file by hand: with so few variables, this is not very complicated. The original answer linked to an example file.
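The ARFF header is short enough to generate with a few lines of shell. This sketch assumes the same ten-column input (ID column already removed, class last) and declares every attribute as nominal with the values 1 to 10, mirroring the names file above; `data_to_arff` and the relation name are illustrative choices, not anything Weka requires:

```shell
# data_to_arff IN OUT  (hypothetical helper)
# Writes an ARFF header to OUT, then appends the rows of IN as the @data section.
data_to_arff() {
    {
        echo '@relation breast-cancer-wisconsin'
        # Nine nominal attributes, each taking the values 1..10.
        for name in clump size shape adhesion epithelial nuclei chromatin nucleoli mitoses; do
            echo "@attribute $name {1,2,3,4,5,6,7,8,9,10}"
        done
        # The class attribute goes last, matching the column order of the data rows.
        echo '@attribute class {2,4}'
        echo '@data'
        cat "$1"
    } > "$2"
}
```

Calling `data_to_arff breast-cancer-wisconsin.data breast-cancer-wisconsin.arff` should give a file the Explorer can open directly. If you would rather treat the attributes as numbers, replacing the `{1,...,10}` list with `numeric` in the loop is the only change needed.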
