Question

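I have a .txt file containing a large amount of Arabic text, and I want to convert it automatically to an .arff file so that I can use it in Weka to extract rules.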

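As my professor requires, I need 30 attributes, and each attribute should list all the words in the text file as its possible values. Each data row will hold one real sentence, split into words separated by commas; if a sentence has fewer than 30 words, the remaining positions are filled with ?.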

The arff file should look like this:

@relation RelName

@attribute 'x1'{*will include all words in the text file*}
@attribute 'x2'{*will include all words in the text file*}
.
.
.
@attribute 'x30'{*will include all words in the text file*}

@data
Wordx,Wordy,Wordz,Wordq,Wordw,?,?,?,?,?...................,? // up to 30 words
.
.
.
.

and so on.

So is there any way to automatically generate an .arff file in this format from a single .txt file? Thanks for your help.

Answer 1

You can use arff 0.9. It works with Python 2.x and 3.x.

For example:

import arff
data = [[1, 2, 3], [10, 20, 30]]
arff.dump('result.arff', data, relation="test", names=['one', 'two', 'three'])

This creates a relation named test with three attributes: "one", "two", and "three". The first column will contain 1 and 10, the second 2 and 20, and the third 3 and 30.
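For the exact layout in the question (30 nominal attributes that all share the file's vocabulary, with short sentences padded by ?), the file can also be written by hand with the standard library alone. The following is only a sketch under stated assumptions: the input has one sentence per line, words are separated by whitespace, the file is UTF-8 encoded, and sentences longer than 30 words are truncated. The function names sentences_to_arff and convert are made up for this illustration and do not come from the arff package.

```python
MAX_WORDS = 30  # number of attributes requested in the question


def quote(word):
    """Quote a nominal value for ARFF, escaping backslashes and single quotes."""
    return "'" + word.replace("\\", "\\\\").replace("'", "\\'") + "'"


def sentences_to_arff(lines, relation="RelName", max_words=MAX_WORDS):
    """Return ARFF text: one row per non-empty line, one attribute per word position."""
    sentences = [line.split() for line in lines if line.strip()]
    # Every attribute shares one nominal domain: all distinct words in the input.
    vocab = sorted({word for sentence in sentences for word in sentence})
    domain = ",".join(quote(word) for word in vocab)

    out = ["@relation " + relation, ""]
    for i in range(1, max_words + 1):
        out.append("@attribute 'x%d' {%s}" % (i, domain))
    out += ["", "@data"]
    for sentence in sentences:
        words = sentence[:max_words]  # assumption: drop words beyond max_words
        padded = [quote(w) for w in words] + ["?"] * (max_words - len(words))
        out.append(",".join(padded))
    return "\n".join(out) + "\n"


def convert(txt_path, arff_path):
    """Read sentences from txt_path and write the ARFF file to arff_path."""
    # Assumption: both files use UTF-8.
    with open(txt_path, encoding="utf-8") as src:
        text = sentences_to_arff(src)
    with open(arff_path, "w", encoding="utf-8") as dst:
        dst.write(text)
```

Calling convert('input.txt', 'result.arff') (both file names are placeholders) would then produce the file. Every word is quoted so that commas, braces, or a literal question mark inside a token are not confused with ARFF syntax, while the unquoted ? marks a missing value.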
