Nominal attributes in ARFF files created with the Python arff library

Date: 2013-04-28 12:01:16

Tags: python arff

The dump command in Python's arff library lets you create an ARFF file from a given input. For example, the command:

arff.dump("outputDir", data, relation="relation1",
          names=['age', 'fatRatio', 'hairColor'])

produces the following ARFF:

@relation relation1
@attribute age real
@attribute hairColor string
@data
10,0.2,black
22,10,yellow
30,2,black

given the data:

data = [[10,0.2,'black'],[22,10,'yellow'],[30,2,'black']]

My question is: how do I tell the library that I want hairColor to be a nominal attribute, i.e. I want my ARFF header to look like this:

@relation relation1
@attribute age real
@attribute hairColor **nominal**
@data
...
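
For reference, the ARFF format declares a nominal attribute by listing its allowed values in braces rather than with a `nominal` keyword. Below is a minimal sketch, independent of the arff library, of deriving such a line from the data above. The helper name `nominal_declaration` is made up for illustration, and values containing spaces or commas would additionally need quoting, which this sketch does not handle.

```python
def nominal_declaration(name, values):
    """Build an ARFF @attribute line for a nominal attribute.

    Hypothetical helper, not part of the arff library. Distinct values
    are kept in first-seen order.
    """
    distinct = list(dict.fromkeys(str(v) for v in values))
    return "@attribute %s {%s}" % (name, ",".join(distinct))


data = [[10, 0.2, 'black'], [22, 10, 'yellow'], [30, 2, 'black']]

# hairColor is the third column of each row
hair_colors = [row[2] for row in data]
line = nominal_declaration("hairColor", hair_colors)
print(line)  # @attribute hairColor {black,yellow}
```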

1 answer:

Answer 0 (score: 0)

Several different approaches are outlined here:

https://code.google.com/p/arff/wiki/Documentation

I think the better approach, at least for me, is the second one recommended there:

arff_writer = arff.Writer(fname, relation='diabetics_data', names=names)
arff_writer.pytypes[arff.nominal] = '{not_parasite,parasite}'
arff_writer.write([arff.nominal('parasite')])

If you look at the code for arff.nominal, it is defined as follows:

class Nominal(str):
    """Use this class to wrap strings which are intended to be nominals
    and shouldn't have enclosing quote signs."""
    def __repr__(self):
        return self
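
To see what that `__repr__` override buys you, here is a small standalone check. It copies the class definition from above so that it runs without the arff package installed. That the writer formats values through `repr()` is my reading of the docstring, not something I have verified in the library source.

```python
class Nominal(str):
    """Same definition as quoted above, repeated so this runs on its own."""
    def __repr__(self):
        return self


plain = 'black'
wrapped = Nominal('black')

plain_repr = repr(plain)      # an ordinary str keeps its quote signs
wrapped_repr = repr(wrapped)  # the wrapper returns the bare text

print(plain_repr)    # 'black'
print(wrapped_repr)  # black
```

The wrapped value is still a `str`, so it behaves normally everywhere else; only its `repr()` changes.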

So what I did was create a separate nominal "wrapper" class for each nominal attribute, like this:

class ZipCode(str):
    """Use this class to wrap strings which are intended to be nominals
    and shouldn't have enclosing quote signs."""
    def __repr__(self):
        return self

Then, following the code above, you can do the following:

arff_writer = arff.Writer(fname, relation='neighborhood_data', names=names)
arff_writer.pytypes[type(myZipCodeObject)] = '{85104,84095}'
# then write out the rest of your attributes...

arff_writer.write([arff.nominal('parasite')])
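
The reason for one class per attribute is that the type table is keyed by Python type, so a single wrapper class can only carry one set of allowed values. Here is a simplified model of that idea. It is a sketch, not the library's implementation: the `pytypes` dict, `header_lines`, `data_line`, and the `HairColor` class are stand-ins I made up to illustrate the lookup.

```python
class Nominal(str):
    """Wrapper whose repr() has no enclosing quote signs."""
    def __repr__(self):
        return self


# One subclass per nominal attribute, so each gets its own table entry.
class HairColor(Nominal):
    pass


class ZipCode(Nominal):
    pass


# Hypothetical stand-in for the writer's type table.
pytypes = {int: 'integer', float: 'real', str: 'string'}
pytypes[HairColor] = '{black,yellow}'
pytypes[ZipCode] = '{85104,84095}'


def header_lines(names, first_row):
    """Look up each attribute's ARFF type from the Python type of its value."""
    return ["@attribute %s %s" % (name, pytypes[type(value)])
            for name, value in zip(names, first_row)]


def data_line(row):
    """Format one data row; Nominal values come out unquoted via repr()."""
    return ",".join(repr(value) for value in row)


names = ['age', 'hairColor', 'zip']
rows = [[10, HairColor('black'), ZipCode('85104')],
        [22, HairColor('yellow'), ZipCode('84095')]]

header = header_lines(names, rows[0])
body = [data_line(row) for row in rows]
print("\n".join(header + ["@data"] + body))
```

With a single shared `Nominal` class, the second assignment into `pytypes` would overwrite the first, and both columns would be declared with the same value set.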