How to convert a pandas DataFrame to libsvm format?

Date: 2017-04-25 09:59:56

Tags: pandas dataframe libsvm

I have a pandas DataFrame like the one below.

df
Out[50]: 
    0   1   2   3   4   5   6   7   8   9  ...  90  91  92  93  94  95  96  97 \
0   0   0   0   0   0   0   0   0   0   0 ...   1   1   1   1   1   1   1   1   
1   0   1   1   1   0   0   1   1   1   1 ...   0   0   0   0   0   0   0   0   
2   1   1   1   1   1   1   1   1   1   1 ...   0   0   0   0   0   0   0   0   
3   0   0   0   0   0   0   0   0   0   0 ...   1   1   1   1   1   1   1   1   
4   0   0   0   0   0   0   0   0   0   0 ...   1   1   1   1   1   1   1   1   
5   1   0   0   1   1   1   1   0   0   0 ...   0   0   0   0   0   0   0   0   
6   0   0   0   0   0   0   0   0   0   0 ...   1   1   1   1   1   1   1   1   
7   0   0   0   0   0   0   0   0   0   0 ...   1   1   1   1   1   1   1   1   

[8 rows x 100 columns]

I have the target variable as an array, shown below.

[1, -1, -1, 1, 1, -1, 1, 1]

How can I map this target variable onto the DataFrame and convert it to libsvm format?

equi = {0:1, 1:-1, 2:-1,3:1,4:1,5:-1,6:1,7:1}
df["labels"] = df.index.map[(equi)]
d = df[np.setdiff1d(df.columns,['indx','labels'])]
e = df.label
dump_svmlight_file(d,e,'D:/result/smvlight2.dat')

ERROR:

 File "D:/spyder/april.py", line 54, in <module>
df["labels"] = df.index.map[(equi)]

TypeError: 'method' object is not subscriptable
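The TypeError comes from the square brackets: map is a method of the index, so it has to be called with parentheses, not subscripted. A minimal sketch on a hypothetical stand-in frame (only two feature columns, values made up). Recent pandas versions also accept the dict directly, as df.index.map(equi); the lambda form used here does not depend on that, because Index.map has long accepted a function.

```python
import pandas as pd

# Hypothetical stand-in for the question's 8-row frame (two feature columns only).
df = pd.DataFrame({0: [0, 0, 1, 0, 0, 1, 0, 0], 1: [0, 1, 1, 0, 0, 0, 0, 0]})
equi = {0: 1, 1: -1, 2: -1, 3: 1, 4: 1, 5: -1, 6: 1, 7: 1}

# Wrong: df.index.map[(equi)] subscripts the bound method and raises TypeError.
# Right: call the method, passing a function that looks up each index label.
df["labels"] = df.index.map(lambda i: equi[i])

print(df["labels"].tolist())
```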

When I use

df["labels"] = df.index.list(map[(equi)])

ERROR:

AttributeError: 'RangeIndex' object has no attribute 'list'
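This second error happens because list is Python's built-in function, not a method of the index, so it has to wrap the expression instead of being called on df.index. A minimal sketch of what that attempt seems to be aiming for, using the built-in map together with the dict's get method (the frame is a hypothetical stand-in):

```python
import pandas as pd

df = pd.DataFrame({0: [0, 0, 1, 0, 0, 1, 0, 0]})  # hypothetical stand-in frame
equi = {0: 1, 1: -1, 2: -1, 3: 1, 4: 1, 5: -1, 6: 1, 7: 1}

# list() and map() here are Python built-ins, not pandas methods:
# map(equi.get, df.index) looks up every index label in the dict,
# and list() materializes the lazy map object so it can be assigned.
df["labels"] = list(map(equi.get, df.index))

print(df["labels"].tolist())
```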

Please help me fix these errors.

1 answer:

Answer 0 (score: 2)

I think you need to convert the index to a Series with to_series and then call map:

df["labels"] = df.index.to_series().map(equi)
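Here to_series() turns the index into a Series whose values and index are both the row labels, and Series.map accepts a dict, so each label is looked up in equi. Because the result keeps the original index, the assignment lines up row by row. A small sketch on a hypothetical stand-in frame:

```python
import pandas as pd

df = pd.DataFrame({0: [0, 0, 1, 0, 0, 1, 0, 0]})  # hypothetical stand-in frame
equi = {0: 1, 1: -1, 2: -1, 3: 1, 4: 1, 5: -1, 6: 1, 7: 1}

s = df.index.to_series()  # Series with index 0..7 and values 0..7
labels = s.map(equi)      # dict lookup per value; keys missing from equi would give NaN
df["labels"] = labels     # aligned on the shared index

print(df["labels"].tolist())
```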

Or use rename on the index:

df["labels"] = df.rename(index=equi).index
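rename(index=equi) relabels each row through the dict, so the renamed frame's index is exactly the label vector. The resulting duplicate index labels do no harm here, because only .index is read back, and a plain Index is assigned to a column by position. A sketch on a hypothetical stand-in frame:

```python
import pandas as pd

df = pd.DataFrame({0: [0, 0, 1, 0, 0, 1, 0, 0]})  # hypothetical stand-in frame
equi = {0: 1, 1: -1, 2: -1, 3: 1, 4: 1, 5: -1, 6: 1, 7: 1}

renamed_index = df.rename(index=equi).index  # values 1, -1, -1, 1, 1, -1, 1, 1
df["labels"] = renamed_index                 # an Index is assigned positionally

print(df["labels"].tolist())
```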

All together:

For the column difference, pandas has Index.difference:

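from sklearn.datasets import dump_svmlight_file

# row position -> target label, in the same order as the target array
equi = {0: 1, 1: -1, 2: -1, 3: 1, 4: 1, 5: -1, 6: 1, 7: 1}

# relabel the rows through equi and store the new index as the label column
df["labels"] = df.rename(index=equi).index
e = df["labels"]
# every column except the labels ('indx' is ignored if it does not exist)
d = df[df.columns.difference(['indx','labels'])]

dump_svmlight_file(d,e,'C:/result/smvlight2.dat')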
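The column selection above relies on Index.difference, which plays the role np.setdiff1d had in the question: it returns the labels of df.columns that are not in the given list, and labels that are not present at all (such as 'indx', which the question's frame does not appear to have) are simply ignored. A sketch on a hypothetical three-feature frame:

```python
import pandas as pd

# Hypothetical frame: three feature columns plus a 'labels' column.
df = pd.DataFrame({0: [0, 1], 1: [1, 0], 2: [1, 1], "labels": [1, -1]})

# Columns of df that are not in the list; 'indx' does not exist and is ignored.
feature_cols = df.columns.difference(["indx", "labels"])
d = df[feature_cols]

print(list(d.columns))
```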

It seems the labels column is not actually necessary:

from sklearn.datasets import dump_svmlight_file

equi = {0: 1, 1: -1, 2: -1, 3: 1, 4: 1, 5: -1, 6: 1, 7: 1}
# the labels come straight from the renamed index; no extra column is added to df
e = df.rename(index=equi).index
d = df[df.columns.difference(['indx'])]
dump_svmlight_file(d,e,'C:/result/smvlight2.dat')
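Taking that a step further: the question's target array is already in row order, so, assuming the rows of df are in the same order as the targets, it can be passed to dump_svmlight_file as y directly, with no mapping dict at all. The sketch below also shows what the output looks like. It writes to an in-memory io.BytesIO buffer instead of a file (dump_svmlight_file accepts a file-like object opened in binary mode) and reads the result back with load_svmlight_file. The small frame is a hypothetical stand-in for the question's data.

```python
import io

import numpy as np
import pandas as pd
from sklearn.datasets import dump_svmlight_file, load_svmlight_file

# Hypothetical 8-row, 2-feature stand-in; every row has a non-zero value.
df = pd.DataFrame(np.arange(16).reshape(8, 2) % 3)
target = [1, -1, -1, 1, 1, -1, 1, 1]  # the target array from the question

buf = io.BytesIO()
dump_svmlight_file(df, target, buf)  # zero_based=True by default
text = buf.getvalue().decode("ascii")
# Each line is "<label> <index>:<value> ..." listing only the non-zero features.
print(text)

# Read it back; n_features and zero_based are given explicitly so that
# all-zero leading or trailing columns could not shift the loaded matrix.
buf.seek(0)
X_back, y_back = load_svmlight_file(buf, n_features=df.shape[1], zero_based=True)
```

Passing n_features and zero_based=True explicitly matters for real data like the question's, where whole columns can be zero: with the defaults, load_svmlight_file infers both from the indices it actually sees.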