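I am trying to run a kNN classifier on my dataset using 10-fold cross-validation. I have some experience with models in WEKA, but I am struggling to transfer that over to scikit-learn.
Here is my code:
import pandas
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

filename = 'train4.csv'
names = ['attribute names are here']
df = pandas.read_csv(filename, names=names)
num_folds = 10
# random_state only takes effect when shuffle=True
kfold = KFold(n_splits=num_folds, shuffle=True, random_state=7)
model = KNeighborsClassifier()
results = cross_val_score(model, df.drop('mix1_instrument', axis=1), df['mix1_instrument'], cv=kfold)
print(results.mean())
I get this error:
ValueError: could not convert string to float: ''
Two attributes, named 'class1' and 'class2', have dtype 'object', and I believe they need to be converted.
How do I convert these attributes? They contain information that is useful for classifying my instances; will the conversion affect that?
Sample data is below ...
Thanks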
Answer 0 (score: 4)
Here is a small demo:
Source DF:
In [43]: df
Out[43]:
      Energy  HamoPkStd       class1             class2  mix1_instrument
0  -2.961480  14.391206    aerophone   aero_single-reed        Saxophone
1  -3.522993  20.306125  chordophone  aero_lip-vibrated          Trumpet
2  -3.409359   9.727358    aerophone        chrd_simple            Piano
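Before encoding, it can help to confirm which columns pandas read as text and whether any of them hold empty strings, since the error message in the question quotes `''`. The following is a minimal sketch, not the asker's actual data: it rebuilds the three demo rows above and adds a hypothetical fourth row with an empty `class2` value purely for illustration. The names `demo`, `text_cols`, and `empty_counts` are assumptions.

```python
import pandas as pd

# First three rows copied from the demo frame; the fourth row is
# HYPOTHETICAL and only exists to show an empty-string cell.
demo = pd.DataFrame({
    "Energy": [-2.961480, -3.522993, -3.409359, -3.100000],
    "HamoPkStd": [14.391206, 20.306125, 9.727358, 12.000000],
    "class1": ["aerophone", "chordophone", "aerophone", "aerophone"],
    "class2": ["aero_single-reed", "aero_lip-vibrated", "chrd_simple", ""],
    "mix1_instrument": ["Saxophone", "Trumpet", "Piano", "Saxophone"],
})

# Every non-numeric column is a candidate for encoding.
text_cols = demo.select_dtypes(exclude="number").columns.tolist()
print(text_cols)

# Count empty-string cells per text column; these usually need to be
# treated as missing values before any encoding step.
empty_counts = (demo[text_cols] == "").sum()
print(empty_counts)
```

Selecting by dtype rather than by column name avoids missing a text column whose name does not match a pattern.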
Label encoding:
In [44]: %paste
from sklearn.preprocessing import LabelEncoder

# select the text columns by name and keep one fitted encoder per column
str_cols = df.columns[df.columns.str.contains('(?:class|instrument)')]
clfs = {c: LabelEncoder() for c in str_cols}
for col, clf in clfs.items():
    df[col] = clf.fit_transform(df[col])
## -- End pasted text --
Result: all text/string columns have been converted to numbers, so we can now feed them to an estimator such as KNeighborsClassifier:
In [45]: df
Out[45]:
      Energy  HamoPkStd  class1  class2  mix1_instrument
0  -2.961480  14.391206       0       1                1
1  -3.522993  20.306125       1       0                2
2  -3.409359   9.727358       0       2                0
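On the question of whether the conversion affects the information: integer codes such as 0, 1, 2 imply an order and a spacing that the categories do not have, and a distance-based model like kNN will treat `class2 = 2` as farther from 0 than 1 is. One common alternative for feature columns is one-hot encoding. The sketch below assumes the demo frame above and uses `pandas.get_dummies`; the names `demo` and `X_onehot` are illustrative. The target column is left as strings, which scikit-learn classifiers accept.

```python
import pandas as pd

# Values copied from the demo frame above.
demo = pd.DataFrame({
    "Energy": [-2.961480, -3.522993, -3.409359],
    "HamoPkStd": [14.391206, 20.306125, 9.727358],
    "class1": ["aerophone", "chordophone", "aerophone"],
    "class2": ["aero_single-reed", "aero_lip-vibrated", "chrd_simple"],
    "mix1_instrument": ["Saxophone", "Trumpet", "Piano"],
})

# Separate the target from the features; the target stays as text.
y = demo["mix1_instrument"]
X = demo.drop("mix1_instrument", axis=1)

# Replace each categorical feature with one 0/1 column per category,
# so no artificial ordering is introduced.
X_onehot = pd.get_dummies(X, columns=["class1", "class2"], dtype=int)
print(X_onehot.columns.tolist())
```

The trade-off is more columns: one per distinct category value instead of one per attribute.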
Inverse transform:
In [48]: clfs['class1'].inverse_transform(df['class1'])
Out[48]: array(['aerophone', 'chordophone', 'aerophone'], dtype=object)
In [49]: clfs['mix1_instrument'].inverse_transform(df['mix1_instrument'])
Out[49]: array(['Saxophone', 'Trumpet', 'Piano'], dtype=object)
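To tie this back to the original 10-fold cross-validation, here is a rough end-to-end sketch. It is not run on `train4.csv`: the data is synthetic and randomly generated, reusing only the column names and category values from the demo, so the printed score is meaningless. Using `OneHotEncoder` for the categorical features and `StandardScaler` for the numeric ones are assumptions on my part (kNN is distance-based, so scaling is a common choice), not something the question requires.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# SYNTHETIC stand-in for train4.csv: random values, real column names.
rng = np.random.default_rng(7)
n = 60
toy = pd.DataFrame({
    "Energy": rng.normal(-3.0, 0.5, n),
    "HamoPkStd": rng.normal(15.0, 5.0, n),
    "class1": rng.choice(["aerophone", "chordophone"], n),
    "class2": rng.choice(
        ["aero_single-reed", "aero_lip-vibrated", "chrd_simple"], n
    ),
    "mix1_instrument": rng.choice(["Saxophone", "Trumpet", "Piano"], n),
})
X = toy.drop("mix1_instrument", axis=1)
y = toy["mix1_instrument"]  # string labels are fine as a target

# Encode text features and scale numeric ones inside the pipeline, so
# each CV fold fits the preprocessing on its own training split only.
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["class1", "class2"]),
    ("num", StandardScaler(), ["Energy", "HamoPkStd"]),
])
model = make_pipeline(preprocess, KNeighborsClassifier())

# shuffle=True is needed for random_state to have any effect.
kfold = KFold(n_splits=10, shuffle=True, random_state=7)
results = cross_val_score(model, X, y, cv=kfold)
print(results.mean())
```

Keeping the encoders inside the pipeline means nothing has to be inverse-transformed afterwards, because the original frame is never modified.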