Python / Sklearn - ValueError: could not convert string to float

Posted: 2017-11-15 16:34:46

Tags: python pandas dataframe scikit-learn neural-network

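I'm trying to run a kNN classifier on my dataset using 10-fold CV. I have some experience with models in WEKA, but I'm finding it hard to carry that over to Sklearn.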

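Here is my code:

import pandas
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

filename = 'train4.csv'
names = ['attribute names are here']  # placeholder for the real column names

df = pandas.read_csv(filename, names=names)

num_folds = 10
# random_state only has an effect (and in recent scikit-learn is only accepted) with shuffle=True
kfold = KFold(n_splits=num_folds, shuffle=True, random_state=7)
model = KNeighborsClassifier()
results = cross_val_score(model, df.drop('mix1_instrument', axis=1), df['mix1_instrument'], cv=kfold)
print(results.mean())

I get this error:

 ValueError: could not convert string to float: ''

How do I convert this attribute? It contains useful information for classifying my instances. Will converting it affect that?

There are two attributes of type 'object', named 'class1' and 'class2', that I believe need to be converted.

Sample data below...

Thanks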

1 answer:

Answer 0 (score: 4)

Here is a small demo:

Source DF:

In [43]: df
Out[43]:
     Energy  HamoPkStd       class1             class2 mix1_instrument
0 -2.961480  14.391206    aerophone   aero_single-reed       Saxophone
1 -3.522993  20.306125  chordophone  aero_lip-vibrated         Trumpet
2 -3.409359   9.727358    aerophone        chrd_simple           Piano
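The sketch below shows why the question's code fails. It is a minimal version that assumes a reasonably recent pandas and scikit-learn. It rebuilds the three sample rows from the table above, though the real train4.csv has more columns, and passes them straight to KNeighborsClassifier. scikit-learn converts X to a float array before fitting, so the string columns raise a ValueError of the same kind as in the question. The choice of n_neighbors=1 is arbitrary, because there are only three rows.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Three sample rows copied from the "Source DF" table above
df = pd.DataFrame({
    "Energy": [-2.961480, -3.522993, -3.409359],
    "HamoPkStd": [14.391206, 20.306125, 9.727358],
    "class1": ["aerophone", "chordophone", "aerophone"],
    "class2": ["aero_single-reed", "aero_lip-vibrated", "chrd_simple"],
    "mix1_instrument": ["Saxophone", "Trumpet", "Piano"],
})

X = df.drop("mix1_instrument", axis=1)
y = df["mix1_instrument"]

# The text columns show up with dtype 'object'
print(X.dtypes)

try:
    # fit() tries to turn X into a float array; the text columns make that fail
    KNeighborsClassifier(n_neighbors=1).fit(X, y)
    error = None
except ValueError as exc:
    error = str(exc)

print(error)
```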

Label encoding:

In [44]: %paste
from sklearn.preprocessing import LabelEncoder

# string columns: class1, class2 and the target mix1_instrument
str_cols = df.columns[df.columns.str.contains('(?:class|instrument)')]
# one encoder per column, so each column can be decoded again later
clfs = {c: LabelEncoder() for c in str_cols}

for col, clf in clfs.items():
    df[col] = clf.fit_transform(df[col])
## -- End pasted text --

Result: all text/string columns have been converted to numbers, so the frame can now be fed to a neural network:

In [45]: df
Out[45]:
     Energy  HamoPkStd  class1  class2  mix1_instrument
0 -2.961480  14.391206       0       1                1
1 -3.522993  20.306125       1       0                2
2 -3.409359   9.727358       0       2                0
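The same encoded frame also works with the kNN classifier from the question. Below is a minimal sketch that reuses the three sample rows. A 10-fold KFold needs at least ten rows, so the sketch fits and predicts on the training rows instead of calling cross_val_score. The name encoders is illustrative. With n_neighbors=1, each training row is its own nearest neighbour, so the predictions reproduce the labels.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({
    "Energy": [-2.961480, -3.522993, -3.409359],
    "HamoPkStd": [14.391206, 20.306125, 9.727358],
    "class1": ["aerophone", "chordophone", "aerophone"],
    "class2": ["aero_single-reed", "aero_lip-vibrated", "chrd_simple"],
    "mix1_instrument": ["Saxophone", "Trumpet", "Piano"],
})

# One LabelEncoder per string column, as in the answer
str_cols = df.columns[df.columns.str.contains("(?:class|instrument)")]
encoders = {col: LabelEncoder() for col in str_cols}
for col, enc in encoders.items():
    df[col] = enc.fit_transform(df[col])

X = df.drop("mix1_instrument", axis=1)
y = df["mix1_instrument"]

# All columns are numeric now, so fit() no longer raises ValueError
model = KNeighborsClassifier(n_neighbors=1).fit(X, y)

# Map the predicted integer codes back to instrument names
labels = encoders["mix1_instrument"].inverse_transform(model.predict(X))
print(labels)  # ['Saxophone' 'Trumpet' 'Piano']
```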

Inverse transformation:

In [48]: clfs['class1'].inverse_transform(df['class1'])
Out[48]: array(['aerophone', 'chordophone', 'aerophone'], dtype=object)

In [49]: clfs['mix1_instrument'].inverse_transform(df['mix1_instrument'])
Out[49]: array(['Saxophone', 'Trumpet', 'Piano'], dtype=object)
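The question also asks whether converting the columns affects the information they carry. LabelEncoder assigns integer codes in sorted order, and kNN compares rows by distance (Euclidean by default). As a result, for class2 (codes 0, 1, 2), 'chrd_simple' ends up twice as far from 'aero_lip-vibrated' as 'aero_single-reed' is along that coordinate. The categories themselves have no such ordering. A commonly used alternative for feature columns, which is not part of the answer above, is one-hot encoding, for example with pandas.get_dummies. The sketch below assumes the same three sample rows. It also relies on scikit-learn classifiers accepting string class labels, so only the feature columns need converting.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

df = pd.DataFrame({
    "Energy": [-2.961480, -3.522993, -3.409359],
    "HamoPkStd": [14.391206, 20.306125, 9.727358],
    "class1": ["aerophone", "chordophone", "aerophone"],
    "class2": ["aero_single-reed", "aero_lip-vibrated", "chrd_simple"],
    "mix1_instrument": ["Saxophone", "Trumpet", "Piano"],
})

X = df.drop("mix1_instrument", axis=1)
y = df["mix1_instrument"]  # string labels are accepted as the target

# Replace class1/class2 with one 0/1 indicator column per category
X_onehot = pd.get_dummies(X, columns=["class1", "class2"], dtype=float)
print(list(X_onehot.columns))

model = KNeighborsClassifier(n_neighbors=1).fit(X_onehot, y)
pred = model.predict(X_onehot)
print(pred)
```

With the real data, X_onehot would take the place of df.drop('mix1_instrument', axis=1) in the question's cross_val_score call. Every pair of distinct categories is then equally far apart.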