Question

I am trying to use scikit-learn's RandomForestClassifier.

My data is in a DataFrame that I preprocessed with LabelEncoder, like this:

from sklearn import preprocessing
from sklearn.preprocessing import LabelEncoder

for column in df.columns:
    if df[column].dtype == type(object):
        le = LabelEncoder()
        df[column] = le.fit_transform(df[column])

Then I create my training and test sets like this:

import numpy as np
from sklearn.model_selection import train_test_split

# Labels are the values we want to predict
labels = np.array(df['hta_tota'])
# Remove the labels from the features
# axis 1 refers to the columns
df = df.drop('hta_tota', axis=1)
# Saving feature names for later use
feature_list = list(df.columns)
# Convert to numpy array
dfNpy = np.array(df)
train_features, test_features, train_labels, test_labels = train_test_split(dfNpy, labels, test_size = 0.25, random_state = 42)

Now I am trying to fit a RandomForestClassifier to my training set...

from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_jobs=2, random_state=0)
rf.fit(train_features, train_labels)

...but I get the following error:

ValueError: could not convert string to float: masculino

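masculino is one of the string values in one of the columns of my DataFrame. But I encoded that column with LabelEncoder!

What is going on here? Any ideas?

Thanks in advance.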
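
One possible cause, offered as a guess rather than a confirmed diagnosis: `pd.read_stata` converts value-labelled Stata columns to the pandas categorical dtype by default (`convert_categoricals=True`), and a categorical column does not compare equal to the object dtype, so the encoding loop above would skip it and leave the strings in place. The sketch below uses a small made-up DataFrame (`demo` and its values are hypothetical, not the real data) to show one way to inspect the dtypes and encode every non-numeric column regardless of its exact dtype.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the Stata data: "sexo" is categorical, which is
# what pd.read_stata produces for value-labelled columns by default.
demo = pd.DataFrame({
    "sexo": pd.Categorical(["masculino", "femenino", "masculino"]),
    "edad": [34, 51, 27],
})

# Inspect the dtypes first: a "category" column is not an "object" column.
print(demo.dtypes)
is_categorical = isinstance(demo["sexo"].dtype, pd.CategoricalDtype)

# Encode every non-numeric column, whatever its exact dtype.
# astype(str) turns category values into plain strings before encoding
# (note that missing values would become the string "nan").
for column in demo.columns:
    if not pd.api.types.is_numeric_dtype(demo[column]):
        le = LabelEncoder()
        demo[column] = le.fit_transform(demo[column].astype(str))

print(demo.dtypes)
```

If this is the issue, printing `df.dtypes` on the real DataFrame before and after the loop should show whether `sexo`, `desc_ent` and `desc_mun` were actually encoded.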

Update: more information about the DataFrame df. It is created and trimmed down as follows:

import pandas as pd

df = pd.read_stata('health_data/Hipertension_entrega.dta')
cols_wanted = ['folio', 'desc_ent', 'desc_mun', 'sexo', 'edad', 'hta_tota']
df = df[cols_wanted]
df = df[pd.notnull(df['hta_tota'])]
df.set_index('folio')
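
A side note on the last line above: `DataFrame.set_index` returns a new DataFrame by default rather than modifying `df` in place, so without an assignment `folio` stays in `df` as an ordinary column and ends up among the features. A minimal sketch with a made-up two-row frame (`demo` and its values are hypothetical):

```python
import pandas as pd

# Hypothetical miniature of the real DataFrame
demo = pd.DataFrame({"folio": ["a1", "b2"], "edad": [34, 51]})

# The result is discarded here, so demo is unchanged
demo.set_index("folio")
still_a_column = "folio" in demo.columns

# Keeping the returned DataFrame moves "folio" into the index
demo = demo.set_index("folio")
now_a_column = "folio" in demo.columns

print(still_a_column, now_a_column)
```

Whether this contributes to the error depends on what `folio` contains, but it is worth confirming that the identifier column is not being passed to the classifier.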

Then, once I have preprocessed it with LabelEncoder (as shown above), df still returns the following:

"ValueError: could not convert string to float" when using RandomForestClassifier

0 answers: