Sklearn Column Transformer ValueError: could not convert string to float

Time: 2020-05-10 21:04:04

Tags: python scikit-learn

I am trying to build a pipeline on my dataset, following the Column Transformer with Mixed Types example from the scikit-learn documentation, but I get this error: ValueError: could not convert string to float: 'Male'

from sklearn.compose import ColumnTransformer, make_column_transformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, LabelEncoder, OneHotEncoder
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.compose import make_column_selector as selector
from sklearn.linear_model import LogisticRegression

numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())
    ])

categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot',  OneHotEncoder())
    ])


preprocessor = ColumnTransformer(transformers=[
    ('num', numeric_transformer, selector(dtype_exclude=["category",'object'])),
    ('cat', categorical_transformer, selector(dtype_include=["category",'object']))
])

X = train.drop(['Loan_Status', 'Loan_ID'], axis=1)
y = train['Loan_Status']


x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=101)

pipeline = Pipeline(steps = [('preprocessor', preprocessor),
                    ('classifier',LogisticRegression())
                  ])

pipeline.fit(x_train, y_train)
score = clf.score(x_test, y_test)

I have read other posts about the same error, but in all of them it occurs during fit. In my case it occurs during the score evaluation.

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-19-df09dd400283> in <module>()
      4 
      5 pipeline.fit(x_train, y_train)
----> 6 score = clf.score(x_test, y_test)

4 frames

/usr/local/lib/python3.6/dist-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     83 
     84     """
---> 85     return array(a, dtype, copy=False, order=order)
     86 
     87 

ValueError: could not convert string to float: 'Male'

Here is an overview of the dataset's dtypes:

Loan_ID               object
Gender                object
Married               object
Dependents            object
Education             object
Self_Employed         object
ApplicantIncome        int64
CoapplicantIncome    float64
LoanAmount           float64
Loan_Amount_Term     float64
Credit_History       float64
Property_Area         object
Loan_Status           object
dtype: object

The first few rows of the dataset: (image not included)

1 Answer:

Answer 0 (score: 0)

Change

preprocessor = ColumnTransformer(transformers=[
    ('num', numeric_transformer, selector(dtype_exclude=["category",'object'])),
    ('cat', categorical_transformer, selector(dtype_include=["category",'object']))
])

to

preprocessor = ColumnTransformer(transformers=[
    ('num', numeric_transformer, selector(dtype_exclude=object)),
    ('cat', categorical_transformer, selector(dtype_include=object))
])
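
To see which columns a selector actually picks, it can be called directly on a DataFrame. The sketch below uses a small made-up frame: the column names are borrowed from the dtypes listing in the question, while the values and the explicit `dtype=object` are assumptions for illustration. In this toy frame, which has no `category` columns, both spellings of the selector return the same column list.

```python
import pandas as pd
from sklearn.compose import make_column_selector as selector

# Made-up toy data; string columns are forced to object dtype on purpose.
df = pd.DataFrame({
    "Gender": pd.Series(["Male", "Female", "Male"], dtype=object),
    "Property_Area": pd.Series(["Urban", "Rural", "Urban"], dtype=object),
    "ApplicantIncome": pd.Series([5000, 3000, 4000], dtype="int64"),
    "LoanAmount": pd.Series([120.0, 80.0, 100.0], dtype="float64"),
})

# Calling a selector on a DataFrame returns a list of column names.
cat_from_list = selector(dtype_include=["category", "object"])(df)
cat_from_object = selector(dtype_include=object)(df)
num_cols = selector(dtype_exclude=object)(df)

print(cat_from_list)
print(cat_from_object)
print(num_cols)
```

Running the same check on the real training frame shows whether every string column ends up in the categorical branch.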

Reference: https://scikit-learn.org/stable/modules/generated/sklearn.compose.make_column_selector.html
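
One more thing is visible in the traceback: the failing call is `clf.score(...)`, not `pipeline.score(...)`, and `clf` is never defined in the posted snippet. If `clf` is an estimator without the preprocessing step (an assumption, since its definition is not shown), raw strings such as 'Male' would reach NumPy's float conversion and raise this error. The sketch below uses made-up toy data to contrast the two cases: a bare `LogisticRegression` on the raw frame, and the full pipeline scored through `pipeline.score`.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector as selector
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up toy data (values are invented, not taken from the real dataset).
toy = pd.DataFrame({
    "Gender": pd.Series(["Male", "Female"] * 4, dtype=object),
    "ApplicantIncome": [5000, 3000, 4000, 2500, 6000, 3500, 4500, 2000],
    "LoanAmount": [120.0, np.nan, 100.0, 80.0, 150.0, 90.0, np.nan, 70.0],
    "Loan_Status": pd.Series(["Y", "N"] * 4, dtype=object),
})
X = toy.drop("Loan_Status", axis=1)
y = toy["Loan_Status"]

# Case 1: an estimator with no preprocessing sees the raw strings.
raw_error = None
try:
    LogisticRegression().fit(X, y)
except ValueError as exc:
    raw_error = str(exc)
print("bare estimator:", raw_error)

# Case 2: the same preprocessing as in the question, wrapped in one pipeline.
numeric_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
])
categorical_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])
preprocessor = ColumnTransformer(transformers=[
    ("num", numeric_transformer, selector(dtype_exclude=object)),
    ("cat", categorical_transformer, selector(dtype_include=object)),
])
pipeline = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("classifier", LogisticRegression()),
])

pipeline.fit(X, y)
# Score through the pipeline so the preprocessing is applied first.
score = pipeline.score(X, y)
print("pipeline accuracy on the toy data:", score)
```

In the question's code, the equivalent change would be to replace `clf.score(x_test, y_test)` with `pipeline.score(x_test, y_test)`.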