I am trying to build a pipeline on my dataset by following the Column Transformer with Mixed Types example from the scikit-learn documentation, but I get this error: ValueError: could not convert string to float: 'Male'.
from sklearn.compose import ColumnTransformer, make_column_transformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, LabelEncoder, OneHotEncoder
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.compose import make_column_selector as selector
from sklearn.linear_model import LogisticRegression
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())
])
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder())
])
preprocessor = ColumnTransformer(transformers=[
    ('num', numeric_transformer, selector(dtype_exclude=["category", 'object'])),
    ('cat', categorical_transformer, selector(dtype_include=["category", 'object']))
])
X = train.drop(['Loan_Status', 'Loan_ID'], axis=1)
y = train['Loan_Status']
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=101)
pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', LogisticRegression())
])
pipeline.fit(x_train, y_train)
score = clf.score(x_test, y_test)
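I have read other related posts about the same error, but in all of them it occurs during fit, whereas in my case it occurs during score evaluation.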
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-19-df09dd400283> in <module>()
4
5 pipeline.fit(x_train, y_train)
----> 6 score = clf.score(x_test, y_test)
4 frames
/usr/local/lib/python3.6/dist-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
83
84 """
---> 85 return array(a, dtype, copy=False, order=order)
86
87
ValueError: could not convert string to float: 'Male'
Here is an overview of the dataset dtypes:
Loan_ID               object
Gender                object
Married               object
Dependents            object
Education             object
Self_Employed         object
ApplicantIncome        int64
CoapplicantIncome    float64
LoanAmount           float64
Loan_Amount_Term     float64
Credit_History       float64
Property_Area         object
Loan_Status           object
dtype: object
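For illustration, the two selectors used in the ColumnTransformer can be called directly on a DataFrame to see which columns each branch would receive. This is only a sketch: the one-row `sample` frame below is a hypothetical stand-in whose dtypes mirror part of the listing above, and its values are made up.

```python
import pandas as pd
from sklearn.compose import make_column_selector as selector

# Hypothetical one-row stand-in; only the dtypes mirror the listing above.
sample = pd.DataFrame({
    "Gender": pd.Series(["Male"], dtype=object),
    "Married": pd.Series(["No"], dtype=object),
    "ApplicantIncome": pd.Series([5000], dtype="int64"),
    "LoanAmount": pd.Series([128.0], dtype="float64"),
    "Credit_History": pd.Series([1.0], dtype="float64"),
})

# A selector is a callable: given a DataFrame, it returns the matching column names.
numeric_cols = selector(dtype_exclude=["category", "object"])(sample)
categorical_cols = selector(dtype_include=["category", "object"])(sample)

print("numeric:", numeric_cols)
print("categorical:", categorical_cols)
```

Running the same two calls on the real `x_train` should show whether any string column ends up in the numeric branch.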
Answer 0 (score: 0)
Change
preprocessor = ColumnTransformer(transformers=[
    ('num', numeric_transformer, selector(dtype_exclude=["category", 'object'])),
    ('cat', categorical_transformer, selector(dtype_include=["category", 'object']))
])
to
preprocessor = ColumnTransformer(transformers=[
    ('num', numeric_transformer, selector(dtype_exclude=object)),
    ('cat', categorical_transformer, selector(dtype_include=object))
])
Reference: https://scikit-learn.org/stable/modules/generated/sklearn.compose.make_column_selector.html
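A minimal end-to-end sketch with the selectors above, assuming a small hypothetical frame in place of the real loan data (the values and the `handle_unknown="ignore"` setting are illustrative choices, not taken from the question). Note also that the question's snippet fits `pipeline` but scores `clf`, which is not defined in the code shown. If `clf` is an earlier estimator fitted without the preprocessor, it would receive the raw strings and could raise this same error, so scoring through the fitted `pipeline` may be worth trying.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.compose import make_column_selector as selector
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data standing in for the loan dataset.
X = pd.DataFrame({
    "Gender": pd.Series(
        ["Male", "Female", "Male", "Female", "Male", "Female"], dtype=object
    ),
    "ApplicantIncome": [5000, 3000, 4000, 6000, 2500, 7000],
    "LoanAmount": [128.0, None, 66.0, 120.0, 141.0, 95.0],  # one missing value
})
y = ["Y", "N", "Y", "Y", "N", "Y"]

numeric_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
])
categorical_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])
preprocessor = ColumnTransformer(transformers=[
    ("num", numeric_transformer, selector(dtype_exclude=object)),
    ("cat", categorical_transformer, selector(dtype_include=object)),
])
pipeline = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("classifier", LogisticRegression()),
])

pipeline.fit(X, y)

# Score with the same fitted pipeline so the strings are encoded first.
score = pipeline.score(X, y)
print(score)
```

The sketch fits and scores on the same rows only to keep it short; with real data the `train_test_split` from the question would still apply.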