I'm very new to Python, and I've run into the following problem that I really need help with:
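import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
df = pd.read_csv('train.csv') #titanic dataset from Kaggle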
df = df.loc[df.Embarked.notna(), ['Survived', 'Pclass', 'Sex', 'SibSp', 'Embarked']]
X = df.drop('Survived', axis='columns')
y = df.Survived
column_trans = make_column_transformer(
(OneHotEncoder(), ['Sex', 'Embarked']),
remainder='passthrough')
column_trans.fit_transform(X)  # result is discarded; the pipeline below refits the transformer itself
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)
# k_range is not shown in the question; it is a list of candidate k values
param_grid = dict(n_neighbors=k_range)
knn = KNeighborsClassifier()
pipe = make_pipeline(column_trans, knn)
grid = GridSearchCV(pipe, param_grid, cv=10, scoring='accuracy')
grid.fit(train_X, train_y) #this line gives me an error
The last line gives me this error:
ValueError: Invalid parameter n_neighbors for estimator Pipeline(memory=None,
steps=[('columntransformer',
ColumnTransformer(n_jobs=None, remainder='passthrough',
sparse_threshold=0.3,
transformer_weights=None,
transformers=[('onehotencoder',
OneHotEncoder(categories='auto',
drop=None,
dtype=<class 'numpy.float64'>,
handle_unknown='error',
sparse=True),
['Sex', 'Embarked'])],
verbose=False)),
('kneighborsclassifier',
KNeighborsClassifier(algorithm='auto', leaf_size=30,
metric='minkowski', metric_params=None,
n_jobs=None, n_neighbors=5, p=2,
weights='uniform'))],
verbose=False). Check the list of available parameters with `estimator.get_params().keys()`.
What am I doing wrong here? Is it not possible to combine one-hot encoding, KNN and a pipeline?
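The error message suggests inspecting get_params().keys(). As a rough sketch, the snippet below rebuilds the question's pipeline (no data is loaded and nothing is fitted) and lists the step names and any parameter keys that end in n_neighbors. The printed values reflect how make_pipeline names steps in current scikit-learn releases.

```python
from sklearn.compose import make_column_transformer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Same construction as in the question; get_params() works without fitting
column_trans = make_column_transformer(
    (OneHotEncoder(), ['Sex', 'Embarked']),
    remainder='passthrough')
pipe = make_pipeline(column_trans, KNeighborsClassifier())

# make_pipeline names each step after its class name in lowercase
step_names = [name for name, _ in pipe.steps]
print(step_names)  # ['columntransformer', 'kneighborsclassifier']

# Only the nested key exists; a bare 'n_neighbors' is not a Pipeline parameter
knn_keys = [key for key in pipe.get_params() if key.endswith('n_neighbors')]
print(knn_keys)  # ['kneighborsclassifier__n_neighbors']
```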
Answer:
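Parameters of a step inside a pipeline are set by joining the step name and the parameter name with a double underscore (__), e.g. <step name>__n_neighbors. A bare n_neighbors is not a parameter of the Pipeline itself, which is why GridSearchCV rejects it. make_pipeline names each step after its class name in lowercase, so with your code the key would be kneighborsclassifier__n_neighbors. Alternatively, build the pipeline with Pipeline and give the steps explicit, shorter names, as in the modified code below: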
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import make_column_transformer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
df = pd.read_csv("titanic.csv")
df = df.drop(["Name"], axis=1)
X = df.drop('Survived', axis='columns')
y = df.Survived
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)
column_trans = make_column_transformer(
(OneHotEncoder(), ['Sex']),
remainder='passthrough')
knn = KNeighborsClassifier()
pipe = Pipeline(steps=[('column_trans', column_trans), ('knn', knn)])
param_grid = {
'knn__n_neighbors': [2,5,15, 30, 45, 64]
}
grid = GridSearchCV(pipe, param_grid, cv=10, scoring='accuracy')
grid.fit(train_X,train_y)
print(grid.best_params_)
#{'knn__n_neighbors': 5}
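If you prefer to keep make_pipeline, only the parameter key has to change. A minimal sketch follows, assuming a hypothetical randomly generated DataFrame with the question's column names in place of train.csv, so the selected k carries no meaning. handle_unknown='ignore' is an addition of mine that guards against a category showing up only in a validation fold, and cv=3 just keeps the toy example small.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for the Kaggle data: random rows with the same columns
rng = np.random.default_rng(0)
n = 120
X = pd.DataFrame({
    'Pclass': rng.integers(1, 4, n),
    'Sex': rng.choice(['male', 'female'], n),
    'SibSp': rng.integers(0, 4, n),
    'Embarked': rng.choice(['C', 'Q', 'S'], n),
})
y = pd.Series(rng.integers(0, 2, n), name='Survived')

column_trans = make_column_transformer(
    (OneHotEncoder(handle_unknown='ignore'), ['Sex', 'Embarked']),
    remainder='passthrough')
pipe = make_pipeline(column_trans, KNeighborsClassifier())

# Prefix the parameter with the auto-generated step name plus '__'
param_grid = {'kneighborsclassifier__n_neighbors': [1, 3, 5, 7]}
grid = GridSearchCV(pipe, param_grid, cv=3, scoring='accuracy')
grid.fit(X, y)
print(grid.best_params_)
```

Either approach works; explicit names with Pipeline only make the keys shorter.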
Hope this helps!