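I am trying to build a model and run a grid search; the code is below. The raw data (credit card fraud data) can be downloaded from this site: https://www.kaggle.com/mlg-ulb/creditcardfraud
The code below starts from standardization, after the data has been read.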
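import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.preprocessing import PowerTransformer, StandardScaler

# credit_card_fraud_df is assumed to be loaded already (the read step is not shown)
standardization = StandardScaler()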
credit_card_fraud_df[['Amount']] = standardization.fit_transform(credit_card_fraud_df[['Amount']])
# Assigning feature variable to X
X = credit_card_fraud_df.drop(['Class'], axis=1)
# Assigning response variable to y
y = credit_card_fraud_df['Class']
# Splitting the data into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, test_size=0.3, random_state=100)
X_train.head()
power_transformer = PowerTransformer(copy=False)
power_transformer.fit(X_train) ## Fit the PT on training data
X_train_pt_df = power_transformer.transform(X_train) ## Then apply on all data
X_test_pt_df = power_transformer.transform(X_test)
y_train_pt_df = y_train
y_test_pt_df = y_test
train_pt_df = pd.DataFrame(data=X_train_pt_df, columns=X_train.columns.tolist())
# set up cross validation scheme
folds = StratifiedKFold(n_splits = 5, shuffle = True, random_state = 4)
# specify range of hyperparameters
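# 5 values of C from 1e-3 to 1e3; the stray fourth argument (7) was dropped because
# np.logspace reads it as `endpoint`, so it never changed the generated values
params = {"C": np.logspace(-3, 3, 5), "penalty": ["l1", "l2"]}  # l1 = lasso, l2 = ridge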
## using Logistic regression for class imbalance
model = LogisticRegression(class_weight='balanced')
grid_search_cv = GridSearchCV(estimator=model, param_grid=params,
                              scoring='roc_auc',
                              cv=folds,
                              return_train_score=True, verbose=1)
grid_search_cv.fit(X_train_pt_df, y_train_pt_df)
## reviewing the results
cv_results = pd.DataFrame(grid_search_cv.cv_results_)
cv_results
Sample results:
mean_fit_time std_fit_time mean_score_time std_score_time param_C param_penalty params split0_test_score split1_test_score split2_test_score split3_test_score split4_test_score mean_test_score std_test_score rank_test_score
0 0.044332 0.002040 0.000000 0.000000 0.001 l1 {'C': 0.001, 'penalty': 'l1'} NaN NaN NaN NaN NaN NaN NaN 6
1 0.477965 0.046651 0.016745 0.003813 0.001 l2 {'C': 0.001, 'penalty': 'l2'} 0.485714 0.428571 0.542857 0.485714 0.457143 0.480000 0.037904 5
There are no null values in my input data, so I don't understand why I am getting NaN values in these columns. Can anyone help me?
Answer 0 (score: 0)
The problem is the default solver used here:
model = LogisticRegression(class_weight='balanced')
as shown by the following error message:
ValueError: Solver lbfgs supports only 'l2' or 'none' penalties, got l1 penalty.
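The error does not show up in cv_results_ because GridSearchCV, by default, records a failed fit as NaN and only emits a warning. A minimal sketch of how to surface it, assuming a small synthetic dataset (X_demo, y_demo are hypothetical stand-ins for the credit card data) and using the `error_score='raise'` option of GridSearchCV:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic, imbalanced stand-in for the credit card data (an assumption).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=10, weights=[0.9, 0.1], random_state=0
)

folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=4)
params = {"C": np.logspace(-3, 3, 5), "penalty": ["l1", "l2"]}

search = GridSearchCV(
    estimator=LogisticRegression(class_weight="balanced"),  # default solver
    param_grid=params,
    scoring="roc_auc",
    cv=folds,
    error_score="raise",  # re-raise fit errors instead of scoring them as NaN
)

caught_message = None
try:
    search.fit(X_demo, y_demo)
except ValueError as exc:
    # The first incompatible combination (penalty='l1' with the default solver)
    # stops the search and exposes the underlying error.
    caught_message = str(exc)

print(caught_message)
```

The exact wording of the message may differ between scikit-learn versions.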
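In addition, it may be useful to study the docs before defining the parameter grid:
penalty : {'l1', 'l2', 'elasticnet', 'none'}, default='l2'. Used to specify the norm used in the penalization. The 'newton-cg', 'sag' and 'lbfgs' solvers support only l2 penalties. 'elasticnet' is only supported by the 'saga' solver. If 'none' (not supported by the liblinear solver), no regularization is applied.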
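One way to respect these constraints is to pass GridSearchCV a list of grids, so that each solver is only combined with penalties it supports. The following is a rough sketch on synthetic data (X_demo and y_demo are hypothetical stand-ins); the choice of 'liblinear' and 'saga' for the l1 penalty is an assumption for illustration, not something the question prescribes:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic, imbalanced stand-in for the credit card data (an assumption).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=10, weights=[0.9, 0.1], random_state=0
)

c_values = np.logspace(-3, 3, 5)

# Each dict is searched separately, so no solver ever sees an unsupported penalty.
param_grid = [
    {"solver": ["lbfgs"], "penalty": ["l2"], "C": c_values},
    {"solver": ["liblinear", "saga"], "penalty": ["l1", "l2"], "C": c_values},
]

search = GridSearchCV(
    estimator=LogisticRegression(class_weight="balanced", max_iter=1000),
    param_grid=param_grid,
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=4),
)
search.fit(X_demo, y_demo)

cv_results = pd.DataFrame(search.cv_results_)
print(cv_results[["param_solver", "param_penalty", "param_C", "mean_test_score"]])
```

With the grid split this way, every row of cv_results should carry a numeric score rather than NaN.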
Once you correct it by using a different solver that supports the grid you want, it works:
## using Logistic regression for class imbalance
model = LogisticRegression(class_weight='balanced', solver='saga')
grid_search_cv = GridSearchCV(estimator=model, param_grid=params,
                              scoring='roc_auc',
                              cv=folds,
                              return_train_score=True, verbose=1)
grid_search_cv.fit(X_train_pt_df, y_train_pt_df)
## reviewing the results
cv_results = pd.DataFrame(grid_search_cv.cv_results_)
Note the ConvergenceWarning, which may indicate that you need to increase the default max_iter or tol, or switch to another solver and reconsider the parameter grid you want.
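As a rough illustration of that last point, the sketch below fits a single 'saga' model with a larger max_iter and checks whether a ConvergenceWarning was raised. The synthetic data, the value max_iter=5000, and the extra StandardScaler step are all assumptions for the example, not settings taken from the question:

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the credit card data (an assumption).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=10, weights=[0.9, 0.1], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=100
)

# saga tends to converge faster on features with similar scales, hence the scaler.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(
        class_weight="balanced", solver="saga", penalty="l1", C=1.0, max_iter=5000
    ),
)

# Record warnings during fitting so convergence problems can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    model.fit(X_tr, y_tr)

convergence_warnings = [
    w for w in caught if issubclass(w.category, ConvergenceWarning)
]
n_iter = model.named_steps["logisticregression"].n_iter_[0]
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

print(f"iterations used: {n_iter}, convergence warnings: {len(convergence_warnings)}")
print(f"test ROC AUC: {auc:.3f}")
```

If the warning list is not empty, raising max_iter further or trying another solver are the usual next steps.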