I am working with an imbalanced dataset, so I want to tune the class_weight parameter to improve my logistic regression. y is binary.
I am using this code:
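from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score

lg2 = LogisticRegression(class_weight={0:1,1:100})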
# fit it
lg2.fit(X_train,y_train)
# test
y_pred = lg2.predict(X_test)
# performance
print(f'Accuracy Score: {accuracy_score(y_test,y_pred)}')
print(f'Confusion Matrix: \n{confusion_matrix(y_test, y_pred)}')
print(f'Area Under Curve: {roc_auc_score(y_test, y_pred)}')
But I get this error:
TypeError: '<' not supported between instances of 'str' and 'int'
However, if I use 'balanced', everything works fine:
model = LogisticRegression(solver='lbfgs', class_weight='balanced')
model.fit(X_train,y_train)
y_pred = model.predict(X_test)
# performance
print(f'Accuracy Score: {accuracy_score(y_test,y_pred)}')
print(f'Confusion Matrix: \n{confusion_matrix(y_test, y_pred)}')
print(f'Area Under Curve: {roc_auc_score(y_test, y_pred)}')
Accuracy Score: 0.7380952380952381
Confusion Matrix:
[[81 30]
 [ 3 12]]
Area Under Curve: 0.7648648648648648
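For reference, the weights that 'balanced' applies can be reproduced explicitly, which gives a starting point for a hand-written dict. The sketch below uses made-up labels (90 zeros and 10 ones, an assumption and not the data from this question) and scikit-learn's compute_class_weight helper; the variable names are illustrative.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical imbalanced binary labels (assumption, not the real y_train).
y_demo = np.array([0] * 90 + [1] * 10)

# 'balanced' uses n_samples / (n_classes * count_of_each_class).
classes = np.unique(y_demo)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_demo)
balanced = {int(c): float(w) for c, w in zip(classes, weights)}
print(balanced)

# Same ratio, rescaled so the majority class has weight 1.
relative = {0: 1.0, 1: balanced[1] / balanced[0]}
print(relative)
```

Only the ratio between the two weights matters much for how the classes are traded off, so a dict like the rescaled one can be adjusted up or down from there and compared on validation data.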
How should I set the 'class_weight' parameter?
Thanks in advance for your answers!
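One possible cause of the TypeError, offered as a guess since the post does not show the labels: y_train may hold strings such as '0' and '1' (for example, from an object column), while the dict keys are the integers 0 and 1. That would also be consistent with 'balanced' working, because it needs no keys. The sketch below uses synthetic data from make_classification as a stand-in for the real X_train and y_train, and shows two ways to make the keys and the labels agree.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic imbalanced data standing in for X_train / y_train (assumption).
X_demo, y_int = make_classification(n_samples=300, weights=[0.9, 0.1], random_state=0)

# Simulate labels that were stored as strings (assumption about the cause).
y_str = y_int.astype(str).astype(object)

# Check what type the labels really have.
label_types = {type(v).__name__ for v in np.unique(y_str)}
print(label_types)

# Option A: convert the labels to int so that integer keys match them.
y_fixed = np.array([int(v) for v in y_str])
clf_a = LogisticRegression(class_weight={0: 1, 1: 100}, max_iter=1000)
clf_a.fit(X_demo, y_fixed)

# Option B: keep the string labels and use string keys instead.
clf_b = LogisticRegression(class_weight={"0": 1, "1": 100}, max_iter=1000)
clf_b.fit(X_demo, y_str)

print(clf_a.classes_, clf_b.classes_)
```

If the labels turn out to be integers already, this explanation does not apply and the cause lies elsewhere.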