Question

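I want to use logistic regression to examine the relationship between a person's bank account balance, their age, and their ability to buy a house. After fitting my regression model, I get a confusion matrix like this: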

array([[1006,    0],
       [ 125,    0]])

The same thing happens when I apply the regression to other data. Here is the code:

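import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# importing dataset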
dataset = pd.read_csv('/home/stayal0ne/Machine-learning/datasets/bank.csv', sep=';')
dataset['age'] = dataset['age'].astype(float)
dataset['balance'] = dataset['balance'].astype(float)
X = dataset.iloc[:, [0, 5]].values
y = dataset.iloc[:, -1].values

# splitting the dataset into the training and test sets
X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=42)

# encoding categorical data
label_encoder_y = LabelEncoder()
y = label_encoder_y.fit_transform(y)

# feature scaling
scale = StandardScaler()
X_train = scale.fit_transform(X_train)
X_test = scale.transform(X_test)

# Fitting classifier into the training set
classifier = LogisticRegression(random_state=42)
classifier.fit(X_train, y_train)

# Prediction
y_predicted = classifier.predict(X_test)

# Checking the accuracy
con_matrix = confusion_matrix(y_test, y_predicted)

Any help would be appreciated.

Answer 1

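The values in con_matrix are, in order: tn, fp, fn, tp.

Your true negatives are 1006: these are people the model predicted cannot buy a house and who really cannot. Your false positives are 0, meaning the model never predicted that someone could buy a house when in reality they could not.

Your false negatives are 125: these people can actually afford a house, but your model says they cannot. Your true positives are also 0, meaning the model did not correctly identify a single person who really can buy a house.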
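
To read those four numbers off programmatically, the matrix can be flattened in row-major order. The sketch below hard-codes the matrix from the question instead of refitting the model, and derives accuracy and recall from it; the variable names are only illustrative.

```python
import numpy as np

# Confusion matrix copied from the question (rows: actual, columns: predicted).
con_matrix = np.array([[1006, 0],
                       [125, 0]])

# For a binary problem, ravel() yields tn, fp, fn, tp in that order.
tn, fp, fn, tp = con_matrix.ravel()

accuracy = (tp + tn) / con_matrix.sum()  # share of all predictions that are right
recall = tp / (tp + fn)                  # share of actual positives that were found

print(f"tn={tn}, fp={fp}, fn={fn}, tp={tp}")
print(f"accuracy={accuracy:.3f}, recall={recall:.3f}")
```

Accuracy comes out at roughly 0.89 while recall is 0, which is the typical signature of a model that always predicts the majority class.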

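My overall guess is that far more people in your data cannot buy a house than can, and that the features (bank balance, age) look similar for both groups.

I suggest adding the class_weight parameter to handle an imbalanced dataset. If class label 0 means "cannot buy a house", then set {0: 0.1} when, for example, you have 90 records that cannot buy a house and 10 that can.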
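
As a rough illustration of the class_weight idea, the sketch below uses synthetic data (not the bank dataset) with about 10% positives and heavily overlapping features. It compares an unweighted model with `class_weight='balanced'`, which asks scikit-learn to weight classes inversely to their frequency. The data-generating numbers are made up for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic, imbalanced data: 900 negatives, 100 positives, heavily overlapping.
X_neg = rng.normal(loc=0.0, scale=1.0, size=(900, 2))
X_pos = rng.normal(loc=0.5, scale=1.0, size=(100, 2))
X = np.vstack([X_neg, X_pos])
y = np.concatenate([np.zeros(900, dtype=int), np.ones(100, dtype=int)])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

# Same model twice; only the class weighting differs.
plain = LogisticRegression(random_state=42).fit(X_train, y_train)
weighted = LogisticRegression(
    class_weight='balanced', random_state=42).fit(X_train, y_train)

cm_plain = confusion_matrix(y_test, plain.predict(X_test), labels=[0, 1])
cm_weighted = confusion_matrix(y_test, weighted.predict(X_test), labels=[0, 1])

print("unweighted:\n", cm_plain)
print("balanced:\n", cm_weighted)
```

With weighting, the second column should no longer be empty, usually at the cost of some false positives, so the trade-off is worth checking against what matters for the task.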

Answer 2

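The documentation for the confusion matrix says:

By definition, entry i, j in a confusion matrix is the number of observations actually in group i but predicted to be in group j.

So in your example, 1006 samples of class 0 were predicted as class 0, and 125 samples of class 1 were predicted as class 0.

This means your model predicts class 0 for every sample in the test set.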
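
A quick way to confirm this reading is to rebuild the situation with made-up labels: 1006 samples of class 0, 125 of class 1, and a "model" that always answers 0. This is a sketch with synthetic arrays, not the asker's data.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Ground truth with the same class counts as in the question.
y_test = np.array([0] * 1006 + [1] * 125)

# A degenerate predictor that assigns every sample to class 0.
y_predicted = np.zeros_like(y_test)

# Entry [i, j] counts samples of actual class i predicted as class j.
con_matrix = confusion_matrix(y_test, y_predicted)
print(con_matrix)

# The predictions contain only one distinct label.
print(np.unique(y_predicted))
```

Checking `np.unique(y_predicted)` on the real predictions is a cheap first diagnostic: if it returns a single label, the second column of the matrix has to be all zeros.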

Answer 3

Add this line

y_predicted = np.round(y_predicted)

before this one

con_matrix = confusion_matrix(y_test, y_predicted)

and run it again.
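
Note that rounding only has an effect when y_predicted holds continuous scores, for example probabilities from predict_proba or the output of a linear regression; LogisticRegression.predict already returns class labels. A minimal sketch with made-up labels and probabilities (all values here are hypothetical):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and predicted probabilities of class 1.
y_test = np.array([0, 0, 0, 1, 1, 0])
y_scores = np.array([0.10, 0.35, 0.80, 0.65, 0.20, 0.05])

# Round to the nearest integer (a 0.5 threshold) and cast to integer labels.
y_predicted = np.round(y_scores).astype(int)

con_matrix = confusion_matrix(y_test, y_predicted)
print(con_matrix)
```

If the predictions are already 0/1 labels, this step leaves them unchanged and the confusion matrix stays the same.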

Logistic regression: second column of the confusion matrix shows zeros
