Question

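I am trying to display a confusion matrix for predictions on my test data (binary text classification). But after computing y_pred, I cannot get y_test to match the output of model.predict().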

First, let's look at the test/ground-truth data:

y_test = (y_test > 0.5)
print(y_test)
print(type(y_test))

Output:

2       False
17       True
18       True
...
4980     True
4986    False
4990     True
pandas.core.series.Series

The missing indices belong to the training set.

When we predict on the test data, this is what happens:

y_pred = model.predict(data_test)
y_pred = (y_pred > 0.5)
print(y_pred)
print(type(y_pred))

Output:

[[ True]
 [ True]
 [ True]
 [False]
 ...
 [ True]
 [ True]
 [ True]]
numpy.ndarray

Ultimately, I am looking for a confusion matrix, but y_test and y_pred are in different formats.

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

Do you have any suggestions?

What I have tried so far:

y_test_np = y_test.values

Output:

[False  True  True ... True False  True]

Closer, but it looks like each item also needs to be an array (e.g. [[ True] [False] [ True]]). How do I align the arrays?

Answer 1

For illustration purposes only, let's create some sample data.

import numpy as np
import pandas as pd
y_test = pd.Series([True, False])
y_pred = np.array([[True], [False]])

You can convert the pandas Series y_test to a NumPy array

y_test.values

and squeeze the NumPy array y_pred to get the same shape

np.squeeze(y_pred)
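
Putting the two steps together, here is a minimal sketch of how the aligned arrays could be passed to confusion_matrix. The sample values and index labels are made up for illustration and stand in for the real y_test and model.predict() output. The variable names y_true_1d and y_pred_1d are hypothetical as well.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix

# Hypothetical sample data: a boolean Series with a non-contiguous index
# (like the test split in the question) and column-shaped predictions.
y_test = pd.Series([True, False, True, True], index=[2, 17, 18, 4980])
y_pred = np.array([[True], [False], [False], [True]])

# Drop the pandas index and keep only the values, shape (4,).
y_true_1d = y_test.to_numpy()

# Remove the trailing length-1 axis, shape (4, 1) -> (4,).
y_pred_1d = np.squeeze(y_pred, axis=1)

# Rows are true labels, columns are predicted labels, ordered [False, True].
cm = confusion_matrix(y_true_1d, y_pred_1d)
print(cm)
```

Passing axis=1 to np.squeeze is a small precaution: without it, a prediction array of shape (1, 1) would collapse to a zero-dimensional array. y_pred.ravel() should give the same result here.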