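I'm working on a multi-target (binary) classification problem. There are 11 targets, and I'm using sklearn's MultiOutputClassifier. I'm having trouble with the `predict_proba` function. See the summary of the dataset and the code below: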
```python
import pandas as pd
import numpy as npy
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.multioutput import MultiOutputClassifier

data = pd.read_csv("123.csv")
target = ['H67BC97', 'H67GC93', 'H67LC63', 'H67WC103', 'H67RC91', 'H67YC73',
          'H67RC92', 'H67GC94', 'H67LC64', 'H67NC60', 'H67YC72']

train, test = train_test_split(data, test_size=0.2)

# Features: every column except the 11 targets and FORMULA_NUMBER
X_train = train.drop(target + ['FORMULA_NUMBER'], axis=1)
X_test = test.drop(target + ['FORMULA_NUMBER'], axis=1)
Y_train = train[target]
Y_test = test[target]

# MultiOutputClassifier fits one GradientBoostingClassifier per target column
model = MultiOutputClassifier(GradientBoostingClassifier())
model.fit(X_train, Y_train)
target_probabilities = model.predict_proba(X_test)
print(target_probabilities)
```
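To see what `predict_proba` actually returns here, the sketch below reproduces the setup on random synthetic data (an assumption standing in for `123.csv`: 300 rows, 20 features, 11 random 0/1 targets, and `n_estimators=20` only to keep it fast). It suggests the output is a plain Python list with one `(n_samples, 2)` array per target, not a single 2D array.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.multioutput import MultiOutputClassifier

# Synthetic stand-in for the real data (assumption): 20 features, 11 binary targets
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
Y = rng.integers(0, 2, size=(300, 11))

model = MultiOutputClassifier(GradientBoostingClassifier(n_estimators=20))
model.fit(X[:235], Y[:235])
probs = model.predict_proba(X[235:])

# predict_proba returns a list: one array per target
print(type(probs).__name__, len(probs))   # list 11
print(probs[0].shape)                     # (65, 2)
# Columns of each array follow that target's estimator classes_, here [0 1]
print(model.estimators_[0].classes_)      # [0 1]
```

So column 1 of each array is the probability of class `1` for that target, as long as each target's labels are 0/1.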
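The probability output doesn't seem to be in the right format. I get 11 arrays of shape 565x2 (565 is the number of rows in the test set). I want to save target_probabilities to a CSV file, but I get the error: `ValueError: Expected 1D or 2D array, got 3D array instead`. My question is essentially the same as the one at https://datascience.stackexchange.com/questions/22762/understanding-predict-proba-from-multioutputclassifier, but the answer there only explains why the output is a list of arrays.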
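One way to get something savable is to take column 1 (the positive class, assuming each target's `classes_` is `[0, 1]`) from each of the 11 arrays and stack those columns side by side. The sketch below uses random probabilities as a stand-in for `model.predict_proba(X_test)`; the output filename `target_probabilities.csv` is hypothetical.

```python
import numpy as np
import pandas as pd

target = ['H67BC97', 'H67GC93', 'H67LC63', 'H67WC103', 'H67RC91', 'H67YC73',
          'H67RC92', 'H67GC94', 'H67LC64', 'H67NC60', 'H67YC72']

# Stand-in for model.predict_proba(X_test): a list of 11 arrays of shape (565, 2)
rng = np.random.default_rng(0)
p1 = rng.random((11, 565))
target_probabilities = [np.column_stack([1 - p, p]) for p in p1]

# Keep the positive-class column of each per-target array and put the
# 11 columns side by side, giving a (565, 11) array
positive = np.column_stack([p[:, 1] for p in target_probabilities])

# One named column per target; hypothetical output filename
df = pd.DataFrame(positive, columns=target)
df.to_csv("target_probabilities.csv", index=False)
```

With the real model, passing `index=X_test.index` to `pd.DataFrame` would keep each row aligned with its original test-set row.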
EDIT: I've simplified the question.
```python
target_probabilities = npy.array(target_probabilities)
```
Now target_probabilities is an array of shape (11, 565, 2). I need to reshape it to (565, 11), where row i is `target_probabilities[:, i][:, 1]` for i in `range(0, 565)`.
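Once the list has been converted to a single `(11, 565, 2)` NumPy array, the row-by-row construction described above can likely be done in one step: select index 1 on the last axis to get `(11, 565)`, then transpose. The sketch below uses random values as a stand-in for the real probabilities and checks one row against the `[:, i][:, 1]` expression from the question.

```python
import numpy as np

# Stand-in for npy.array(model.predict_proba(X_test)): shape (11, 565, 2)
rng = np.random.default_rng(1)
p1 = rng.random((11, 565))
arr = np.stack([1 - p1, p1], axis=-1)

# Positive-class slice is (11, 565); transposing gives (565, 11)
out = arr[:, :, 1].T

# Row i matches the expression from the question
i = 42
assert np.array_equal(out[i], arr[:, i][:, 1])
print(out.shape)  # (565, 11)
```

`.T` returns a view, so this avoids a Python loop over the 565 rows; `out` can then be passed to `np.savetxt` or wrapped in a `pd.DataFrame` for `to_csv`.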