predict_proba for multi-target classification

Time: 2018-10-22 13:52:36

Tags: python scikit-learn multilabel-classification

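I am working on a multi-target (binary) classification problem. There are 11 targets, and I am using sklearn's MultiOutputClassifier. I am having trouble with the predict_proba function. A summary of the dataset and my code are below: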

import pandas as pd
import numpy as npy
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.multioutput import MultiOutputClassifier

data = pd.read_csv("123.csv") 

[Image: dataset summary]

target = ['H67BC97','H67GC93','H67LC63','H67WC103','H67RC91','H67YC73','H67RC92','H67GC94','H67LC64','H67NC60','H67YC72']
train, test = train_test_split(data, test_size=0.2)
# Drop the 11 target columns and the FORMULA_NUMBER column to get the features
X_train = train.drop(target + ['FORMULA_NUMBER'], axis=1)
X_test = test.drop(target + ['FORMULA_NUMBER'], axis=1)
Y_train = train[target]
Y_test = test[target]

model = MultiOutputClassifier(GradientBoostingClassifier())
model.fit(X_train, Y_train)
target_probabilities = model.predict_proba(X_test)
print(target_probabilities) 

[Image: printed probabilities]

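The probability output does not seem to be in the right format. I get 11 arrays of shape 565x2 (565 is the length of the test set). I want to save target_probabilities to a CSV file, but I get the error: ValueError: Expected 1D or 2D array, got 3D array instead. My question is essentially the same as the one at https://datascience.stackexchange.com/questions/22762/understanding-predict-proba-from-multioutputclassifier, but the answer there only explains why the output is a list of arrays.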

Edit: I have simplified the problem.

target_probabilities = npy.array(target_probabilities)

Now target_probabilities is an array of shape (11, 565, 2). I need to change it to shape (565, 11), where row i is target_probabilities[:, i][:, 1] for i in range(0, 565).
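One possible way to get the (565, 11) layout described above is to slice out column 1 of every per-target array and transpose the result. The sketch below is only an illustration: it uses randomly generated arrays as a stand-in for the output of model.predict_proba(X_test), the helper name positive_class_matrix is hypothetical, and it assumes every target was trained with both classes 0 and 1 present, so that column 1 holds the probability of class 1.

```python
import io

import numpy as np


def positive_class_matrix(prob_list):
    """Turn a list of (n_samples, 2) arrays into an (n_samples, n_targets) array.

    Assumes column 1 of each array is the probability of the positive class.
    """
    stacked = np.array(prob_list)   # shape (n_targets, n_samples, 2)
    return stacked[:, :, 1].T       # shape (n_samples, n_targets)


# Stand-in for model.predict_proba(X_test): one (565, 2) array per target.
rng = np.random.default_rng(0)
n_targets, n_samples = 11, 565
target_probabilities = []
for _ in range(n_targets):
    p1 = rng.random(n_samples)
    target_probabilities.append(np.column_stack([1.0 - p1, p1]))

stacked = np.array(target_probabilities)
positive = positive_class_matrix(target_probabilities)

# The result is 2D, so np.savetxt accepts it. An in-memory buffer is used here;
# a file path such as "target_probabilities.csv" (hypothetical name) works too.
buffer = io.StringIO()
np.savetxt(buffer, positive, delimiter=",")
```

If named columns are wanted, the same 2D array can be wrapped in pd.DataFrame(positive, columns=target) and written with to_csv. The classes_ attribute of each fitted estimator in model.estimators_ shows which column corresponds to which class.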

0 answers:

No answers