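I have a LightGBM multiclass classification model and I want to create a confusion matrix for it. As a first step, I just want to put the predicted values next to the actual values in a df... My question is whether lightgbm's predict returns the predictions in the same order as the dataset it is given.
If you follow the code below, does my "predictions" part correctly match each row of the test dataset with its corresponding prediction row?
This is how I create the training and test sets: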
# split train and test into X and Y
X_train = train_data[:,0:(model.shape[1]-2)] ; Y_train = train_data[:,model.shape[1]-1] # python starts counting at 0
X_test = test_data[:,0:(model.shape[1]-2)] ; Y_test = test_data[:,model.shape[1]-1] # python starts counting at 0
#training and eval dataset
lgb_train = lgb.Dataset(data = X_train, label = Y_train)
lgb_test = lgb.Dataset(data = X_test, label = Y_test)
Running the model:
#run model
bst_model = lgb.train(params = parameters, train_set = lgb_train, num_boost_round = 1000,
valid_sets = [lgb_train,lgb_test], early_stopping_rounds = 7)
#categorical_feature = categoricals_vec)
And then the predictions:
#Predictions
preds = bst_model.predict(X_test)
preds_df = pd.DataFrame(preds, columns = ['0','1','2'])
preds_df['pred'] = preds_df.idxmax(axis=1)
preds_df['actual'] = boost_data_set.iloc[0:breakpoint,boost_data_set.shape[1]-1]
Answer 0 (score: 0)
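Yes. The predictions come back in order: predict returns one row of output per input row, in the same order as the rows of the data you pass in, so row i of preds corresponds to row i of X_test.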
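One thing that might still be worth checking is the step that attaches the actual labels. The sketch below uses only pandas and NumPy. The `preds` array and the `actual` Series are made-up stand-ins (assumptions, not your data) for the output of `bst_model.predict(X_test)` and for your label column. It shows that assigning a pandas Series to a DataFrame column aligns on index labels rather than on position, and then builds the confusion matrix with `pd.crosstab`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for bst_model.predict(X_test): one row of class
# probabilities per test row, in the same order as X_test.
preds = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
    [0.6, 0.3, 0.1],
])

# Hypothetical true labels whose index is NOT 0..3, as can happen after
# a shuffle or a filtered split.
actual = pd.Series([0, 1, 2, 1], index=[10, 11, 12, 13])

preds_df = pd.DataFrame(preds, columns=[0, 1, 2])
# Predicted class = label of the column with the highest probability.
preds_df["pred"] = preds_df[[0, 1, 2]].idxmax(axis=1)

# Assigning a Series aligns on index labels, so none of these rows match
# and the new column is entirely NaN.
misaligned = preds_df.assign(actual=actual)

# Assigning the underlying array aligns by position instead.
preds_df["actual"] = actual.to_numpy()

# Rows are the actual classes, columns are the predicted classes.
confusion = pd.crosstab(preds_df["actual"], preds_df["pred"],
                        rownames=["actual"], colnames=["pred"])
print(confusion)
```

If the index of your label slice already runs from 0 to n-1, the direct assignment in your code gives the same result. Using `Y_test` directly, which was cut from the same array as `X_test`, would avoid the question altogether.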