I'm trying to use SuperLearner to predict customer churn. Here is what I have:
training_balance[, -1]: the X passed to SuperLearner.
training_balance[, 1]: the Y passed to SuperLearner.
str(training_balance[,-1]):
'data.frame': 6945 obs. of 37 variables:
$ Imp_Valore_del_Cliente : num 329.4 291.6 223 30.6 38.5 ...
$ Flag_Apertura_Conto_Online : int 0 0 0 1 0 0 0 0 0 0 ...
$ Flag_Possesso_piu_Conti : int 0 0 0 0 0 0 0 0 0 0 ...
$ Eta : num 44 71 46 42 33 39 55 51 51 33 ...
$ Anno_Apertura_primo_Conto : num 2004 2009 2002 2008 2006 ...
$ Tipologia_Cliente : Factor w/ 5 levels "ActiveTrader",..: 2 5 3 2 3 2 3 2 2 3 …
.
.
.
str(training_balance[,1]):
Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 …
When I run SuperLearner:
model <- SuperLearner(Y = training_balance[,1],X = training_balance[,-1],family = binomial() , SL.library = list("SL.xgboost","SL.glm","SL.randomForest"))
I get this error:
Error in SuperLearner(Y = training_balance[, 1], X = training_balance[, :
the outcome Y must be a numeric vector
Any suggestions would be greatly appreciated. Thanks.
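
One possible cause, going by the str() output above: Y is a factor with levels "0" and "1", while the error says the outcome must be a numeric vector. Below is a minimal sketch of the conversion on a toy factor. The name y_factor is a made-up stand-in for training_balance[, 1], and whether this conversion alone resolves the problem on the real data is an assumption, not something verified here.

```r
# Toy stand-in for training_balance[, 1] (hypothetical values)
y_factor <- factor(c("1", "1", "0", "1", "0"), levels = c("0", "1"))

# Go through as.character() first so the labels "0"/"1" are parsed,
# rather than the factor's internal level codes
y_numeric <- as.numeric(as.character(y_factor))

str(y_numeric)

# On the real data, the same conversion would be applied before the call, e.g.
# Y = as.numeric(as.character(training_balance[, 1]))
```

Calling as.numeric(y_factor) directly would return the level codes 1 and 2 instead of 0 and 1, hence the detour through as.character().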