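When I try to run XGBoost through caret, I get this error: Error in train.default(x = x, y = y, trControl = xgb_trcontrol, tuneGrid = xgb_grid, : At least one of the class levels is not a valid R variable name; This will cause errors when class probabilities are generated because the variables names will be converted to X0, X1 . Please use factor levels that can be used as valid R variable names (see ?make.names for help).
Here is the structure of the data: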
str(datafinal)
'data.frame': 747467 obs. of 11 variables:
$ CustSegment : Factor w/ 5 levels "Corporate","Legal",..: 4 4 4 4 4 4 4 4 4 4 ...
$ SubSegment1 : Factor w/ 23 levels "Academic","Association.foundation.Trust",..: 21 21 22 22 22 22 22 22 22 22 ...
$ BrandCode : Factor w/ 2 levels "PPC","QF": 1 1 1 1 1 1 1 1 1 1 ...
$ Qty : int 1 1 1 0 -1 -1 1 -1 -1 1 ...
$ Price : num 0 285 395 415 180 ...
$ Discount : num 0 0 20 20 0 0 0 0 0 20 ...
$ ProductGroup: Factor w/ 228 levels "AAL","AAT","AAU",..: 20 20 11 11 11 11 11 11 11 20 ...
$ Year : int 2016 2016 2014 2014 2016 2016 2016 2017 2016 2014 ...
$ Month : int 8 8 4 11 4 4 6 1 12 5 ...
$ Top1Percent : Factor w/ 2 levels "No","Yes": 1 1 2 2 2 2 2 2 2 2 ...
$ estore : int 0 0 0 0 0 0 0 0 0 0 ...
Here is the code:
# import data
library(caret)
library(readr)
datafinal <- read_csv("~/Document_Repository/estore_Cleaned_Data.csv")
# convert characters to factors and ensure syntactically valid names out of character vectors
datafinal$CustSegment <- make.names(datafinal$CustSegment)
datafinal$SubSegment1 <- make.names(datafinal$SubSegment1)
datafinal$BrandCode <- make.names(datafinal$BrandCode)
datafinal$ProductGroup <- make.names(datafinal$ProductGroup)
datafinal$Top1Percent <- make.names(datafinal$Top1Percent)
datafinal$CustSegment <- as.factor(datafinal$CustSegment)
datafinal$SubSegment1 <- as.factor(datafinal$SubSegment1)
datafinal$BrandCode <- as.factor(datafinal$BrandCode)
datafinal$ProductGroup <- as.factor(datafinal$ProductGroup)
datafinal$Top1Percent <- as.factor(datafinal$Top1Percent)
colnames(datafinal) <- make.names(colnames(datafinal))
# run xgboost analysis
library(xgboost)
set.seed(123)
train= sample(c(TRUE,TRUE,TRUE,FALSE), nrow(datafinal),rep=TRUE)
test = (!train)
datafinal.train <- datafinal[train,]
datafinal.test <- datafinal[test,]
# set up cross validated hyper parameter search
xgb_grid <- expand.grid(
nrounds = 100,
eta = c(0.3,0.1,0.01),
max_depth = c(1,2,4,8),
gamma = c(0,1)
)
xgb_trcontrol <- trainControl(
method = "cv",
number = 5,
verboseIter = T,
returnData = F,
returnResamp = "all",
classProbs = T,
summaryFunction = twoClassSummary,
allowParallel = T)
# train the model
library(plyr)
library(dplyr)
x = as.matrix(datafinal.train %>%
select (-estore))
y = as.factor(datafinal.train$estore)
xgb_train <- train(
x = x,
y = y,
trControl = xgb_trcontrol,
tuneGrid = xgb_grid,
method = "xgbTree"
)
Error in train.default(x = x, y = y, trControl = xgb_trcontrol, tuneGrid = xgb_grid, :
At least one of the class levels is not a valid R variable name; This will cause errors when class probabilities are generated because the variables names will be converted to X0, X1 . Please use factor levels that can be used as valid R variable names (see ?make.names for help).
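To illustrate what the message describes, here is a minimal base R sketch of how `make.names()` treats factor levels such as "0" and "1". The vector `estore` below is toy data standing in for `datafinal.train$estore` (assumed to contain only 0 and 1), and the labels "No" and "Yes" are an arbitrary choice for illustration, not something taken from my data.

```r
# Toy stand-in for datafinal.train$estore (assumption: only 0 and 1 occur)
estore <- c(0, 0, 1, 0, 1)
y <- as.factor(estore)

# Levels of a factor built from 0/1 are the strings "0" and "1"
print(levels(y))

# make.names() prefixes names that start with a digit, giving X0 and X1
converted <- make.names(levels(y))
print(converted)

# A level is syntactically valid if make.names() leaves it unchanged
print(levels(y) == converted)

# Relabelling with valid names (the labels here are hypothetical)
y_valid <- factor(estore, levels = c(0, 1), labels = c("No", "Yes"))
print(all(levels(y_valid) == make.names(levels(y_valid))))
```

The X0 and X1 in the error text match what `make.names()` produces for the levels "0" and "1", which suggests the check may be about the levels of `y` rather than the predictor columns.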
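I know ProductGroup has 228 levels, so I tried one-hot encoding that column with the code below and applying make.names() to the resulting columns. No luck: I get the same error.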
simpleMod <- dummyVars(~ProductGroup, data = datafinal, levelsOnly = T)
ProductGroup <- as.data.frame(predict(simpleMod, datafinal))
datafinal <- cbind(datafinal,ProductGroup)
datafinal$ProductGroup <- NULL
I even tried one-hot encoding all of the columns, but still no luck.
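Since the message talks about class levels, one way to narrow things down is to list, for every factor in play, the levels that `make.names()` would alter. The sketch below uses a hypothetical helper, `invalid_levels()`, and a toy data frame; neither is part of caret, and the toy `estore` column is converted to a factor only to mimic the `as.factor()` call on the outcome in my code.

```r
# Hypothetical helper: for each factor column, return the levels that
# make.names() would change (i.e. levels that are not valid R names)
invalid_levels <- function(df) {
  is_fac <- vapply(df, is.factor, logical(1))
  out <- lapply(df[is_fac], function(f) {
    lv <- levels(f)
    lv[lv != make.names(lv)]
  })
  # Keep only the columns that have at least one offending level
  Filter(length, out)
}

# Toy data (assumption): one predictor with valid levels, and an outcome
# built from 0/1 in the same way as y = as.factor(datafinal.train$estore)
toy <- data.frame(
  BrandCode = factor(c("PPC", "QF", "PPC")),
  estore    = factor(c(0, 1, 0))
)

bad <- invalid_levels(toy)
print(bad)
```

On this toy data only `estore` is reported, because one-hot encoding or renaming the predictors does not touch the levels of the outcome factor.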
Here is the new data structure:
str(datafinal, list.len = 270)
'data.frame': 747467 obs. of 264 variables:
$ BrandCode : num 1 1 1 1 1 1 1 1 1 1 ...
$ Qty : int 1 1 1 0 -1 -1 1 -1 -1 1 ...
$ Price : num 0 285 395 415 180 ...
$ Discount : num 0 0 20 20 0 0 0 0 0 20 ...
$ Year : int 2016 2016 2014 2014 2016 2016 2016 2017 2016 2014 ...
$ Month : int 8 8 4 11 4 4 6 1 12 5 ...
$ Top1Percent : num 0 0 1 1 1 1 1 1 1 1 ...
$ estore : int 0 0 0 0 0 0 0 0 0 0 ...
$ AAL : num 0 0 0 0 0 0 0 0 0 0 ...
$ AAT : num 0 0 0 0 0 0 0 0 0 0 ...
$ AAU : num 0 0 0 0 0 0 0 0 0 0 ...
$ AET : num 0 0 0 0 0 0 0 0 0 0 ...
$ AFI : num 0 0 0 0 0 0 0 0 0 0 ...
$ AFT : num 0 0 0 0 0 0 0 0 0 0 ...
$ AIT : num 0 0 0 0 0 0 0 0 0 0 ...
$ ALG : num 0 0 0 0 0 0 0 0 0 0 ...
$ ANN : num 0 0 0 0 0 0 0 0 0 0 ...
$ ARE : num 0 0 0 0 0 0 0 0 0 0 ...
$ ASB : num 0 0 1 1 1 1 1 1 1 0 ...
$ BCL : num 0 0 0 0 0 0 0 0 0 0 ...
$ BKR : num 0 0 0 0 0 0 0 0 0 0 ...
$ BSL : num 0 0 0 0 0 0 0 0 0 0 ...
$ BTB : num 0 0 0 0 0 0 0 0 0 0 ...
$ BTT : num 0 0 0 0 0 0 0 0 0 0 ...
$ BVC : num 0 0 0 0 0 0 0 0 0 0 ...
$ BVT : num 0 0 0 0 0 0 0 0 0 0 ...
$ CAB : num 0 0 0 0 0 0 0 0 0 0 ...
$ CAR : num 1 1 0 0 0 0 0 0 0 1 ...
$ CAS : num 0 0 0 0 0 0 0 0 0 0 ...
$ CAT : num 0 0 0 0 0 0 0 0 0 0 ...
$ CBL : num 0 0 0 0 0 0 0 0 0 0 ...
$ CCT : num 0 0 0 0 0 0 0 0 0 0 ...
$ CD_ : num 0 0 0 0 0 0 0 0 0 0 ...
$ CON : num 0 0 0 0 0 0 0 0 0 0 ...
$ CRB : num 0 0 0 0 0 0 0 0 0 0 ...
$ CRT : num 0 0 0 0 0 0 0 0 0 0 ...
$ DBP : num 0 0 0 0 0 0 0 0 0 0 ...
$ DBT : num 0 0 0 0 0 0 0 0 0 0 ...
$ DEP : num 0 0 0 0 0 0 0 0 0 0 ...
$ DIV : num 0 0 0 0 0 0 0 0 0 0 ...
$ DLR : num 0 0 0 0 0 0 0 0 0 0 ...
$ DON : num 0 0 0 0 0 0 0 0 0 0 ...
$ EBP : num 0 0 0 0 0 0 0 0 0 0 ...
$ EGA : num 0 0 0 0 0 0 0 0 0 0 ...
$ EGT : num 0 0 0 0 0 0 0 0 0 0 ...
$ ELG : num 0 0 0 0 0 0 0 0 0 0 ...
$ ELT : num 0 0 0 0 0 0 0 0 0 0 ...
$ FAP : num 0 0 0 0 0 0 0 0 0 0 ...
$ FAS : num 0 0 0 0 0 0 0 0 0 0 ...
$ FSP : num 0 0 0 0 0 0 0 0 0 0 ...
$ FTE : num 0 0 0 0 0 0 0 0 0 0 ...
$ GAP : num 0 0 0 0 0 0 0 0 0 0 ...
$ GAR : num 0 0 0 0 0 0 0 0 0 0 ...
$ GAS : num 0 0 0 0 0 0 0 0 0 0 ...
$ GCB : num 0 0 0 0 0 0 0 0 0 0 ...
$ GCR : num 0 0 0 0 0 0 0 0 0 0 ...
$ GEP : num 0 0 0 0 0 0 0 0 0 0 ...
$ GFS : num 0 0 0 0 0 0 0 0 0 0 ...
$ GFT : num 0 0 0 0 0 0 0 0 0 0 ...
$ GOT : num 0 0 0 0 0 0 0 0 0 0 ...
$ GOV : num 0 0 0 0 0 0 0 0 0 0 ...
$ GPM : num 0 0 0 0 0 0 0 0 0 0 ...
$ GQC : num 0 0 0 0 0 0 0 0 0 0 ...
$ GRA : num 0 0 0 0 0 0 0 0 0 0 ...
$ GRO : num 0 0 0 0 0 0 0 0 0 0 ...
$ GSA : num 0 0 0 0 0 0 0 0 0 0 ...
$ GUN : num 0 0 0 0 0 0 0 0 0 0 ...
$ HCR : num 0 0 0 0 0 0 0 0 0 0 ...
$ HLC : num 0 0 0 0 0 0 0 0 0 0 ...
$ HOA : num 0 0 0 0 0 0 0 0 0 0 ...
$ HTL : num 0 0 0 0 0 0 0 0 0 0 ...
$ HUD : num 0 0 0 0 0 0 0 0 0 0 ...
$ ICC : num 0 0 0 0 0 0 0 0 0 0 ...
$ ICF : num 0 0 0 0 0 0 0 0 0 0 ...
$ IDT : num 0 0 0 0 0 0 0 0 0 0 ...
$ IFA : num 0 0 0 0 0 0 0 0 0 0 ...
$ IPL : num 0 0 0 0 0 0 0 0 0 0 ...
$ IPT : num 0 0 0 0 0 0 0 0 0 0 ...
$ IRS : num 0 0 0 0 0 0 0 0 0 0 ...
$ ITG : num 0 0 0 0 0 0 0 0 0 0 ...
$ LLC : num 0 0 0 0 0 0 0 0 0 0 ...
$ LSA : num 0 0 0 0 0 0 0 0 0 0 ...
$ LSS : num 0 0 0 0 0 0 0 0 0 0 ...
$ ML2 : num 0 0 0 0 0 0 0 0 0 0 ...
$ MLC : num 0 0 0 0 0 0 0 0 0 0 ...
$ NBK : num 0 0 0 0 0 0 0 0 0 0 ...
$ NFS : num 0 0 0 0 0 0 0 0 0 0 ...
$ NOT : num 0 0 0 0 0 0 0 0 0 0 ...
$ NPC : num 0 0 0 0 0 0 0 0 0 0 ...
$ NPE : num 0 0 0 0 0 0 0 0 0 0 ...
$ NPG : num 0 0 0 0 0 0 0 0 0 0 ...
$ NPH : num 0 0 0 0 0 0 0 0 0 0 ...
$ NPO : num 0 0 0 0 0 0 0 0 0 0 ...
$ NPT : num 0 0 0 0 0 0 0 0 0 0 ...
$ NTE : num 0 0 0 0 0 0 0 0 0 0 ...
$ NUN : num 0 0 0 0 0 0 0 0 0 0 ...
$ NWT : num 0 0 0 0 0 0 0 0 0 0 ...
$ OFS : num 0 0 0 0 0 0 0 0 0 0 ...
$ PBS : num 0 0 0 0 0 0 0 0 0 0 ...
$ PCA : num 0 0 0 0 0 0 0 0 0 0 ...
$ PCR : num 0 0 0 0 0 0 0 0 0 0 ...
$ PFO : num 0 0 0 0 0 0 0 0 0 0 ...
$ PFP : num 0 0 0 0 0 0 0 0 0 0 ...
$ PFS : num 0 0 0 0 0 0 0 0 0 0 ...
$ PHY : num 0 0 0 0 0 0 0 0 0 0 ...
$ PLG : num 0 0 0 0 0 0 0 0 0 0 ...
$ PLS : num 0 0 0 0 0 0 0 0 0 0 ...
$ PNP : num 0 0 0 0 0 0 0 0 0 0 ...
$ POC : num 0 0 0 0 0 0 0 0 0 0 ...
$ PRC : num 0 0 0 0 0 0 0 0 0 0 ...
$ PRF : num 0 0 0 0 0 0 0 0 0 0 ...
$ PRL : num 0 0 0 0 0 0 0 0 0 0 ...
$ PSS : num 0 0 0 0 0 0 0 0 0 0 ...
$ PTU : num 0 0 0 0 0 0 0 0 0 0 ...
$ PUB : num 0 0 0 0 0 0 0 0 0 0 ...
$ PUT : num 0 0 0 0 0 0 0 0 0 0 ...
$ Q13 : num 0 0 0 0 0 0 0 0 0 0 ...
$ Q14 : num 0 0 0 0 0 0 0 0 0 0 ...
$ Q15 : num 0 0 0 0 0 0 0 0 0 0 ...
$ Q16 : num 0 0 0 0 0 0 0 0 0 0 ...
$ Q17 : num 0 0 0 0 0 0 0 0 0 0 ...
$ Q40 : num 0 0 0 0 0 0 0 0 0 0 ...
$ Q4R : num 0 0 0 0 0 0 0 0 0 0 ...
$ QAS : num 0 0 0 0 0 0 0 0 0 0 ...
$ QBP : num 0 0 0 0 0 0 0 0 0 0 ...
$ QCA : num 0 0 0 0 0 0 0 0 0 0 ...
$ QDE : num 0 0 0 0 0 0 0 0 0 0 ...
$ QFT : num 0 0 0 0 0 0 0 0 0 0 ...
$ QHC : num 0 0 0 0 0 0 0 0 0 0 ...
$ QIP : num 0 0 0 0 0 0 0 0 0 0 ...
$ QIR : num 0 0 0 0 0 0 0 0 0 0 ...
$ QLB : num 0 0 0 0 0 0 0 0 0 0 ...
$ QLI : num 0 0 0 0 0 0 0 0 0 0 ...
$ QPE : num 0 0 0 0 0 0 0 0 0 0 ...
$ QSB : num 0 0 0 0 0 0 0 0 0 0 ...
$ QSM : num 0 0 0 0 0 0 0 0 0 0 ...
$ QSS : num 0 0 0 0 0 0 0 0 0 0 ...
$ QST : num 0 0 0 0 0 0 0 0 0 0 ...
$ RET : num 0 0 0 0 0 0 0 0 0 0 ...
$ RFS : num 0 0 0 0 0 0 0 0 0 0 ...
$ RPV : num 0 0 0 0 0 0 0 0 0 0 ...
$ RST : num 0 0 0 0 0 0 0 0 0 0 ...
$ SBC : num 0 0 0 0 0 0 0 0 0 0 ...
$ SPD : num 0 0 0 0 0 0 0 0 0 0 ...
$ SPR : num 0 0 0 0 0 0 0 0 0 0 ...
$ SPT : num 0 0 0 0 0 0 0 0 0 0 ...
$ T20 : num 0 0 0 0 0 0 0 0 0 0 ...
$ T2S : num 0 0 0 0 0 0 0 0 0 0 ...
$ T41 : num 0 0 0 0 0 0 0 0 0 0 ...
$ T55 : num 0 0 0 0 0 0 0 0 0 0 ...
$ T65 : num 0 0 0 0 0 0 0 0 0 0 ...
$ TAB : num 0 0 0 0 0 0 0 0 0 0 ...
$ TBX : num 0 0 0 0 0 0 0 0 0 0 ...
$ TCB : num 0 0 0 0 0 0 0 0 0 0 ...
$ TCH : num 0 0 0 0 0 0 0 0 0 0 ...
$ TDB : num 0 0 0 0 0 0 0 0 0 0 ...
$ TFT : num 0 0 0 0 0 0 0 0 0 0 ...
$ TID : num 0 0 0 0 0 0 0 0 0 0 ...
$ TIN : num 0 0 0 0 0 0 0 0 0 0 ...
$ TPS : num 0 0 0 0 0 0 0 0 0 0 ...
$ TRE : num 0 0 0 0 0 0 0 0 0 0 ...
$ TSC : num 0 0 0 0 0 0 0 0 0 0 ...
$ TXC : num 0 0 0 0 0 0 0 0 0 0 ...
$ TXL : num 0 0 0 0 0 0 0 0 0 0 ...
$ VAL : num 0 0 0 0 0 0 0 0 0 0 ...
$ W10 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W11 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W12 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W13 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W14 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W15 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W16 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W17 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W18 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W19 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W20 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W21 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W22 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W23 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W24 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W25 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W26 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W27 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W28 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W29 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W30 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W31 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W32 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W33 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W34 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W35 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W36 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W39 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W41 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W49 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W4M : num 0 0 0 0 0 0 0 0 0 0 ...
$ W50 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W52 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W70 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W72 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W73 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W74 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W75 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W77 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W78 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W81 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W84 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W86 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W87 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W88 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W91 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W92 : num 0 0 0 0 0 0 0 0 0 0 ...
$ W97 : num 0 0 0 0 0 0 0 0 0 0 ...
$ WCA : num 0 0 0 0 0 0 0 0 0 0 ...
$ WHW : num 0 0 0 0 0 0 0 0 0 0 ...
$ WML : num 0 0 0 0 0 0 0 0 0 0 ...
$ WNA : num 0 0 0 0 0 0 0 0 0 0 ...
$ WNR : num 0 0 0 0 0 0 0 0 0 0 ...
$ WSP : num 0 0 0 0 0 0 0 0 0 0 ...
$ WUS : num 0 0 0 0 0 0 0 0 0 0 ...
$ X706 : num 0 0 0 0 0 0 0 0 0 0 ...
$ X990 : num 0 0 0 0 0 0 0 0 0 0 ...
$ XBS : num 0 0 0 0 0 0 0 0 0 0 ...
$ XCA : num 0 0 0 0 0 0 0 0 0 0 ...
$ XCE : num 0 0 0 0 0 0 0 0 0 0 ...
$ XCG : num 0 0 0 0 0 0 0 0 0 0 ...
$ XCP : num 0 0 0 0 0 0 0 0 0 0 ...
$ XCR : num 0 0 0 0 0 0 0 0 0 0 ...
$ XDT : num 0 0 0 0 0 0 0 0 0 0 ...
$ XFL : num 0 0 0 0 0 0 0 0 0 0 ...
$ XIR : num 0 0 0 0 0 0 0 0 0 0 ...
$ XLI : num 0 0 0 0 0 0 0 0 0 0 ...
$ XSE : num 0 0 0 0 0 0 0 0 0 0 ...
$ XTR : num 0 0 0 0 0 0 0 0 0 0 ...
$ XTT : num 0 0 0 0 0 0 0 0 0 0 ...
$ XTX : num 0 0 0 0 0 0 0 0 0 0 ...
$ Academic : num 0 0 0 0 0 0 0 0 0 0 ...
$ Association.foundation.Trust: num 0 0 0 0 0 0 0 0 0 0 ...
$ Church : num 0 0 0 0 0 0 0 0 0 0 ...
$ Federal..Government : num 0 0 0 0 0 0 0 0 0 0 ...
$ Local.Government : num 0 0 0 0 0 0 0 0 0 0 ...
$ Miscellaneous : num 0 0 0 0 0 0 0 0 0 0 ...
$ Non.Profit.Organization : num 0 0 0 0 0 0 0 0 0 0 ...
$ Other : num 0 0 0 0 0 0 0 0 0 0 ...
$ Reserved : num 0 0 0 0 0 0 0 0 0 0 ...
$ Revenue..1B : num 0 0 0 0 0 0 0 0 0 0 ...
$ Revenue..500M..749 : num 0 0 0 0 0 0 0 0 0 0 ...
$ Revenue..750M.999M : num 0 0 0 0 0 0 0 0 0 0 ...
$ Revenue.less.than..500M : num 0 0 0 0 0 0 0 0 0 0 ...
$ State.Government : num 0 0 0 0 0 0 0 0 0 0 ...
$ Unknown : num 0 0 0 0 0 0 0 0 0 0 ...
$ X0.20.Attorneys : num 0 0 0 0 0 0 0 0 0 0 ...
$ X0.3.employee : num 0 0 0 0 0 0 0 0 0 0 ...
$ X16.30.employees : num 0 0 0 0 0 0 0 0 0 0 ...
$ X21.79.Attorneys : num 0 0 0 0 0 0 0 0 0 0 ...
$ X31..employees : num 0 0 0 0 0 0 0 0 0 0 ...
$ X4.7.employees : num 1 1 0 0 0 0 0 0 0 0 ...
$ X8.15.employees : num 0 0 1 1 1 1 1 1 1 1 ...
$ X80..Attorneys : num 0 0 0 0 0 0 0 0 0 0 ...
$ Corporate : num 0 0 0 0 0 0 0 0 0 0 ...
$ Legal : num 0 0 0 0 0 0 0 0 0 0 ...
$ Other CustSegment : num 0 0 0 0 0 0 0 0 0 0 ...
$ Professional : num 1 1 1 1 1 1 1 1 1 1 ...
$ PublicSector : num 0 0 0 0 0 0 0 0 0 0 ...