xgboost Poisson regression: label must be nonnegative

Time: 2017-06-29 08:39:40

Tags: r machine-learning regression xgboost poisson

I am using a Windows 10 laptop with R and xgboost version 0.6-4. When I run the following code, I get a strange error.

xgb_params <- list("objective" = "count:poisson",
                   "eval_metric" = "rmse")
regression <- xgboost(data = training_fold,
                      label = y_training_fold,
                      nrounds = 10,
                      params = xgb_params)

Error in xgb.iter.update(bst$handle, dtrain, iteration - 1, obj) :
amalgamation/../src/objective/regression_obj.cc:190: Check failed: 
label_correct PoissonRegression: label must be nonnegative

But when I look at the summary of the labels, it says:

Min.   1st Qu. Median  Mean   3rd Qu. Max.   NA's
0.1129 0.3387  0.7000  1.0987 1.5265  4.5405 287

How can I solve this? I tried removing the NAs, but that did not help.

Thanks in advance!

Edit

Here is a sample of the train data:

dput(droplevels(head(train[, c(1,2,4,5,6,8,9,10,11)], 20)))

structure(list(VacancyId = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 
3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L), .Label = c("55288","56838", "57822", "57902", "57925", "58008"), class = "factor"), 
VacancyBankId = c(2L, 1609L, 1611L, 147L, 17L, 1611L, 2L, 
257L, 1611L, 2L, 147L, 17L, 1611L, 239L, 1609L, 2L, 1609L, 
2L, 2L, 1609L), FunctionId = c(36L, 36L, 36L, 36L, 35L, 35L, 
3L, 4L, 4L, 4L, 4L, 9L, 9L, 9L, 3L, 3L, 3L, 3L, 3L, 3L), 
EducationLevel = c(6L, 6L, 6L, 6L, 6L, 6L, 4L, 6L, 6L, 6L, 
6L, 4L, 4L, 4L, 6L, 6L, 6L, 6L, 6L, 6L), ProvinceId = c(22L, 
22L, 22L, 22L, 24L, 24L, 19L, 16L, 16L, 16L, 16L, 19L, 19L, 
19L, 21L, 21L, 16L, 16L, 22L, 22L), CandidatesCount = c(126L, 
27L, 18L, 12L, 1L, 4L, 2L, 6L, 7L, 7L, 1L, 8L, 15L, 13L, 
7L, 7L, 7L, 7L, 7L, 7L), DurationDays = c(62L, 62L, 62L, 
62L, 18L, 18L, 43L, 61L, 61L, 61L, 61L, 60L, 60L, 60L, 62L, 
62L, 62L, 62L, 62L, 62L), DurationWeeks = c(8.857142857, 
8.857142857, 8.857142857, 8.857142857, 2.571428571, 2.571428571, 
6.142857143, 8.714285714, 8.714285714, 8.714285714, 8.714285714, 
8.571428571, 8.571428571, 8.571428571, 8.857142857, 8.857142857, 
8.857142857, 8.857142857, 8.857142857, 8.857142857), CandidatesPerWeek = c(NA, 
3.048387097, 2.032258065, 1.35483871, 0.388888889, 1.555555556, 
0.325581395, 0.68852459, 0.803278689, 0.803278689, 0.114754098, 
0.933333333, 1.75, 1.516666667, 0.790322581, 0.790322581, 
0.790322581, 0.790322581, 0.790322581, 0.790322581)), .Names = c("VacancyId", "VacancyBankId", "FunctionId", "EducationLevel", "ProvinceId", "CandidatesCount", "DurationDays", "DurationWeeks", "CandidatesPerWeek"), row.names = c(1L, 2L, 3L, 4L, 5L, 6L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 26L, 27L, 28L, 29L, 30L, 31L), class = "data.frame")

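I want to predict the candidates per week from FunctionId, EducationLevel, ProvinceId and VacancyBankId. So y_training_fold is the candidates per week, and training_fold contains the function, education level, province and vacancy bank ID.

Hope someone can help me!

1 Answer:

Answer 0: (score: 1)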

The problem in your dataset is not the presence of negative values in y_training_fold, but the presence of non-integer values.
See the following simulation with a y_training_fold vector of non-integer values:

library(xgboost)
training_fold <- matrix(rnorm(1000), nrow = 100)
y_training_fold <- matrix(rnorm(100), ncol = 1)
xgb_params <- list("objective" = "count:poisson",
                   "eval_metric" = "rmse")
regression <- xgboost(data = training_fold,
                      label = y_training_fold,
                      nrounds = 10,
                      params = xgb_params)

The error message is exactly the same as the one you reported:

Error in xgb.iter.update(bst$handle, dtrain, iteration - 1, obj) : 
  [11:46:28] amalgamation/../src/objective/regression_obj.cc:190: 
  Check failed: label_correct PoissonRegression: label must be nonnegative

Now, try with a y_training_fold vector of integers:

y_training_fold <- matrix(rpois(100, 10), ncol = 1)

xgb_params <- list("objective" = "count:poisson",
                   "eval_metric" = "rmse")
regression <- xgboost(data = training_fold,
                      label = y_training_fold,
                      nrounds = 10,
                      params = xgb_params)

Now xgboost works nicely.
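
Since rnorm(100) draws are both non-integer and, for roughly half of them, negative, the simulation above does not by itself separate the two possible causes, and the error text only mentions "nonnegative". A quick way to see which conditions hold for a given label vector is sketched below. check_poisson_labels is a hypothetical helper written for this illustration (it is not part of xgboost), and it uses base R only.

```r
# Hypothetical helper (not an xgboost function): counts the label
# properties that are relevant for a Poisson objective.
check_poisson_labels <- function(y) {
  y <- as.numeric(y)
  present <- y[!is.na(y)]            # is.na() is TRUE for NA and NaN
  c(
    n_na          = sum(is.na(y)),
    n_negative    = sum(present < 0),
    n_non_integer = sum(present != round(present))
  )
}

set.seed(1)
check_poisson_labels(rnorm(100))        # real-valued draws, as in the failing simulation
check_poisson_labels(rpois(100, 10))    # Poisson counts, as in the working simulation
check_poisson_labels(c(NA, 3.048387097, 0.388888889, 1.75))  # a few values from the question
```

If n_na is greater than zero, it is worth confirming that the NA rows were removed from both the label vector and the feature matrix before training.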

Edit

Using your data, the solution to the problem is:

xgboost
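
Only the package name is shown for this step, so what follows is not the answerer's code. Purely as an illustration, one way to obtain integer, NA-free labels from the sample data is to drop the rows where CandidatesPerWeek is NA and use CandidatesCount as the label instead of the fractional weekly rate. Using CandidatesCount as the label and log(DurationWeeks) as an exposure term are assumptions made here, as is the idea of supplying that term to xgboost as an offset (for example through the base_margin field of an xgb.DMatrix). The sketch uses base R only and stops before the training call.

```r
# A few rows copied from the sample in the question.
train <- data.frame(
  VacancyBankId     = c(2L, 1609L, 1611L, 147L, 17L, 1611L),
  FunctionId        = c(36L, 36L, 36L, 36L, 35L, 35L),
  EducationLevel    = c(6L, 6L, 6L, 6L, 6L, 6L),
  ProvinceId        = c(22L, 22L, 22L, 22L, 24L, 24L),
  CandidatesCount   = c(126L, 27L, 18L, 12L, 1L, 4L),
  DurationWeeks     = c(8.857142857, 8.857142857, 8.857142857,
                        8.857142857, 2.571428571, 2.571428571),
  CandidatesPerWeek = c(NA, 3.048387097, 2.032258065,
                        1.35483871, 0.388888889, 1.555555556)
)

# Drop rows with a missing weekly rate from the whole data frame,
# so that features and labels stay aligned.
train_ok <- train[!is.na(train$CandidatesPerWeek), ]

# Feature matrix with the four predictors named in the question.
feature_cols <- c("VacancyBankId", "FunctionId", "EducationLevel", "ProvinceId")
training_fold <- as.matrix(train_ok[, feature_cols])
storage.mode(training_fold) <- "double"   # numeric matrix rather than integer

# Assumption: train on the integer count rather than the per-week rate.
y_training_fold <- train_ok$CandidatesCount

# Assumption: log of the exposure in weeks, intended as an offset.
log_exposure <- log(train_ok$DurationWeeks)
```

training_fold and y_training_fold then have the same roles as in the xgboost() calls shown earlier; predictions from a model trained this way would be counts, to be divided by the duration in weeks if a weekly rate is needed.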