Pass rows of a data frame as arguments to a function while keeping other arguments fixed

Asked: 2017-02-08 12:05:33

Tags: r

This is a follow-up to "Pass rows of a data frame as arguments to a function in R with column names specifying the arguments".

I want to train the following model with different combinations of parameters:

library(xgboost)
library(Matrix)

df <- data.frame(y = sample(0:1, 1000, replace = TRUE),
                 a = rnorm(1000),
                 b = rnorm(1000),
                 c = rnorm(1000),
                 d = rnorm(1000))

train <- sparse.model.matrix(object = y~.-1, data = df)

model <- xgboost(data = train,
                 label = df$y,
                 # parameters
                 nrounds = 10, 
                 subsample = 0.5,
                 colsample_bytree = 0.8)

I created a grid of parameters, and I want to pass each row of the grid to the xgboost function while keeping the data and label arguments fixed.

param <- expand.grid(nrounds = c(10, 50, 100),
                     subsample = c(0.5, 0.8, 0.9),
                     colsample_bytree = c(0.8))

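I would like the column names to determine which argument each value is passed to (if column names are not an option, column order would also do), because that would let the same call scale to different functions.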

2 Answers:

Answer 0: (score: 2)

You can use mapply():

models_list <- mapply(function(x,y,z) xgboost(data = train,
                                              label = df$y,
                                              # parameters
                                              nrounds = x,
                                              subsample = y,
                                              colsample_bytree = z),
                      param$nrounds, param$subsample, param$colsample_bytree, SIMPLIFY = FALSE)

This gives you a list of all the models:

> models_list[[1]]
##### xgb.Booster
raw: 25.2 Kb 
call:
  xgb.train(params = params, data = dtrain, nrounds = nrounds, 
    watchlist = watchlist, verbose = verbose, print_every_n = print_every_n, 
    early_stopping_rounds = early_stopping_rounds, maximize = maximize, 
    save_period = save_period, save_name = save_name, xgb_model = xgb_model, 
    callbacks = callbacks, subsample = ..1, colsample_bytree = ..2)
params (as set within xgb.train):
  subsample = "0.5", colsample_bytree = "0.8", silent = "1"
xgb.attributes:
  niter
callbacks:
  cb.print.evaluation(period = print_every_n)
  cb.evaluation.log()
  cb.save.model(save_period = save_period, save_name = save_name)
niter: 10
evaluation_log:
    iter train_rmse
       1   0.487354
       2   0.473657
---                
       9   0.419176
      10   0.412587
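
The mapply() call above matches the grid columns to arguments by position. As a rough sketch of matching by column name instead, the grid can be spliced into the mapply() call with do.call(), while the fixed arguments go through MoreArgs. Here fit_stub, toy_data, and toy_label are hypothetical stand-ins for xgboost(), train, and df$y, so the example runs without xgboost installed:

```r
# Hypothetical stand-in for xgboost(): it only records the arguments it receives.
fit_stub <- function(data, label, nrounds, subsample, colsample_bytree) {
  list(n_obs = nrow(data), nrounds = nrounds,
       subsample = subsample, colsample_bytree = colsample_bytree)
}

param <- expand.grid(nrounds = c(10, 50, 100),
                     subsample = c(0.5, 0.8, 0.9),
                     colsample_bytree = 0.8)

toy_data  <- matrix(rnorm(40), nrow = 10)  # placeholder for `train`
toy_label <- rep(0:1, 5)                   # placeholder for `df$y`

# Each column of `param` becomes a named vector argument of mapply(), so values
# reach fit_stub() by name; MoreArgs holds the arguments that stay constant.
models_list <- do.call(
  mapply,
  c(list(FUN = fit_stub,
         MoreArgs = list(data = toy_data, label = toy_label),
         SIMPLIFY = FALSE),
    as.list(param))
)

length(models_list)
```

Because the columns are matched by name, reordering the columns of param should not change which argument each value reaches.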

Answer 1: (score: 2)

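I had a similar question and searched in vain until I found the answer in Hadley's Advanced R. It lets you pass the columns of a data frame to the function arguments that share their names. Read it here: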

https://adv-r.hadley.nz/functionals.html#pmap

There it is: a solution via purrr::pmap, which maps the arguments onto a function:

from Hadley's Advanced R, 8.4.5
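
Applied to the grid from the question, the pmap approach might look like the sketch below. It assumes the purrr package is installed; fit_stub, toy_data, and toy_label are hypothetical stand-ins for xgboost(), train, and df$y, so the example does not need xgboost:

```r
library(purrr)

# Hypothetical stand-in for xgboost(): it only records the arguments it receives.
fit_stub <- function(data, label, nrounds, subsample, colsample_bytree) {
  list(n_obs = nrow(data), nrounds = nrounds,
       subsample = subsample, colsample_bytree = colsample_bytree)
}

param <- expand.grid(nrounds = c(10, 50, 100),
                     subsample = c(0.5, 0.8, 0.9),
                     colsample_bytree = 0.8)

toy_data  <- matrix(rnorm(40), nrow = 10)  # placeholder for `train`
toy_label <- rep(0:1, 5)                   # placeholder for `df$y`

# pmap() calls fit_stub() once per row of `param`, using the column names as
# argument names; `data` and `label` are passed unchanged to every call.
models <- pmap(param, fit_stub, data = toy_data, label = toy_label)

length(models)
```

With the real function, replacing fit_stub with xgboost and the toy objects with train and df$y should give one fitted model per row of the grid.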

Here is my own code, which I recently used with quanteda to experiment with the Kaggle SMS spam dataset. These are the possible values of my parameters:

# data.frame() replaces the deprecated tibble::data_frame(); base R is enough here
tolower <- data.frame(tolower = c(TRUE, FALSE))
stem <- data.frame(stem = c(TRUE, FALSE))
remove_punct <- data.frame(remove_punct = c(TRUE, FALSE))

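This next part is a bonus and not required, but I found that I needed every combination of the parameters to run my Naive Bayes models. Thanks to Y J via an SO post: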

expand.grid.df <- function(...) Reduce(function(...) merge(..., by=NULL), list(...))
parameters <- expand.grid.df(tolower, stem, remove_punct)

So now my parameters look like this:

> parameters
  tolower  stem remove_punct
1    TRUE  TRUE         TRUE
2   FALSE  TRUE         TRUE
3    TRUE FALSE         TRUE
4   FALSE FALSE         TRUE
5    TRUE  TRUE        FALSE
6   FALSE  TRUE        FALSE
7    TRUE FALSE        FALSE
8   FALSE FALSE        FALSE

Now the magic: passing the parameters to my function of choice (dfm) through pmap:

mymodels <- pmap(parameters, dfm, x = mycorpus)

(x = mycorpus is a constant that I want to pass to dfm on every call.)
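
To make explicit what the pmap call does, here is a rough base-R equivalent: one call per row, with column names used as argument names and the constant appended to every call. call_per_row and describe_options are hypothetical names introduced only for this illustration; describe_options stands in for dfm so the sketch runs without quanteda:

```r
# Hypothetical helper: call `f` once per row of `grid`, using the column names
# as argument names; anything in `...` is passed unchanged to every call.
call_per_row <- function(grid, f, ...) {
  fixed <- list(...)
  lapply(seq_len(nrow(grid)), function(i) {
    row_args <- lapply(grid, function(col) col[[i]])  # named list for row i
    do.call(f, c(fixed, row_args))
  })
}

# Hypothetical stand-in for dfm(): it just reports the options it was given.
describe_options <- function(x, tolower, stem, remove_punct) {
  paste0(x, ": tolower=", tolower, ", stem=", stem,
         ", remove_punct=", remove_punct)
}

parameters <- expand.grid(tolower = c(TRUE, FALSE),
                          stem = c(TRUE, FALSE),
                          remove_punct = c(TRUE, FALSE))

results <- call_per_row(parameters, describe_options, x = "mycorpus")
results[[1]]
```

Since the helper takes the function as an argument, the same grid-driven call can be reused with other functions, which is the flexibility the question asks for.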

This is what I get:

> length(mymodels)
[1] 8
> mymodels[[1]]
Document-feature matrix of: 5,572 documents, 7,714 features (99.8% sparse).

I hope this helps you or anyone else who uses this approach!