How do I pass list elements to a model in R?

Time: 2020-01-09 17:19:59

Tags: r list

I am still quite new to working with lists, so I apologize if this question sounds silly.

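Starting from an initial base of 459,046 customers, I wrote a function that splits the base and stores the pieces as separate elements of a list.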

sampled_list <- baseSample(dataset = clv_df_cbs, sample.size = 10000, seed = 12345)

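Running this function (baseSample) returns a new list of mutually exclusive customer groups. Each group contains 10,000 customers, except the last one, which may be smaller depending on the initial volume.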

> sampled_list <- baseSample(dataset = clv_df_cbs, sample.size = 10000, seed = 12345)
[1] "Seed: 12345"
[1] "Total groups created: 46"
[1] "Group size: 10000"

In this case, the output is a list of 46 elements stored in an object named sampled_list.
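The splitting step described above can be illustrated with a small base R sketch. The `customers` table and the group size of 10 are made-up stand-ins for clv_df_cbs and the real sample size; the grouping uses the same `ceiling(index / size)` idea as baseSample, with a plain row shuffle swapped in for its reordering step.

```r
# Toy customer table standing in for clv_df_cbs (hypothetical data).
customers <- data.frame(cust = sprintf("C%02d", 1:25), x = 1:25)

set.seed(12345)
# Shuffle the rows, then label consecutive blocks of `sample_size` rows.
shuffled <- customers[sample(nrow(customers)), ]
sample_size <- 10
shuffled$cycles_group <- paste0("sample_",
                                ceiling(seq_len(nrow(shuffled)) / sample_size))

# One data frame per group label.
groups <- split(shuffled, shuffled$cycles_group)

# Group sizes: two full groups of 10 and a final, smaller group of 5.
sapply(groups, nrow)

# Every customer lands in exactly one group.
all_ids <- unlist(lapply(groups, function(g) as.character(g$cust)))
stopifnot(!anyDuplicated(all_ids), length(all_ids) == nrow(customers))
```

With 25 rows and a group size of 10, the last group holds the 5 leftover customers, which mirrors the "last group may be smaller" behaviour described in the question.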

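Now I would like to pass each of these 46 elements to a BTYD model, which will predict the number of transactions over the next 90 days (based on what it learns from the input).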

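The reason I cannot pass the complete dataset to the BTYD model is that the model relies heavily on MCMC, so the computation takes so long that the model never returns any output. I therefore decided to run the same model several times (on sufficiently large samples) to generate predictions, until the entire base has been passed through the model.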

The operations that need to be performed on each element are as follows:

# Estimate parameters for element1 of the list
pggg.draws1 <- pggg.mcmc.DrawParameters(element1, 
                                           mcmc = 1000, # number of MCMC steps
                                           burnin = 250, # number of initial MCMC steps which are discarded
                                           thin = 10, # only every thin-th MCMC step will be returned
                                           chains = 2, # number of MCMC chains to be run
                                           trace = 50) # print logging step every trace iteration

# generate draws for holdout period
pggg.xstar.draws1 <- mcmc.DrawFutureTransactions(element1, pggg.draws1)

# conditional expectations
element1$xstar.pggg <- apply(pggg.xstar.draws1, 2, mean)

# P(active)
element1$pactive.pggg <- mcmc.PActive(pggg.xstar.draws1)

# P(alive)
element1$palive.pggg <- mcmc.PAlive(pggg.draws1)

# show estimates for first few customers
head(element1[, c("x", "t.x", "x.star",
                            "xstar.pggg", "pactive.pggg", "palive.pggg")],50)

# report median cohort-level parameter estimates
round(apply(as.matrix(pggg.draws1$level_2), 2, median), 3)

# report mean over median individual-level parameter estimates
median.est1 <- sapply(pggg.draws1$level_1, function(draw) {
  apply(as.matrix(draw), 2, median)
})
round(apply(median.est1, 1, mean), 3)

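Ideally, the output would be stored directly in a new data.frame, so that I can retrieve the IDs and the predictions (along with whatever else was originally in the dataset).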

Below is some mock data, taken from a publicly available dataset, that can be used to reproduce the setup.

library(BTYDplus)
library(tidyverse)
data("groceryElog")
dataset<-elog2cbs(groceryElog, T.cal = "2006-12-01") 


# FUNCTION baseSample ####
baseSample <- function(dataset, sample.size, seed=NULL) {
  seed.value <- if(is.null(seed)) {
    as.numeric(format(Sys.Date(),"%Y"))*10000+as.numeric(format(Sys.Date(),"%m"))*100+as.numeric(format(Sys.Date(),"%d"))
  } else {
    seed
  }

  set.seed(seed.value)

  # RE-ORDER DATA FRAME (SAME LENGTH)
  data <- with(dataset, dataset[order(sample(cust, nrow(dataset))),])

  # BUILD A LIST OF DFs 
  set.sample.size <- sample.size
  data$cycles_group <- paste0("sample_", ceiling(1:nrow(data)/set.sample.size))

  df_list <- split(data, data$cycles_group)

  print(paste0("Seed: ", seed.value))
  print(paste0("Total groups created: ", length(unique(data$cycles_group))))
  print(paste0("Group size: ", set.sample.size))
  return(df_list)
  #print(df_list)
}

# ** OUTPUT: Base split in lists ####
sampled_list <- baseSample(dataset = dataset, sample.size = 100, seed = 12345)

Thanks

1 answer:

Answer 0: (score: 3)

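In base R, you can use lapply to iterate a function over the elements of a list and return a new list containing the results of those iterations. After using the example code to generate the list named sampled_list:
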
# turn the code for the operations you want to perform on each list element into a function,
# with a couple of minor tweaks
thingy <- function(i) {

  # Estimate parameters for the current list element i
  pggg.draws1 <- pggg.mcmc.DrawParameters(i, 
                                          mcmc = 1000, # number of MCMC steps
                                          burnin = 250, # number of initial MCMC steps which are discarded
                                          thin = 10, # only every thin-th MCMC step will be returned
                                          chains = 2, # number of MCMC chains to be run
                                          trace = 50) # print logging step every trace iteration

  # generate draws for holdout period
  pggg.xstar.draws1 <- mcmc.DrawFutureTransactions(i, pggg.draws1)

  # conditional expectations
  i$xstar.pggg <- apply(pggg.xstar.draws1, 2, mean)

  # P(active)
  i$pactive.pggg <- mcmc.PActive(pggg.xstar.draws1)

  # P(alive)
  i$palive.pggg <- mcmc.PAlive(pggg.draws1)

  # show estimates for first few customers [commenting out for this iterated version]
  # head(element1[, c("x", "t.x", "x.star", "xstar.pggg", "pactive.pggg", "palive.pggg")],50)

  # report median cohort-level parameter estimates
  # (inside a function this value is computed but neither printed nor returned)
  round(apply(as.matrix(pggg.draws1$level_2), 2, median), 3)

  # report mean over median individual-level parameter estimates
  median.est1 <- sapply(pggg.draws1$level_1, function(draw) {
    apply(as.matrix(draw), 2, median)
  })

  # get the bits you want in a named vector
  z <- round(apply(median.est1, 1, mean), 3)

  # convert that named vector of results into a one-row data frame to make collapsing easier
  data.frame(as.list(z))

}

# now use lapply to iterate that function over the elements of your list
results <- lapply(sampled_list, thingy)

# now bind the results into a data frame
boundresults <- do.call(rbind, results)

Results (this took a while):

              k lambda    mu          tau    z
sample_1  4.200  0.174 0.091      102.835 0.27
sample_10 3.117  0.149 0.214      128.143 0.29
sample_11 4.093  0.154 0.115      130.802 0.30
sample_12 4.191  0.142 0.053      114.108 0.33
sample_13 2.605  0.155 0.071      160.743 0.35
sample_14 9.196  0.210 0.084      111.747 0.36
sample_15 2.005  0.145 0.091      298.872 0.40
sample_16 2.454  0.111 0.019 78731750.121 0.70
sample_2  2.808  0.138 0.059      812.278 0.40
sample_3  4.327  0.166 0.116      559.318 0.42
sample_4  9.266  0.166 0.038      146.283 0.40
sample_5  3.277  0.157 0.073      105.915 0.33
sample_6  9.584  0.184 0.086      118.299 0.31
sample_7  4.244  0.189 0.118       54.945 0.23
sample_8  4.388  0.147 0.085      325.054 0.36
sample_9  7.898  0.181 0.052       83.892 0.33
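The mechanics behind that table can be checked without running any MCMC. The sketch below uses a hypothetical stand-in function, `summarise_group`, in place of thingy, and a made-up `toy_list`; it only shows how lapply plus do.call(rbind, ...) turns a named list into a data frame, and why the rows appear in text order (sample_1, sample_10, ...) rather than numeric order.

```r
# Hypothetical stand-in for the per-group modelling function: it returns a
# one-row data frame of summary statistics instead of running any MCMC.
summarise_group <- function(df) {
  z <- c(n = nrow(df), mean_x = mean(df$x))
  data.frame(as.list(z))
}

# A small named list of data frames, shaped like the output of split().
toy_list <- list(
  sample_1  = data.frame(x = c(1, 2, 3)),
  sample_10 = data.frame(x = c(10, 20)),
  sample_2  = data.frame(x = c(4, 6))
)

results <- lapply(toy_list, summarise_group)
bound <- do.call(rbind, results)

# Row names come from the list names, so they are ordered as text.
# Reorder by the numeric suffix if a numeric order is preferred.
idx <- as.integer(sub("^sample_", "", rownames(bound)))
bound_sorted <- bound[order(idx), ]
print(bound_sorted)
```

Sorting by the numeric suffix is optional; it only changes the row order, not the values.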

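You can also combine the last two steps into a single line with do.call(rbind, lapply(...)). If you want the row names of the resulting table to become a column, you can run boundresults$sample <- row.names(boundresults) after creating it. And if you would rather not create a new object in your environment, you can put the function inside the call to lapply, i.e. lapply(sampled_list, function(i) { [your code] }).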
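The function above returns only parameter summaries, one row per sample. If per-customer predictions with their IDs are wanted instead, as the question describes, one possible variant is to return the augmented data frame from each iteration and bind those. The sketch below is an assumption-laden illustration: `score_group` is a hypothetical placeholder for the MCMC steps, and `toy_list` is made-up data.

```r
# Hypothetical stand-in for the MCMC steps: adds a fake prediction column.
score_group <- function(df) {
  df$xstar.pggg <- df$x * 0.5  # placeholder, not a real BTYD prediction
  df
}

# Made-up groups shaped like the elements of sampled_list.
toy_list <- list(
  sample_1 = data.frame(cust = c("a", "b"), x = c(2, 4)),
  sample_2 = data.frame(cust = c("c"), x = 6)
)

# Anonymous function inside lapply, bound into one data frame in one line.
scored <- do.call(rbind, lapply(names(toy_list), function(nm) {
  out <- score_group(toy_list[[nm]])
  out$sample <- nm  # record which group each customer came from
  out
}))
rownames(scored) <- NULL
print(scored)
```

Iterating over names(toy_list) rather than the list itself is a design choice that makes the group label available inside the function, so it can be stored as a regular column next to each customer ID.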