Extracting r.squared values from an lmList object

Time: 2015-06-12 20:49:40

Tags: r prediction lm summary predict

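Apologies if this is a silly question, but I have been trying for days to extract the r-squared values from an lmList model object, without success. I am running a custom bootstrap function that lets me cross-validate my results, and I can successfully extract the coefficients and the RMSE.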

However, when I run the function below, which ran fine before I added the r-squared calculation, I get an error (quoted after the code):

GroupedBootStrampling <- function(slice, root, data, iters, met) {

  print('Initializing Modeling')
  root <- as.vector(root[slice, ])

  coefs <- list()
  RMSE <- list()
  r_sq <- list()
  mods <- list()

  for (i in 1:iters){

    print('Splitting Data into training and testing sets')
    comb <- splitter(data,0.8)
    train <- comb[[1]]
    test <- comb[[2]]
    print('Data successfully split')

    # Create var functional form to pass to lmList model object
    train$var <- train$cpc**(1/root[, 1])*train$cost**(1/root[, 2])
    test$var <- test$cpc**(1/root[, 1])*test$cost**(1/root[, 2])
    train[,'temp'] <- train[,met]

    # Use lmList to train and test the models
    print('Creating Models for each campaign')
    model <- lmList(temp ~ -1 + var + I(var*Mon) + I(var*Tues) + I(var*Wed) +
                      I(var*Thurs) + I(var*Fri) + I(var*Sat) | grouper, data = train)
    #model <- as.formula("temp ~ -1 + var + I(var*Mon) + I(var*Tues) + I(var*Wed) + I(var*Thurs) + I(var*Fri) + I(var*Sat)")
    #model <- ridgeList(train, model, "grouper")
    mods[[i]] <- model

    # Predict clicks based on the Test dataset
    print('Generating predictions')
    preds <- predict(model, test, se.fit = T)

    # Squared error per test row (averaged and square-rooted per group below)
    preds$RMSE <- as.numeric((test[,met] - preds$fit)**2)

    # Per-group RMSE: square root of the mean squared error on the test dataset
    RMSE[[i]] <- as.data.frame(preds %>% group_by(grouper) %>% summarize(RMSE = (mean(RMSE)**0.5)))

    # extract the coefficients from the model
    coefs[[i]] <- data.frame(groups = unique(train$grouper), coef(model, augFrame = T, data = train, which = 'grouper'))

    # extract the r.squared
    r_sq[[i]] <- summary(model)$r.squared

  }

  # Stack the per-iteration RMSE, coefficients and r.squared values
  avg_RMSE <- do.call(rbind,RMSE)
  avg_coefs <- do.call(rbind,coefs)
  avg_r.sq <- do.call(rbind,r_sq)
  avg_coefs$RMSE <- avg_RMSE$RMSE
  avg_coefs$r_sq <- avg_r.sq


  # Create a dataframe of desired output to be returned from the function
  print('Collecting Results')
  outs <- avg_coefs %>% group_by(groups) %>% summarize(var_coef = mean(var),
                                                         int_mon_coef = mean(I.var...Mon.),
                                                         int_tues_coef = mean(I.var...Tues.),
                                                         int_wed_coef = mean(I.var...Wed.),
                                                         int_thurs_coef = mean(I.var...Thurs.),
                                                         int_fri_coef = mean(I.var...Fri.),
                                                         int_sat_coef = mean(I.var...Sat.),
                                                         RMSE = mean(RMSE),
                                                         r_sq = mean(r_sq))

  outs$cpc_root <- root[,1]
  outs$cost_root <- root[,2]


  outs <- na_killer(outs)
  #   outs <- pred_optimizer2(outs, data)

  print('Loop Complete')
  return(outs)
}

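The error is:

Error in `[<-`(`*tmp*`, use, use, ii, value = lst[[ii]]) : subscript out of bounds

The traceback points to the r.squared calculation. I have tried all sorts of subscripting combinations and still cannot get it to work. If I generate an lmList outside the iteration loop, the r.squared extraction works fine.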
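
Since the traceback points at summary() applied to the whole lmList, one thing that might be worth trying is to summarize each group's fit separately instead. The following is only a minimal sketch under assumptions: it assumes lmList comes from the nlme package (as the coef(..., augFrame = ...) call suggests), and it uses nlme's bundled Orthodont dataset as a stand-in for the data in the question, so the formula and grouping variable here are illustrative, not the ones above.

```r
library(nlme)

# Orthodont ships with nlme; it is a stand-in for the question's data.
model <- lmList(distance ~ age | Subject, data = Orthodont)

# An nlme lmList is a list of per-group lm fits (an element is NULL
# when that group's fit failed), so each group's R-squared can be read
# from its own summary.lm() rather than from summary() on the whole list.
r_sq <- sapply(model, function(fit) {
  if (is.null(fit)) NA_real_ else summary(fit)$r.squared
})

head(r_sq)
```

The result is a numeric vector named by group, so inside the loop it would have to be matched to the groups by name before being combined with the coefficients. Whether this avoids the subscript error for the data in the question is untested.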

If anyone has any ideas, I would really appreciate it!

0 answers:

No answers