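I apologize if this is a silly question, but I have been trying for several days, without success, to extract r-squared values from an lmList model object. I am running a custom bootstrapping function to cross-validate my results, and I can successfully extract the coefficients and RMSE.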
However, when I run the function below, which ran fine before I added the r-squared calculation, I get the error shown after the code:
library(nlme)   # lmList(); assumed to be the nlme version, given coef(..., augFrame = TRUE)
library(dplyr)  # %>%, group_by(), summarize()

# splitter() and na_killer() are helper functions defined elsewhere (not shown).
GroupedBootStrampling <- function(slice, root, data, iters, met) {
  print('Initializing Modeling')
  root <- as.vector(root[slice, ])
  coefs <- list()
  RMSE <- list()
  r_sq <- list()
  mods <- list()
  for (i in 1:iters) {
    print('Splitting Data into training and testing sets')
    comb <- splitter(data, 0.8)
    train <- comb[[1]]
    test <- comb[[2]]
    print('Data successfully split')
    # Create the var functional form to pass to the lmList model object
    train$var <- train$cpc^(1 / root[, 1]) * train$cost^(1 / root[, 2])
    test$var <- test$cpc^(1 / root[, 1]) * test$cost^(1 / root[, 2])
    train[, 'temp'] <- train[, met]
    # Use lmList to fit one model per campaign
    print('Creating Models for each campaign')
    model <- lmList(temp ~ -1 + var + I(var*Mon) + I(var*Tues) + I(var*Wed) + I(var*Thurs) + I(var*Fri) + I(var*Sat) | grouper, data = train)
    #model <- as.formula("temp ~ -1 + var + I(var*Mon) + I(var*Tues) + I(var*Wed) + I(var*Thurs) + I(var*Fri) + I(var*Sat)")
    #model <- ridgeList(train, model, "grouper")
    mods[[i]] <- model
    # Predict the metric on the test dataset
    print('Generating predictions')
    preds <- predict(model, test, se.fit = TRUE)
    # Squared error per test row (stored in a column named RMSE)
    preds$RMSE <- as.numeric((test[, met] - preds$fit)^2)
    # RMSE per group: square root of the mean squared error on the test dataset
    RMSE[[i]] <- as.data.frame(preds %>% group_by(grouper) %>% summarize(RMSE = mean(RMSE)^0.5))
    # Extract the coefficients from the model
    coefs[[i]] <- data.frame(groups = unique(train$grouper), coef(model, augFrame = TRUE, data = train, which = 'grouper'))
    # Extract the r.squared (this is the line the traceback points to)
    r_sq[[i]] <- summary(model)$r.squared
  }
  # Collect the RMSE, coefficients and r.squared from all iterations
  avg_RMSE <- do.call(rbind, RMSE)
  avg_coefs <- do.call(rbind, coefs)
  avg_r.sq <- do.call(rbind, r_sq)
  avg_coefs$RMSE <- avg_RMSE$RMSE
  avg_coefs$r_sq <- avg_r.sq
  # Create a data frame of the desired output to be returned from the function
  print('Collecting Results')
  outs <- avg_coefs %>% group_by(groups) %>% summarize(var_coef = mean(var),
                                                        int_mon_coef = mean(I.var...Mon.),
                                                        int_tues_coef = mean(I.var...Tues.),
                                                        int_wed_coef = mean(I.var...Wed.),
                                                        int_thurs_coef = mean(I.var...Thurs.),
                                                        int_fri_coef = mean(I.var...Fri.),
                                                        int_sat_coef = mean(I.var...Sat.),
                                                        RMSE = mean(RMSE),
                                                        r_sq = mean(r_sq))
  outs$cpc_root <- root[, 1]
  outs$cost_root <- root[, 2]
  outs <- na_killer(outs)
  # outs <- pred_optimizer2(outs, data)
  print('Loop Complete')
  return(outs)
}
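The error is:
Error in `[<-`(`*tmp*`, use, use, ii, value = lst[[ii]]) :
  subscript out of bounds
The traceback points to the r.squared calculation. I have tried various combinations of subscripting and still cannot get it to work. If I build an lmList outside of the iteration loop, I can extract r.squared from it and it works fine.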
If anyone has any ideas, I would really appreciate it!
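One thing that may be worth trying, sketched under assumptions: rather than calling summary() on the whole lmList, take the r.squared from each per-group lm fit separately. The guess behind it (not verified against the data in this question) is that summary() on an lmList combines the per-group summaries into arrays, and that this can fail when a resampled training set leaves some group with fewer estimable coefficients than the others. The helper name `group_r_squared` is hypothetical, and the example uses nlme's built-in Orthodont data rather than the campaign data.

```r
library(nlme)  # lmList(); nlme is a recommended package shipped with R

# Hypothetical helper: per-group R-squared from an lmList object.
# Each element of an nlme lmList is an ordinary lm fit, or NULL if
# the fit for that group failed, so each one is summarised on its own.
group_r_squared <- function(model) {
  sapply(model, function(fit) {
    if (is.null(fit)) NA_real_ else summary(fit)$r.squared
  })
}

# Illustration on nlme's built-in Orthodont data (not the campaign data).
fits <- lmList(distance ~ age | Subject, data = Orthodont)
r2 <- group_r_squared(fits)
head(r2)
```

Inside the loop this would stand in for the `summary(model)$r.squared` line, for example `r_sq[[i]] <- data.frame(groups = names(model), r_sq = group_r_squared(model))`, after which the later rbind and merge steps would need adjusting to match by group.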