STEP Algorithm在自定义函数中找不到变量

时间:2019-04-22 12:50:43

标签: r function regression

我正在R中处理电视零售数据集,并希望将需要重复使用的步骤放入函数中。

这包括检查VIF并将其返回,运行STEP算法以确定最佳模型,然后使用STEP的结果进行显示。

主要问题是错误消息

Error in eval(predvars, data, env) : object 'Hour' not found

似乎出现在step()调用中。

Regression <- function(data, dep_var, features) {
  lin.null = lm(paste(dep_var,'~ 1', sep = ''), data= data)
  lin.full = lm(paste(dep_var,'~', paste(features, collapse='+'), sep = ''), data = data)

  vif(lin.full)

  opt = step(lin.null, scope = list(lower = lin.null, upper = lin.full), direction = "forward")

step_opt = opt$call

stargazer(step_opt, type = 'text')
}

dep_var = 'imp'
feat = c('Hour', 'grp')
paste(dep_var,'~', paste(feat, collapse='+'), sep = '')

Regression(comb_a, 'imp', feat)

最终结果应该显示每个变量的VIF值以及STEP优化回归的观星者输出。

编辑1:

comb_a是回归应采用的输入数据 dput()输出如下:

# comb_a 
structure(list(Day = structure(c(1483833600, 1483833600, 1483833600, 
1483833600, 1483833600, 1483833600), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), Hour = c(0, 1, 6, 7, 8, 9), Model = c("Model A", 
"Model A", "Model A", "Model A", "Model A", "Model A"), tv_count = c(5L, 
8L, 4L, 9L, 11L, 8L), grp_abs = c(55500, 8308, 19026, 12184, 
10141, 113225), grp = c(0.22, 0.03, 0.07, 0.05, 0.04, 0.45), 
    sum_duration = c(150, 240, 120, 270, 330, 240), grp_per_second = c(370, 
    34.6166666666667, 158.55, 45.1259259259259, 30.730303030303, 
    471.770833333333), hours_since = c(NA, 1, 5, 1, 1, 1), camp_count = c(2L, 
    2L, 2L, 2L, 3L, 4L), imp = c(528, 319, 97, 182, 327, 785), 
    clicks = c(28, 15, 6, 13, 29, 53), leads = c(0, 0, 0, 0, 
    0, 1)), .Names = c("Day", "Hour", "Model", "tv_count", "grp_abs", 
"grp", "sum_duration", "grp_per_second", "hours_since", "camp_count", 
"imp", "clicks", "leads"), row.names = c(NA, -6L), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), vars = c("Day", "Hour"), drop = TRUE, indices = list(
    0L, 1L, 2L, 3L, 4L, 5L), group_sizes = c(1L, 1L, 1L, 1L, 
1L, 1L), biggest_group_size = 1L, labels = structure(list(Day = structure(c(1483833600, 
1483833600, 1483833600, 1483833600, 1483833600, 1483833600), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), Hour = c(0, 1, 6, 7, 8, 9)), row.names = c(NA, 
-6L), class = "data.frame", vars = c("Day", "Hour"), drop = TRUE, .Names = c("Day", 
"Hour")))

期望的输出将是:(数字仅用于表示)

> vif(lin.full)
          Hour            grp   sum_duration grp_per_second    hours_since     camp_count 
      2.979362       4.981504       2.290328       3.279818       1.013725       1.110823 
           imp         clicks 
      7.471457       9.244811 

> stargazer(step_opt, type = 'text')
===============================================
                        Dependent variable:    
                    ---------------------------
                               leads           
-----------------------------------------------
clicks                       0.005***          
                             (0.0004)          

camp_count                    0.040*           
                              (0.024)          

Constant                      -0.107           
                              (0.098)          

-----------------------------------------------
Observations                    898            
R2                             0.181           
Adjusted R2                    0.179           
Residual Std. Error      0.772 (df = 895)      
F Statistic           98.901*** (df = 2; 895)  
===============================================
Note:               *p<0.1; **p<0.05; ***p<0.01

0 个答案:

没有答案