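I am working with a TV retail dataset in R and want to put the steps I need to repeat into a function.
These steps are: checking the VIF and returning it, running the STEP algorithm to determine the best model, and then displaying the STEP result.
The main problem is the error message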
Error in eval(predvars, data, env) : object 'Hour' not found
which seems to be raised inside the step() call.
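library(car)        # vif() (assuming the car package)
library(stargazer)  # stargazer()

Regression <- function(data, dep_var, features) {
  # Null and full models; the formulas are built from strings
  lin.null = lm(paste(dep_var,'~ 1', sep = ''), data= data)
  lin.full = lm(paste(dep_var,'~', paste(features, collapse='+'), sep = ''), data = data)
  vif(lin.full)
  # Forward selection between the null and the full model (the error appears here)
  opt = step(lin.null, scope = list(lower = lin.null, upper = lin.full), direction = "forward")
  step_opt = opt$call
  stargazer(step_opt, type = 'text')
}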
dep_var = 'imp'
feat = c('Hour', 'grp')
paste(dep_var,'~', paste(feat, collapse='+'), sep = '')
Regression(comb_a, 'imp', feat)
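The final result should show the VIF value for each variable and the stargazer output of the STEP-optimized regression.
Edit 1:
comb_a is the input data for the regression. Its dput() output is: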
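To make the "check the VIF and return it" step concrete, here is a minimal base-R sketch. It is a stand-in for vif(), not the package's implementation: the helper name manual_vif and the toy data frame are hypothetical, and it assumes at least two numeric predictors. It uses the textbook definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. Returning the vector (instead of calling vif() in the middle of a function, where the value is silently discarded) is what lets the caller see it.

```r
# Hypothetical helper: VIF computed by hand with base R only.
# Assumes `features` names at least two numeric columns of `data`.
manual_vif <- function(data, features) {
  sapply(features, function(f) {
    others <- setdiff(features, f)
    # R^2 from regressing predictor f on the remaining predictors
    r2 <- summary(lm(reformulate(others, response = f), data = data))$r.squared
    1 / (1 - r2)
  })
}

# Toy data (made up for illustration, not the original comb_a)
set.seed(1)
toy <- data.frame(Hour = rep(0:23, 2), grp = runif(48))
toy$imp <- 100 + 20 * toy$Hour + rnorm(48, sd = 5)

v <- manual_vif(toy, c("Hour", "grp"))
print(v)  # named numeric vector, one VIF per predictor
```

Inside a function like Regression(), the same idea applies: either print() the VIF values explicitly or return them, for example as one element of a list alongside the selected model.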
# comb_a
structure(list(Day = structure(c(1483833600, 1483833600, 1483833600,
1483833600, 1483833600, 1483833600), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), Hour = c(0, 1, 6, 7, 8, 9), Model = c("Model A",
"Model A", "Model A", "Model A", "Model A", "Model A"), tv_count = c(5L,
8L, 4L, 9L, 11L, 8L), grp_abs = c(55500, 8308, 19026, 12184,
10141, 113225), grp = c(0.22, 0.03, 0.07, 0.05, 0.04, 0.45),
sum_duration = c(150, 240, 120, 270, 330, 240), grp_per_second = c(370,
34.6166666666667, 158.55, 45.1259259259259, 30.730303030303,
471.770833333333), hours_since = c(NA, 1, 5, 1, 1, 1), camp_count = c(2L,
2L, 2L, 2L, 3L, 4L), imp = c(528, 319, 97, 182, 327, 785),
clicks = c(28, 15, 6, 13, 29, 53), leads = c(0, 0, 0, 0,
0, 1)), .Names = c("Day", "Hour", "Model", "tv_count", "grp_abs",
"grp", "sum_duration", "grp_per_second", "hours_since", "camp_count",
"imp", "clicks", "leads"), row.names = c(NA, -6L), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), vars = c("Day", "Hour"), drop = TRUE, indices = list(
0L, 1L, 2L, 3L, 4L, 5L), group_sizes = c(1L, 1L, 1L, 1L,
1L, 1L), biggest_group_size = 1L, labels = structure(list(Day = structure(c(1483833600,
1483833600, 1483833600, 1483833600, 1483833600, 1483833600), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), Hour = c(0, 1, 6, 7, 8, 9)), row.names = c(NA,
-6L), class = "data.frame", vars = c("Day", "Hour"), drop = TRUE, .Names = c("Day",
"Hour")))
The desired output would be (numbers are for illustration only):
> vif(lin.full)
          Hour            grp   sum_duration grp_per_second    hours_since     camp_count
      2.979362       4.981504       2.290328       3.279818       1.013725       1.110823
           imp         clicks
      7.471457       9.244811
> stargazer(step_opt, type = 'text')
===============================================
                        Dependent variable:
                    ---------------------------
                               leads
-----------------------------------------------
clicks                       0.005***
                             (0.0004)
camp_count                    0.040*
                              (0.024)
Constant                      -0.107
                              (0.098)
-----------------------------------------------
Observations                    898
R2                             0.181
Adjusted R2                    0.179
Residual Std. Error      0.772 (df = 895)
F Statistic          98.901*** (df = 2; 895)
===============================================
Note:               *p<0.1; **p<0.05; ***p<0.01
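A possible cause, offered as a sketch rather than a confirmed diagnosis: when the formula is passed to lm() as a character string, it is only converted to a formula inside the model-fitting code, so the formula's environment is not the body of the function. step() later looks up `data` through that environment and may find an object that holds only the response column, which would explain why 'Hour' is missing. Building real formula objects inside the function (here with reformulate()) keeps the environment pointing at the function's frame, where `data` is the full data frame. The helper name forward_step and the toy data are hypothetical, and only base R is used, so vif() and stargazer() are left out.

```r
# Hypothetical helper illustrating the formula-environment fix (base R only).
forward_step <- function(data, dep_var, features) {
  # reformulate() creates formula objects whose environment is this function's
  # frame, so step() can find `data` (the full data frame) when it refits.
  f_null <- reformulate("1", response = dep_var)
  f_full <- reformulate(features, response = dep_var)
  lin_null <- lm(f_null, data = data)
  # Passing formulas as the scope avoids fitting the full model up front
  step(lin_null, scope = list(lower = f_null, upper = f_full),
       direction = "forward", trace = 0)
}

# Toy data (made up for illustration, not the original comb_a)
set.seed(1)
toy <- data.frame(Hour = rep(0:23, 2), grp = runif(48))
toy$imp <- 100 + 20 * toy$Hour + rnorm(48, sd = 5)

fit <- forward_step(toy, "imp", c("Hour", "grp"))
print(formula(fit))  # formula of the model selected by forward stepwise AIC
```

If this is indeed the cause, the same change should carry over to the original function. Separately, stargazer() expects a fitted model, so passing the object returned by step() rather than its $call element is likely what is needed for the table.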