I want to loop over variables in a data frame using a for loop or a function in R. I wrote the following code (it does not work):
y <- c(0,0,1,1,0,1,0,1,1,1)
var1 <- c("a","a","a","b","b","b","c","c","c","c")
var2 <- c("m","m","n","n","n","n","o","o","o","m")
mydata <- data.frame(y,var1,var2)
myfunction <- function(v){
  regressionresult <- lm(y ~ v, data = mydata)
  summary(regressionresult)
}
myfunction("var1")
When I try to run it, I get the error message:
Error in model.frame.default(formula = y ~ v, data = mydata, drop.unused.levels = TRUE) :
variable lengths differ (found for 'v')
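I don't think this is a problem with the data, but rather with how I refer to the variable name, because the following code produces the desired regression result (for one of the variables I want to loop over):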
regressionresult <- lm(y ~ var1, data = mydata)
summary(regressionresult)
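How can I fix the function, or put the variable names in a loop?
I also tried looping over the variable names, but ran into a problem similar to the one with the function: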
for(v in c("var1","var2")){
  regressionresult <- lm(y ~ v, data = mydata)
  summary(regressionresult)
}
Running this loop produces the error:
Error in model.frame.default(formula = y ~ v, data = mydata, drop.unused.levels = TRUE) :
variable lengths differ (found for 'v')
Thanks for your help!
Answer 0 (score: 0)
We can use paste0 to build the formula as a string and pass it to lm, which coerces it to a formula:
myfunction <- function(v){
  regressionresult <- lm(paste0('y ~', v), data = mydata)
  summary(regressionresult)
}
out1 <- myfunction("var1")
Or use glue::glue:
myfunction <- function(v){
  regressionresult <- lm(glue::glue('y ~ {v}'), data = mydata)
  summary(regressionresult)
}
myfunction("var1")
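The question also asks about a for loop. In `y ~ v` the formula refers to a variable literally named `v`, not to the string stored in it, so the formula has to be built from that string. A minimal sketch, assuming the same toy data as in the question; the names `results`, `f` and `fit` are only illustrative:

```r
# Same toy data as in the question
y <- c(0, 0, 1, 1, 0, 1, 0, 1, 1, 1)
var1 <- c("a", "a", "a", "b", "b", "b", "c", "c", "c", "c")
var2 <- c("m", "m", "n", "n", "n", "n", "o", "o", "o", "m")
mydata <- data.frame(y, var1, var2)

results <- list()  # illustrative container, one summary per variable
for (v in c("var1", "var2")) {
  # Build the formula from the string held in v, e.g. y ~ var1
  f <- as.formula(paste("y ~", v))
  fit <- lm(f, data = mydata)
  results[[v]] <- summary(fit)
  # Auto-printing is switched off inside a for loop, so print explicitly
  print(results[[v]])
}
```

Storing each summary in a named list keeps the results available after the loop, for example as `results$var1`. `reformulate(v, response = "y")` from the stats package builds the same formula without pasting strings.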
Answer 1 (score: 0)
You can use functions from the tidyverse to work with tidy data and fit the model once per variable.
y <- c(0,0,1,1,0,1,0,1,1,1)
var1 <- c("a","a","a","b","b","b","c","c","c","c")
var2 <- c("m","m","n","n","n","n","o","o","o","m")
library(tidyverse)
# tibble() replaces the deprecated data_frame()
mydata <- tibble(y, var1, var2)
res <- mydata %>%
  # get data in long format - tidy format
  gather("var_type", "value", -y) %>%
  # we want one model per var_type
  # (tidyr >= 1.0 expects the nested columns to be named, hence data = )
  nest(data = -var_type) %>%
  # apply lm on each data
  mutate(
    regressionresult = map(data, ~ lm(y ~ value, data = .x))
  )
res
#> # A tibble: 2 x 3
#>   var_type data              regressionresult
#>   <chr>    <list>            <list>
#> 1 var1     <tibble [10 x 2]> <S3: lm>
#> 2 var2     <tibble [10 x 2]> <S3: lm>
summary(res$regressionresult[[1]])
#>
#> Call:
#> lm(formula = y ~ value, data = .x)
#>
#> Residuals:
#>     Min      1Q  Median      3Q     Max
#> -0.7500 -0.3333  0.2500  0.3125  0.6667
#>
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept)   0.3333     0.3150   1.058    0.325
#> valueb        0.3333     0.4454   0.748    0.479
#> valuec        0.4167     0.4167   1.000    0.351
#>
#> Residual standard error: 0.5455 on 7 degrees of freedom
#> Multiple R-squared: 0.1319, Adjusted R-squared: -0.1161
#> F-statistic: 0.532 on 2 and 7 DF, p-value: 0.6094
The broom package can help you work with the results:
library(broom)
#> Warning: package 'broom' was built under R version 3.4.4
res <- res %>%
  mutate(tidy_summary = map(regressionresult, broom::tidy))
res
#> # A tibble: 2 x 4
#>   var_type data              regressionresult tidy_summary
#>   <chr>    <list>            <list>           <list>
#> 1 var1     <tibble [10 x 2]> <S3: lm>         <data.frame [3 x 5]>
#> 2 var2     <tibble [10 x 2]> <S3: lm>         <data.frame [3 x 5]>
You can retrieve one of the summaries:
res$tidy_summary[[1]]
#>          term  estimate std.error statistic   p.value
#> 1 (Intercept) 0.3333333 0.3149704 1.0583005 0.3250657
#> 2      valueb 0.3333333 0.4454354 0.7483315 0.4786436
#> 3      valuec 0.4166667 0.4166667 1.0000000 0.3506167
Or unnest the summaries to get a single data frame to work with:
res %>%
  unnest(tidy_summary)
#> # A tibble: 6 x 6
#>   var_type term        estimate std.error statistic p.value
#>   <chr>    <chr>          <dbl>     <dbl>     <dbl>   <dbl>
#> 1 var1     (Intercept)    0.333     0.315     1.06    0.325
#> 2 var1     valueb         0.333     0.445     0.748   0.479
#> 3 var1     valuec         0.417     0.417     1.000   0.351
#> 4 var2     (Intercept)    0.333     0.315     1.06    0.325
#> 5 var2     valuen         0.417     0.417     1       0.351
#> 6 var2     valueo         0.333     0.445     0.748   0.479
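In the reshaped data every predictor is called `value`, so the terms are labelled `valueb`, `valuen`, and so on. If the original variable names should appear in the term labels, one alternative sketch is to skip the reshaping and build one formula per column name. It assumes the purrr, dplyr and broom packages are installed; `predictors`, `models` and `coefs` are illustrative names, not part of the answer above.

```r
library(purrr)
library(dplyr)
library(broom)

# Same toy data as in the question
y <- c(0, 0, 1, 1, 0, 1, 0, 1, 1, 1)
var1 <- c("a", "a", "a", "b", "b", "b", "c", "c", "c", "c")
var2 <- c("m", "m", "n", "n", "n", "n", "o", "o", "o", "m")
mydata <- data.frame(y, var1, var2)

predictors <- c("var1", "var2")

# One lm fit per predictor name; set_names() makes the list named
models <- map(
  set_names(predictors),
  ~ lm(reformulate(.x, response = "y"), data = mydata)
)

# Stack the tidy coefficient tables, keeping the list names as a column
coefs <- bind_rows(map(models, tidy), .id = "var_type")
coefs
```

The estimates are the same as in the nested version; only the term labels differ (`var1b` instead of `valueb`).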
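The functions of interest are nest and unnest from tidyr (http://tidyr.tidyverse.org/), which make it easy to create list columns; map from purrr, which iterates over a list and applies a function (here lm); and tidy from the broom package, which provides functions for tidying model results (summary results, prediction results, ...).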
Not used here, but it is worth knowing that the modelr package helps with building pipelines when modeling.
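Besides tidy, broom also offers glance for model-level statistics and augment for per-observation results. A small sketch on a single model, assuming broom is installed; `fit`, `model_stats` and `fitted_rows` are illustrative names:

```r
library(broom)

# Same toy data as in the question
y <- c(0, 0, 1, 1, 0, 1, 0, 1, 1, 1)
var1 <- c("a", "a", "a", "b", "b", "b", "c", "c", "c", "c")
var2 <- c("m", "m", "n", "n", "n", "n", "o", "o", "o", "m")
mydata <- data.frame(y, var1, var2)

fit <- lm(y ~ var1, data = mydata)

# One row per model: r.squared, adj.r.squared, sigma, p.value, ...
model_stats <- glance(fit)

# One row per observation, with fitted values and residuals added
fitted_rows <- augment(fit)

model_stats$r.squared
head(fitted_rows[, c("y", "var1", ".fitted", ".resid")])
```

Both functions can be mapped over the regressionresult list column in the same way as tidy.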