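I want to estimate models on one data frame, but each model's formula has some "moving parts" that come from another data frame. For example, suppose I want to estimate the following model (I can't post images, and I couldn't find a way to type LaTeX equations): mpg = a + b * log(w_1 * drat + w_2 * hp)
where w_1 and w_2 are weights, e.g. 0.5 or 1. I use expand.grid() to create a data frame of weights, then use mutate() with paste() or paste0() to build a formula string from the variable names and the weight values, and pass that string to lm().
However, every estimated model uses only the formula from the first row of the weights data frame. Calling group_by() before estimating the models fixes this.
My question is: why? Why doesn't the first version work, and what does group_by() accomplish here?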
library(tidyverse)
cars <- mtcars
w <- seq(from=0.5, to=1, by=0.5)
weights <- as_tibble(expand.grid(w1=w,w2=w))
#Doesn't work - the lm model is fit using the formula from the first row only
weights %>%
mutate(formula_weights = paste0("mpg~log(",w1,"*drat+",w2,"*hp)")) %>%
mutate(r2 = summary(lm(data=cars, formula = formula_weights))$r.squared)
#Does work - model is fit using the w1 and w2 values from each row (formula_weights)
weights %>%
mutate(formula_weights = paste0("mpg~log(",w1,"*drat+",w2,"*hp)")) %>%
group_by(formula_weights) %>%
mutate(r2 = summary(lm(data=cars, formula = formula_weights))$r.squared)
Output without group_by():
# A tibble: 4 x 4
w1 w2 formula_weights r2
<dbl> <dbl> <chr> <dbl>
1 0.5 0.5 mpg~log(0.5*drat+0.5*hp) 0.715
2 1 0.5 mpg~log(1*drat+0.5*hp) 0.715
3 0.5 1 mpg~log(0.5*drat+1*hp) 0.715
4 1 1 mpg~log(1*drat+1*hp) 0.715
Output with group_by():
# A tibble: 4 x 4
# Groups: formula_weights [4]
w1 w2 formula_weights r2
<dbl> <dbl> <chr> <dbl>
1 0.5 0.5 mpg~log(0.5*drat+0.5*hp) 0.715
2 1 0.5 mpg~log(1*drat+0.5*hp) 0.709
3 0.5 1 mpg~log(0.5*drat+1*hp) 0.718
4 1 1 mpg~log(1*drat+1*hp) 0.715
Answer 0 (score: 0)
We can add rowwise(), so that the mutate() expression is evaluated separately for each row:
library(dplyr)
weights %>%
mutate(formula_weights = paste0("mpg~log(",w1,"*drat+",w2,"*hp)")) %>%
rowwise() %>%
mutate(r2 = summary(lm(data=cars, formula = formula_weights))$r.squared)
#Source: local data frame [4 x 4]
#Groups: <by row>
# A tibble: 4 x 4
# w1 w2 formula_weights r2
# <dbl> <dbl> <chr> <dbl>
#1 0.5 0.5 mpg~log(0.5*drat+0.5*hp) 0.715
#2 1 0.5 mpg~log(1*drat+0.5*hp) 0.709
#3 0.5 1 mpg~log(0.5*drat+1*hp) 0.718
#4 1 1 mpg~log(1*drat+1*hp) 0.715
Or use map_dbl() from purrr to fit one model per formula string:
library(purrr)
weights %>%
mutate(r2 = map_dbl(paste0("mpg~log(",w1,"*drat+",w2,"*hp)"), ~
summary(lm(data = cars, formula = .x))$r.squared))
# A tibble: 4 x 3
# w1 w2 r2
# <dbl> <dbl> <dbl>
#1 0.5 0.5 0.715
#2 1 0.5 0.709
#3 0.5 1 0.718
#4 1 1 0.715
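The same per-element idea can be sketched with a small helper, which keeps the model fit in one place and makes the "one formula string per call" requirement explicit. This is only an illustration: fit_r2 is a hypothetical name that does not appear in the question, and the sketch uses mtcars directly instead of the cars copy.

```r
library(dplyr)

# Hypothetical helper (not from the question): fit one model from a single
# formula string and return its R-squared.
fit_r2 <- function(formula_string, data) {
  # Refuse anything other than exactly one string, so a whole column
  # passed by mistake fails loudly instead of silently using one formula.
  stopifnot(is.character(formula_string), length(formula_string) == 1)
  fit <- lm(as.formula(formula_string), data = data)
  summary(fit)$r.squared
}

w <- seq(from = 0.5, to = 1, by = 0.5)
weights <- as_tibble(expand.grid(w1 = w, w2 = w))

result <- weights %>%
  mutate(
    formula_weights = paste0("mpg~log(", w1, "*drat+", w2, "*hp)"),
    # vapply() calls fit_r2 once per formula string and checks that
    # each call returns a single number.
    r2 = vapply(formula_weights, fit_r2, numeric(1),
                data = mtcars, USE.NAMES = FALSE)
  )

print(result)
```

The length check inside the helper is a design choice: if the helper were called on the whole column at once, it would stop with an error rather than return a single recycled value.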
Answer 1 (score: 0)
Use sapply() inside your mutate(). summary() and lm() are not vectorized, so they have to be called once per formula:
weights %>%
mutate(formula_weights = paste0("mpg~log(",w1,"*drat+",w2,"*hp)")) %>%
mutate(r2 = sapply(formula_weights,
function(fw) summary(lm(data=cars, formula = fw))$r.squared))
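As for why group_by() makes a difference: as far as I understand it, mutate() evaluates its expression once per group, and inside that expression a column name refers to all of the rows of the current group. Without grouping, formula_weights is therefore the entire column of four strings, lm() ends up using only the first one (which matches the output in the question), and the single resulting R-squared is recycled to every row. Grouping by formula_weights gives four one-row groups, so each call sees exactly one string. A minimal sketch that makes this visible by recording how many strings the expression sees (n_seen is a hypothetical column name used only for illustration):

```r
library(dplyr)

w <- seq(from = 0.5, to = 1, by = 0.5)
weights <- as_tibble(expand.grid(w1 = w, w2 = w)) %>%
  mutate(formula_weights = paste0("mpg~log(", w1, "*drat+", w2, "*hp)"))

# Ungrouped: the expression is evaluated once, with the whole column.
ungrouped <- weights %>%
  mutate(n_seen = length(formula_weights))

# Grouped: the expression is evaluated once per group, and every group
# here contains exactly one row.
grouped <- weights %>%
  group_by(formula_weights) %>%
  mutate(n_seen = length(formula_weights)) %>%
  ungroup()

print(ungrouped$n_seen)
print(grouped$n_seen)
```

Note that grouping by formula_weights only behaves like a per-row fit because every formula string is unique here. If two rows shared the same string they would land in the same group, which is why rowwise() or an explicit per-element loop states the intent more directly.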