How to add a specific value from one data frame to a linear regression on another data frame

Time: 2015-11-18 16:36:19

Tags: r dataframe linear-regression lm linear

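I am trying to extract a specific value from one data frame (in my example df, where the value is "red" from the first column) and use it as the independent variable in a linear regression on another data frame that has a column with that name. I saved the value as a character, but I get an error (shown below). How can I use this value in an lm call on the other data frame?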

df <- read.table(text = " color birds    wolfs     
                  red           9         7 
                  red           8         4 
                  red           2         8 
                  red           2         3 
                  black         8         3 
                  black         1         2 
                  black         7         16 
                  black         1         5 
                  black         17        7 
                  black         8         7 
                  black         2         7 
                  green         20        3 
                  green         6         3 
                  green         1         1 
                  green         3         11 
                  green         30         1  ",header = TRUE)

df1 <- read.table(text = " red birds    wolfs     
                   10         9         7 
                   8          8         4 
                   11         2         8 
                   8          2         3 
                   3          8         3 
                   4          1         2 
                   8          7         16 
                   9          1         5 
                   10         17        7 
                   8          8         7 
                   6          2         7     ",header = TRUE)
# I extracted the desired value, then used it in a new lm call and got an error:
df[1,1]
# [1] red
# Levels: black green red
lm<-lm(birds~df[1,1],data=df1)
# Error in model.frame.default(formula = birds ~ df[1, 1], data = df1, drop.unused.levels = TRUE) :
#   variable lengths differ (found for 'df[1, 1]')
# I also tried converting the value to a character:
b<-as.character(df[1,1])
b
# [1] "red"
lm<-lm(birds~ b ,data=df1)
# but got the same error:
# Error in model.frame.default(formula = birds ~ b, data = df1, drop.unused.levels = TRUE) :
#   variable lengths differ (found for 'b')
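A small sketch of why both attempts fail: inside the formula, b is not in df1, so model.frame() falls back to the calling environment and finds the one-element character vector "red", not the column of df1 with that name. One value against eleven rows of birds is what triggers "variable lengths differ". The df1 values below are copied from the question (rebuilt with data.frame() so the snippet runs on its own); the name x is just for illustration.

```r
# Re-create df1 from the question so the snippet is self-contained.
df1 <- data.frame(
  red   = c(10, 8, 11, 8, 3, 4, 8, 9, 10, 8, 6),
  birds = c(9, 8, 2, 2, 8, 1, 7, 1, 17, 8, 2),
  wolfs = c(7, 4, 8, 3, 3, 2, 16, 5, 7, 7, 7)
)

b <- "red"            # the value taken from df[1, 1], as a character string

length(b)             # 1: just the string itself
length(df1$birds)     # 11: one value per row

# Reproduce the error without stopping the script.
msg <- tryCatch(lm(birds ~ b, data = df1), error = conditionMessage)
print(msg)            # "variable lengths differ (found for 'b')"

# The string has to be used as a column *name* to reach the data.
x <- df1[[b]]
length(x)             # 11
```

So the task is really "turn a stored column name into a term of the formula", which is what both answers do.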

2 Answers:

Answer 0 (score: 2)

I think you can use

onValue<-as.character(df[1,1]) # "red"
reg<-lm(birds~eval(as.symbol(onValue)),data=df1) # regression 

Also, don't assign the regression to an object named lm: lm is a function, so reusing the name can cause confusion.

eval(as.symbol(onValue)) tells R to run the regression on the column of df1 whose name is stored in onValue (in this case red).
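If you'd rather not have eval() inside the formula, a base-R alternative (a different technique from the one above, offered as a sketch) is reformulate(), which builds the formula from the stored column name. A side benefit is that the coefficient is labelled red rather than eval(as.symbol(onValue)). The df1 values are copied from the question, and onValue is hard-coded here instead of being read from df.

```r
# Re-create df1 from the question so the snippet is self-contained.
df1 <- data.frame(
  red   = c(10, 8, 11, 8, 3, 4, 8, 9, 10, 8, 6),
  birds = c(9, 8, 2, 2, 8, 1, 7, 1, 17, 8, 2),
  wolfs = c(7, 4, 8, 3, 3, 2, 16, 5, 7, 7, 7)
)

onValue <- "red"   # in the question: as.character(df[1, 1])

# Build birds ~ red from strings: term labels first, then the response.
form <- reformulate(onValue, response = "birds")
print(form)        # birds ~ red

reg <- lm(form, data = df1)   # named reg, not lm, so the function isn't masked
names(coef(reg))   # "(Intercept)" "red"
```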

Answer 1 (score: 2)

If you want a different approach, I find update very handy for tasks like this:

#create a formula outside of lm. This can be a simple one against
#the intercept or one that you already use
form <- birds ~ 1

#then add the new variable using paste + update
#the . ~ . means: keep everything that was on the left and right
#of the tilde ~ in the original formula
form <- update(form, paste('. ~ . + ', df[1,1]))
#> form
#birds ~ red

fit <- lm(form, data=df1)   # avoid naming the result lm, which masks the function
fit

Call:
lm(formula = form, data = df1)

Coefficients:
(Intercept)          red  
      2.339        0.462
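A sketch of the update() approach end to end, with one extra check: starting from birds ~ 1 the new term replaces the intercept-only right-hand side, starting from a formula that already has a term (birds ~ wolfs, chosen here only for illustration) the . ~ . keeps it, and the printed coefficients can be cross-checked against the closed-form simple-regression estimates slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x). The df1 values are copied from the question; new_var stands in for df[1,1].

```r
# Re-create df1 from the question so the snippet is self-contained.
df1 <- data.frame(
  red   = c(10, 8, 11, 8, 3, 4, 8, 9, 10, 8, 6),
  birds = c(9, 8, 2, 2, 8, 1, 7, 1, 17, 8, 2),
  wolfs = c(7, 4, 8, 3, 3, 2, 16, 5, 7, 7, 7)
)
new_var <- "red"   # stands in for as.character(df[1, 1])

# Intercept-only start: the new term is added and the redundant 1 dropped.
form <- update(birds ~ 1, paste(". ~ . +", new_var))
print(form)        # birds ~ red

# Existing terms are kept by ". ~ .".
form2 <- update(birds ~ wolfs, paste(". ~ . +", new_var))
print(form2)       # birds ~ wolfs + red

# Fit, then compare with the closed-form least-squares estimates.
fit <- lm(form, data = df1)
x <- df1[[new_var]]
y <- df1$birds
slope <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)
round(c(intercept, slope), 3)                        # 2.339 0.462
all.equal(unname(coef(fit)), c(intercept, slope))    # TRUE
```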