Looping lm() regressions and replacing NAs

Time: 2014-09-01 09:47:48

Tags: loops replace lm

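I have a dataset of shell lengths and heights, but in some years height was not recorded. I am trying to fit a linear regression on the years in which both height and length were recorded, and use it to generate heights for the years where height is NA. On top of that, I want the regression to be done separately for each of my assessment areas.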

So far, this is what I have:

 for(a in unique(all_data$Assessment_area)) { 
 r1 <- lm(Height_t2~Length_t2,data=all_data[!is.na(all_data$Height_t2)&all_data$Assessment_area==a,]) #Regression model for all shells with L&H
 print(a)
 print(r1)
 }

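This gives me the output I need for each assessment area (I then plug the coefficients, e.g. 0.8871 and 0.5143, into the code below, but for now I do it one area at a time). The next part of my code creates a new column, as shown below, and I type in the generated values each time. Is there a way to fold these lines into the previous loop?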

 all_data$Height_r1 <- all_data$Length_t2*0.8871+0.5143 #Apply regression relationship to new column
 all_data$Height_r1[!is.na(all_data$Height_t2)] <-all_data$Height_t2[!is.na(all_data$Height_t2)] #Add original heights  
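The two lines above (predict for every row, then copy the observed heights back) can also be written as a single step with base R's `ifelse()`, which uses the regression value only where `Height_t2` is NA. A minimal sketch, assuming a hypothetical three-row toy data frame and the question's coefficients (slope 0.8871, intercept 0.5143):

```r
# Hypothetical toy data using the question's column names; row 2 has no recorded height
all_data <- data.frame(Length_t2 = c(10, 20, 30),
                       Height_t2 = c(9.5, NA, 27.0))

# Use the regression value where height is missing, otherwise keep the observed height
all_data$Height_r1 <- ifelse(is.na(all_data$Height_t2),
                             all_data$Length_t2 * 0.8871 + 0.5143,
                             all_data$Height_t2)
print(all_data)
```

Here row 2 gets 20 * 0.8871 + 0.5143 = 18.2563, while rows 1 and 3 keep their recorded heights.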

Any help appreciated.

2 answers:

Answer 0 (score: 1)

You can access the results of the linear regression with the $ operator on the output of summary(). In this case, you would do:

r1coeffs <- summary(r1)$coefficients
intercept <- r1coeffs[1]
slope <- r1coeffs[2]
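Indexing the coefficient matrix by position works because its first column holds the estimates, but `coef()` returns the same estimates as a named vector, which is less fragile if the formula changes. A minimal sketch on hypothetical toy data, using the question's column names:

```r
# Hypothetical toy data standing in for one assessment area's complete rows
toy <- data.frame(Length_t2 = c(10, 20, 30, 40),
                  Height_t2 = c(6.2, 10.9, 16.1, 20.8))

r1 <- lm(Height_t2 ~ Length_t2, data = toy)

# coef() returns a named vector: "(Intercept)" first, then one entry per predictor
cf <- coef(r1)
intercept <- cf[["(Intercept)"]]
slope <- cf[["Length_t2"]]

# The same numbers sit in the "Estimate" column of summary(r1)$coefficients
est <- summary(r1)$coefficients[, "Estimate"]
print(c(intercept = intercept, slope = slope))
print(est)
```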

You can then incorporate these into the loop as follows:

for(a in unique(all_data$Assessment_area)) {
  r1 <- lm(Height_t2~Length_t2,data=all_data[!is.na(all_data$Height_t2)&all_data$Assessment_area==a,]) #Regression model for all shells with L&H
  print(a)
  print(r1)
  #access the linear regression coefficients and store them
  r1coeffs <- summary(r1)$coefficients
  intercept <- r1coeffs[1]
  slope <- r1coeffs[2]

  #use the stored regression coefficients on the new data
  #note: this rewrites the whole Height_r1 column on every iteration, so only the last area's fit survives
  all_data$Height_r1 <- all_data$Length_t2*slope+intercept #Apply regression relationship to new column
  all_data$Height_r1[!is.na(all_data$Height_t2)] <- all_data$Height_t2[!is.na(all_data$Height_t2)] #Add original heights
}
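To keep each area's fit, the predictions have to be written only into that area's rows. A minimal sketch of that idea using `predict()` instead of hand-applying the coefficients; the toy data frame is hypothetical, and it relies on `lm()` dropping rows with a missing response under the default `na.action` (`na.omit`, unless the global option has been changed):

```r
# Hypothetical toy data: two assessment areas, some heights missing (NA)
all_data <- data.frame(
  Assessment_area = c("A", "A", "A", "A", "B", "B", "B", "B"),
  Length_t2 = c(10, 20, 30, 40, 10, 20, 30, 40),
  Height_t2 = c(6.2, 10.9, 16.1, NA, 3.1, NA, 8.8, 12.2),
  stringsAsFactors = FALSE
)

all_data$Height_r1 <- NA_real_
for (a in unique(all_data$Assessment_area)) {
  rows <- all_data$Assessment_area == a
  # Fit on this area only; rows with NA height are dropped by lm's default na.action
  fit <- lm(Height_t2 ~ Length_t2, data = all_data[rows, ])
  # Write predictions into this area's rows only, so other areas are left untouched
  all_data$Height_r1[rows] <- predict(fit, newdata = all_data[rows, ])
}

# Keep the observed heights wherever they were recorded
have_h <- !is.na(all_data$Height_t2)
all_data$Height_r1[have_h] <- all_data$Height_t2[have_h]
print(all_data)
```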

Answer 1 (score: 0)

Problem solved :) It just needed a few extra bits to restrict the stored regression to the current Assessment_area when applying it to the data:

 for(a in unique(all_data$Assessment_area)) {
   r1 <- lm(Height_t2~Length_t2,data=all_data[!is.na(all_data$Height_t2)&all_data$Assessment_area==a,]) #Regression model for all shells with L&H
   print(a)
   print(r1)
   #access the linear regression coefficients and store them
   r1coeffs <- summary(r1)$coefficients
   intercept <- r1coeffs[1]
   slope <- r1coeffs[2]

   #use the stored regression coefficients on the new data
   all_data[all_data$Assessment_area==a,"Height_r1"] <- all_data[all_data$Assessment_area==a,"Length_t2"]*slope+intercept #Apply regression relationship to new column
 }

 #Add original heights
 all_data$Height_r1[!is.na(all_data$Height_t2)] <- all_data$Height_t2[!is.na(all_data$Height_t2)]
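The same per-area fill can also be written without an explicit loop, using base R's `split()`, `lapply()`, `predict()` and `unsplit()`. This is an alternative sketch rather than the approach above; the toy data is hypothetical, and it again assumes `lm()`'s default `na.action` drops rows with NA height:

```r
# Hypothetical toy data: two assessment areas, some heights missing (NA)
all_data <- data.frame(
  Assessment_area = c("A", "A", "A", "A", "B", "B", "B", "B"),
  Length_t2 = c(10, 20, 30, 40, 10, 20, 30, 40),
  Height_t2 = c(6.2, 10.9, 16.1, NA, 3.1, NA, 8.8, 12.2),
  stringsAsFactors = FALSE
)

# Fit one model per area and fill only the missing heights within that area
filled <- lapply(split(all_data, all_data$Assessment_area), function(d) {
  fit <- lm(Height_t2 ~ Length_t2, data = d)  # NA heights dropped by default
  ifelse(is.na(d$Height_t2), predict(fit, newdata = d), d$Height_t2)
})

# unsplit() reassembles the per-area results in the original row order
all_data$Height_r1 <- unsplit(filled, all_data$Assessment_area)
print(all_data)
```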