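I have a dataset of shell lengths and heights, but in some years height was not recorded. I am trying to fit a linear regression on the years where both height and length were recorded, and use it to predict heights for the years where height is NA. On top of that, I want it to run the regression separately for each of my assessment areas.
So far, this is what I have: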
for (a in unique(all_data$Assessment_area)) {
  r1 <- lm(Height_t2 ~ Length_t2,
           data = all_data[!is.na(all_data$Height_t2) & all_data$Assessment_area == a, ]) # Regression model for all shells with L&H
  print(a)
  print(r1)
}
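This gives me the output I need for each assessment area (I then plug the coefficients, e.g. 0.8871 and 0.5143, into the code below, but at the moment I do it one area at a time). The next part of my code creates a new column, as shown below, and I type in the generated values by hand each time. Is there a way to incorporate these lines into the previous loop?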
all_data$Height_r1 <- all_data$Length_t2*0.8871+0.5143 #Apply regression relationship to new column
all_data$Height_r1[!is.na(all_data$Height_t2)] <-all_data$Height_t2[!is.na(all_data$Height_t2)] #Add original heights
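Any help is appreciated.
Answer 0 (score: 1)
You can access the results of the linear regression with the $ operator on the object returned by summary(). In this case you would do: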
r1coeffs <- summary(r1)$coefficients
intercept <- r1coeffs[1]
slope <- r1coeffs[2]
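As an aside not in the original answer: for an lm fit, coef() returns the same estimates as a named vector, so you can pick the intercept and slope by name rather than by position. A minimal sketch on made-up data (the data frame shells and its values are hypothetical):

```r
# Hypothetical toy data standing in for one assessment area
shells <- data.frame(Length_t2 = c(10, 20, 30, 40, 50),
                     Height_t2 = c(9.3, 17.6, 27.4, 35.8, 45.1))
r1 <- lm(Height_t2 ~ Length_t2, data = shells)

# coef() gives a named vector with elements "(Intercept)" and "Length_t2"
cf <- coef(r1)
intercept <- cf[["(Intercept)"]]
slope     <- cf[["Length_t2"]]

# These match the "Estimate" column of summary(r1)$coefficients
same <- isTRUE(all.equal(unname(cf),
                         unname(summary(r1)$coefficients[, "Estimate"])))
print(c(intercept = intercept, slope = slope))
```

Positional indexing (r1coeffs[1], r1coeffs[2]) works because the estimates occupy the first column of the coefficient matrix, but naming them makes the intent explicit.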
You can then incorporate these into the loop as follows:
for (a in unique(all_data$Assessment_area)) {
  r1 <- lm(Height_t2 ~ Length_t2,
           data = all_data[!is.na(all_data$Height_t2) & all_data$Assessment_area == a, ]) # Regression model for all shells with L&H
  print(a)
  print(r1)
  # access the linear regression coefficients and store them
  r1coeffs <- summary(r1)$coefficients
  intercept <- r1coeffs[1]
  slope <- r1coeffs[2]
  # use the stored regression coefficients on the new data
  # (note: this writes to every row, so each pass overwrites the previous area's values)
  all_data$Height_r1 <- all_data$Length_t2 * slope + intercept # Apply regression relationship to new column
  all_data$Height_r1[!is.na(all_data$Height_t2)] <- all_data$Height_t2[!is.na(all_data$Height_t2)] # Add original heights
}
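Whether the results end up per area depends on which rows each pass of the loop writes to. A tiny sketch with made-up data (d, area, x and the per-area multiplier k are hypothetical stand-ins for the data and the fitted slope) shows the difference between assigning to the whole column and assigning only to the current area's rows:

```r
# Hypothetical toy data: two areas; k stands in for each area's fitted slope
d <- data.frame(area = c("A", "A", "B", "B"), x = c(1, 2, 3, 4))
d$y_fixed <- NA_real_

for (a in unique(d$area)) {
  k <- if (a == "A") 10 else 100
  # Whole-column assignment: every pass replaces every row, so the last area wins
  d$y_overwritten <- d$x * k
  # Area-restricted assignment: each pass fills only its own rows
  in_area <- d$area == a
  d$y_fixed[in_area] <- d$x[in_area] * k
}

print(d)
```

Here y_overwritten ends up as 100, 200, 300, 400 (area B's factor everywhere), while y_fixed is 10, 20, 300, 400.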
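Answer 1 (score: 0)
Problem solved :) It just needed a few extra bits to restrict the assignment to the current Assessment_area when applying the stored regression to the data: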
for (a in unique(all_data$Assessment_area)) {
  r1 <- lm(Height_t2 ~ Length_t2,
           data = all_data[!is.na(all_data$Height_t2) & all_data$Assessment_area == a, ])
  # Regression model for all shells with L&H
  print(a)
  print(r1)
  # access the linear regression coefficients and store them
  r1coeffs <- summary(r1)$coefficients
  intercept <- r1coeffs[1]
  slope <- r1coeffs[2]
  # use the stored regression coefficients on the new data
  all_data[all_data$Assessment_area == a, "Height_r1"] <-
    all_data[all_data$Assessment_area == a, "Length_t2"] * slope + intercept # Apply regression relationship to new column
}
#Add original heights
all_data$Height_r1[!is.na(all_data$Height_t2)] <-all_data$Height_t2[!is.na(all_data$Height_t2)]
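A variant not used in either answer: predict() with a newdata argument applies each area's fitted model directly, so the slope and intercept never need to be extracted by hand. This is a sketch on hypothetical toy data (area labels and values are made up), and it assumes every area has at least two shells with both length and height recorded; otherwise the fit for that area will fail or return an NA slope.

```r
# Hypothetical toy data: two assessment areas, some heights missing (NA)
all_data <- data.frame(
  Assessment_area = c("A", "A", "A", "A", "B", "B", "B", "B"),
  Length_t2       = c(10, 20, 30, 40, 12, 22, 32, 42),
  Height_t2       = c(9.1, 17.8, NA, 36.2, 6.5, 11.4, 16.8, NA)
)

all_data$Height_r1 <- NA_real_

for (a in unique(all_data$Assessment_area)) {
  in_area <- all_data$Assessment_area == a
  # Fit only on this area's rows where height was recorded
  fit <- lm(Height_t2 ~ Length_t2,
            data = all_data[in_area & !is.na(all_data$Height_t2), ])
  # Predict a height for every row in this area from its length
  all_data$Height_r1[in_area] <- predict(fit, newdata = all_data[in_area, ])
}

# Keep the measured heights wherever they exist
has_h <- !is.na(all_data$Height_t2)
all_data$Height_r1[has_h] <- all_data$Height_t2[has_h]

print(all_data)
```

Restricting both the fit and the assignment to in_area is the same idea as the accepted fix; predict() simply replaces the manual Length_t2 * slope + intercept step.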