循环计算线性模型并将结果写入矩阵

时间:2015-02-24 00:51:31

标签: r loops linear-regression

我有一个如下组织的数据集,包含23个日期的数据(如下所示是1个日期的数据和第二个日期的一部分 - 注意:标题是偏移的)。我想通过FrontBack和日期运行y~x(lm(y~x))的线性模型(即每个日期2 lm&s,1为Front,1为Back)。然后,我想在矩阵中汇总输出,其中每个列的斜率,截距和误差都列在单独的列中。这应该给我一个46行的矩阵(23个日期x前/后2级)。

Date FrontBack     y            x    
20140916    Back    2234.580    2.253175    
20140916    Back    2267.631    7.725422    
20140916    Back    2246.668    14.414951    
20140916    Back    2216.307    17.837861   
20140916    Back    2214.225    15.484364   
20140916    Front   2245.522    90.062102   
20140916    Back    2219.565    12.326474   
20140916    Front   2267.427    63.396137   
20140916    Back    2213.286    7.861758   
20140916    Front   2264.902    61.661650   
20140916    Front   2256.183    70.124702   
20140916    Back    2202.254    7.400539   
20140916    Front   2241.997    44.826769   
20140916    Back    2204.868    5.739663   
20140916    Back    2209.424    2.165606   
20140916    Front   2266.947    1.068334   
20140917    Back    2237.199    2.190785   
20140917    Back    2248.541    4.415886   
20140917    Back    2260.041    8.724817   
20140917    Back    2277.407    13.420694   
20140917    Back    2278.414    14.789667   
20140917    Front   2346.622    29.878672   
20140917    Back    2268.111    15.120095   
20140917    Front   2496.946    60.30390

2 个答案:

答案 0 :(得分:0)

因为你只有两个类别,前后部分的子集是否合适?如果是这样,以下代码应该为您提供所需内容:

fake.y=runif(15*5,min=2200,max=2500) #make some fake data for testing
fake.FrontBack<-sample(c("Front","Back"),15*5,replace=T)
fake.Date<-sort(sample(seq(20140916,20140920),15*5,replace=T))
fake.data<-data.frame(Date=fake.Date,FrontBack=fake.FrontBack,y=fake.y,x=fake.y/120+runif(15*5)+1*(fake.FrontBack=="Back"))

# starting here use your data instead of fake.data
f<-subset(fake.data,FrontBack=="Front") # use subset to select front/back
b<-subset(fake.data,FrontBack=="Back")
f.fit<-lm(y~x,data=f)# regress front values
f.resid<-data.frame(Date=f$Date,resid=f.fit$residuals,int=f.fit$coefficients[1],slope=f.fit$coefficients[2],FrontBack="Front") # make dataframe of residuals
b.fit<-lm(y~x,data=b)# regress back values
b.resid<-data.frame(Date=b$Date,resid=b.fit$residuals,int=b.fit$coefficients[1],slope=b.fit$coefficients[2],FrontBack="Back") # make dataframe of residuals
all.resid<-rbind(f.resid,b.resid) # stick 'em together
head(all.resid) # should be what you wanted, residual error for all entires

# could be what you wanted, aggregate mean error for each date, front and back
ag<-aggregate(resid~.,data=all.resid,mean)
print(ag)

代码将吐出类似的内容,以显示原始数据中每行的模型错误残差:

      Date     resid      int    slope FrontBack
1 20140916 -47.91676 393.5386 97.52173     Front
3 20140916  52.01027 393.5386 97.52173     Front
4 20140916  52.58631 393.5386 97.52173     Front
5 20140916 -17.56038 393.5386 97.52173     Front
6 20140916 -21.85633 393.5386 97.52173     Front
9 20140916  44.97382 393.5386 97.52173     Front

如果您想要每天的平均误差,汇总的数据将是:

       Date      int     slope FrontBack      resid
1  20140916 393.5386  97.52173     Front  10.372821
2  20140917 393.5386  97.52173     Front  -3.840699
3  20140918 393.5386  97.52173     Front -10.876092
4  20140919 393.5386  97.52173     Front  -2.159973
5  20140920 393.5386  97.52173     Front   8.878526
6  20140916 212.6598 101.60367      Back   2.862525
7  20140917 212.6598 101.60367      Back -14.476662
8  20140918 212.6598 101.60367      Back  10.822712
9  20140919 212.6598 101.60367      Back -10.072473
10 20140920 212.6598 101.60367      Back  10.472516

每个日期都有单独的前后条目,子集和回归应该与您的数据相同,就像在此示例中一样

答案 1 :(得分:0)

例如:

# break data frame to list of data frames
df_list <- split(df1, f = paste(df1$Date, df1$FrontBack) )

# run lm models on each data frame
models <- lapply(df_list, function(dat) {
  lm(y ~ x, data = dat)
})

# format result
res <- 
  do.call(
    rbind,
    lapply(models, function(x) {
      coefs <- coef(summary(x))
      c( intercept = coefs["(Intercept)", "Estimate"],
         slope = coefs["x", "Estimate"],
         p_value = coefs["x","Pr(>|t|)"], 
         InterceptP = coefs["(Intercept)","Pr(>|t|)"], 
         StdErrorX = coefs["x","Std. Error"],
         StdErrorI = coefs["(Intercept)","Std. Error"],
         r2 = x$r.squared,
         AIC = AIC(x)
        )
    })
  )