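I have a dataset organized as shown below, containing data for 23 dates (shown is the data for one date and part of a second; note that the header is offset). I want to run a linear model of y on x (lm(y~x)) by FrontBack and Date, i.e. 2 lm's per date, 1 for Front and 1 for Back. I then want to summarize the output in a matrix in which the slope, intercept, and error of each model are listed in separate columns. This should give me a matrix with 46 rows (23 dates x 2 levels of Front/Back).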
Date FrontBack y x
20140916 Back 2234.580 2.253175
20140916 Back 2267.631 7.725422
20140916 Back 2246.668 14.414951
20140916 Back 2216.307 17.837861
20140916 Back 2214.225 15.484364
20140916 Front 2245.522 90.062102
20140916 Back 2219.565 12.326474
20140916 Front 2267.427 63.396137
20140916 Back 2213.286 7.861758
20140916 Front 2264.902 61.661650
20140916 Front 2256.183 70.124702
20140916 Back 2202.254 7.400539
20140916 Front 2241.997 44.826769
20140916 Back 2204.868 5.739663
20140916 Back 2209.424 2.165606
20140916 Front 2266.947 1.068334
20140917 Back 2237.199 2.190785
20140917 Back 2248.541 4.415886
20140917 Back 2260.041 8.724817
20140917 Back 2277.407 13.420694
20140917 Back 2278.414 14.789667
20140917 Front 2346.622 29.878672
20140917 Back 2268.111 15.120095
20140917 Front 2496.946 60.30390
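To reproduce the problem, the first date of the sample above can be loaded as follows. This is only a sketch: the data frame name `df1` is an assumption (it is not given in the question), and only the 16 rows for 20140916 are included.

```r
# Sample rows for 20140916, copied from the listing above.
# The name df1 is an assumption; it does not appear in the question.
df1 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Date FrontBack y x
20140916 Back 2234.580 2.253175
20140916 Back 2267.631 7.725422
20140916 Back 2246.668 14.414951
20140916 Back 2216.307 17.837861
20140916 Back 2214.225 15.484364
20140916 Front 2245.522 90.062102
20140916 Back 2219.565 12.326474
20140916 Front 2267.427 63.396137
20140916 Back 2213.286 7.861758
20140916 Front 2264.902 61.661650
20140916 Front 2256.183 70.124702
20140916 Back 2202.254 7.400539
20140916 Front 2241.997 44.826769
20140916 Back 2204.868 5.739663
20140916 Back 2209.424 2.165606
20140916 Front 2266.947 1.068334
")

# Count observations per Date/FrontBack group; each group would get its own lm().
table(df1$Date, df1$FrontBack)
```

Each cell of the resulting table corresponds to one of the requested models, so the full dataset should show 23 rows and 2 columns of counts.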
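Answer 0 (score: 0)
Since you only have two categories, would subsetting the data into front and back be suitable? If so, the following code should give you what you need: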
fake.y<-runif(15*5,min=2200,max=2500) # make some fake data for testing
fake.FrontBack<-sample(c("Front","Back"),15*5,replace=TRUE)
fake.Date<-sort(sample(seq(20140916,20140920),15*5,replace=TRUE))
fake.data<-data.frame(Date=fake.Date,FrontBack=fake.FrontBack,y=fake.y,x=fake.y/120+runif(15*5)+1*(fake.FrontBack=="Back"))
# starting here use your data instead of fake.data
f<-subset(fake.data,FrontBack=="Front") # use subset to select front/back
b<-subset(fake.data,FrontBack=="Back")
f.fit<-lm(y~x,data=f) # regress front values
f.resid<-data.frame(Date=f$Date,resid=resid(f.fit),int=coef(f.fit)[1],slope=coef(f.fit)[2],FrontBack="Front") # make dataframe of residuals
b.fit<-lm(y~x,data=b) # regress back values
b.resid<-data.frame(Date=b$Date,resid=resid(b.fit),int=coef(b.fit)[1],slope=coef(b.fit)[2],FrontBack="Back") # make dataframe of residuals
all.resid<-rbind(f.resid,b.resid) # stick 'em together
head(all.resid) # should be what you wanted, residual error for all entries
# could be what you wanted, aggregate mean error for each date, front and back
ag<-aggregate(resid~.,data=all.resid,mean)
print(ag)
The code will print something like the following, showing the model residual for each row of the original data:
Date resid int slope FrontBack
1 20140916 -47.91676 393.5386 97.52173 Front
3 20140916 52.01027 393.5386 97.52173 Front
4 20140916 52.58631 393.5386 97.52173 Front
5 20140916 -17.56038 393.5386 97.52173 Front
6 20140916 -21.85633 393.5386 97.52173 Front
9 20140916 44.97382 393.5386 97.52173 Front
If you want the mean error for each day, the aggregated data would be:
Date int slope FrontBack resid
1 20140916 393.5386 97.52173 Front 10.372821
2 20140917 393.5386 97.52173 Front -3.840699
3 20140918 393.5386 97.52173 Front -10.876092
4 20140919 393.5386 97.52173 Front -2.159973
5 20140920 393.5386 97.52173 Front 8.878526
6 20140916 212.6598 101.60367 Back 2.862525
7 20140917 212.6598 101.60367 Back -14.476662
8 20140918 212.6598 101.60367 Back 10.822712
9 20140919 212.6598 101.60367 Back -10.072473
10 20140920 212.6598 101.60367 Back 10.472516
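Each date has separate front and back entries. The subsetting and regression should work the same way on your data as they do in this example.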
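The code above fits one model for Front and one for Back across all dates, which is why int and slope repeat within each group. A minimal sketch of the same subset-and-regress idea applied to every Date/FrontBack pair might look like the following. The test data, the names `dat`, `groups` and `per_group`, and the choice of the residual standard error as the "error" column are all assumptions made for illustration.

```r
# Hypothetical test data: 8 observations for every Date/FrontBack pair,
# so that each group has enough rows for a regression.
set.seed(42)  # arbitrary seed, only to make the sketch repeatable
dat <- expand.grid(obs = 1:8,
                   FrontBack = c("Front", "Back"),
                   Date = 20140916:20140920,
                   stringsAsFactors = FALSE)
dat$x <- runif(nrow(dat), min = 0, max = 100)
dat$y <- 2200 + 0.5 * dat$x + rnorm(nrow(dat), sd = 10)

# One row per Date/FrontBack combination that actually occurs in the data.
groups <- unique(dat[, c("Date", "FrontBack")])

rows <- lapply(seq_len(nrow(groups)), function(i) {
  d  <- groups$Date[i]
  fb <- groups$FrontBack[i]
  sub <- dat[dat$Date == d & dat$FrontBack == fb, ]  # subset one group
  fit <- lm(y ~ x, data = sub)                       # regress within the group
  data.frame(Date = d,
             FrontBack = fb,
             intercept = unname(coef(fit)[1]),
             slope = unname(coef(fit)[2]),
             resid_se = summary(fit)$sigma)  # residual standard error
})

per_group <- do.call(rbind, rows)  # one row per fitted model
print(per_group)
```

With 5 dates and 2 levels this gives 10 rows. On data with 23 dates that all contain both Front and Back observations, the same loop would give 46 rows.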
Answer 1 (score: 0)
For example:
# break data frame into a list of data frames, one per Date/FrontBack pair
df_list <- split(df1, f = paste(df1$Date, df1$FrontBack))
# run lm models on each data frame
models <- lapply(df_list, function(dat) {
  lm(y ~ x, data = dat)
})
# format result: one row per model
res <-
  do.call(
    rbind,
    lapply(models, function(x) {
      s <- summary(x)  # r.squared is stored on the summary, not on the lm object
      coefs <- coef(s)
      c(intercept = coefs["(Intercept)", "Estimate"],
        slope = coefs["x", "Estimate"],
        p_value = coefs["x", "Pr(>|t|)"],
        InterceptP = coefs["(Intercept)", "Pr(>|t|)"],
        StdErrorX = coefs["x", "Std. Error"],
        StdErrorI = coefs["(Intercept)", "Std. Error"],
        r2 = s$r.squared,
        AIC = AIC(x)
      )
    })
  )
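Because the list is split on `paste(df1$Date, df1$FrontBack)`, the row names of `res` have the form "Date FrontBack". The sketch below shows one possible way to turn such a matrix into a data frame with Date and FrontBack as regular columns. `res_demo` is a hypothetical stand-in for `res` with made-up placeholder numbers, and `keys` and `res_df` are illustrative names.

```r
# Hypothetical stand-in for `res`: a numeric matrix whose row names are
# "<Date> <FrontBack>". The numbers are placeholders, not fitted values.
res_demo <- rbind(
  "20140916 Back"  = c(intercept = 2210.5, slope = 1.2),
  "20140916 Front" = c(intercept = 2250.1, slope = 0.1)
)

# Split each row name at the space to recover the two grouping variables.
keys <- do.call(rbind, strsplit(rownames(res_demo), " ", fixed = TRUE))

# Combine the keys with the numeric columns; row.names = NULL resets the
# row names to 1, 2, ... instead of keeping the pasted labels.
res_df <- data.frame(Date = as.integer(keys[, 1]),
                     FrontBack = keys[, 2],
                     res_demo,
                     row.names = NULL,
                     stringsAsFactors = FALSE)
print(res_df)
```

Applied to the real `res`, `nrow(res_df)` should equal the number of Date/FrontBack pairs present in the data.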