如何从R中的lm命令中提取表格摘要数据

时间:2012-09-19 13:59:33

标签: r format extract lm

我的数据结构如下:

group_id, months_from_start, perc_total_downloads, experience_ratio
1             1                    1.2                4
1             2                    1.7                6
…
235           1                    6.7                3
235           2                   18                  8
…

大约有300个组,每个组有70个左右的连续数据元素。

我发布了以下脚本来估计每个组的二阶多项式。

s.1<-lm(xts(s[s$group_id == 1,][,-2], order.by=as.Date(s[s$group_id == 1,][,2]))$perc_total_downloads ~ poly(xts(s[s$group_id == 1,][,-2], order.by=as.Date(s[s$group_id == 1,][,2]))$months_from_start, 2, raw=TRUE))
s.235<-lm(xts(s[s$group_id == 235,][,-2], order.by=as.Date(s[s$group_id == 235,][,2]))$perc_total_downloads ~ poly(xts(s[s$group_id == 235,][,-2], order.by=as.Date(s[s$group_id == 235,][,2]))$months_from_start, 2, raw=TRUE))
s.599<-lm(xts(s[s$group_id == 599,][,-2], order.by=as.Date(s[s$group_id == 599,][,2]))$perc_total_downloads ~ poly(xts(s[s$group_id == 599,][,-2], order.by=as.Date(s[s$group_id == 599,][,2]))$months_from_start, 2, raw=TRUE))
s.1111<-lm(xts(s[s$group_id == 1111,][,-2], order.by=as.Date(s[s$group_id == 1111,][,2]))$perc_total_downloads ~ poly(xts(s[s$group_id == 1111,][,-2], order.by=as.Date(s[s$group_id == 1111,][,2]))$months_from_start, 2, raw=TRUE))
s.1537<-lm(xts(s[s$group_id == 1537,][,-2], order.by=as.Date(s[s$group_id == 1537,][,2]))$perc_total_downloads ~ poly(xts(s[s$group_id == 1537,][,-2], order.by=as.Date(s[s$group_id == 1537,][,2]))$months_from_start, 2, raw=TRUE))

对于这些新变量中的每一个,我都可以发布一个摘要声明来揭示有趣的信息:

> summary(s.44375)

Call:
lm(formula = xts(s[s$group_id == 44375, ][, -2], order.by = as.Date(s[s$group_id == 
44375, ][, 2]))$perc_total_downloads ~ poly(xts(s[s$group_id == 
44375, ][, -2], order.by = as.Date(s[s$group_id == 44375, 
][, 2]))$months_from_start, 2, raw = TRUE))


Residuals:
       Min         1Q     Median         3Q        Max 
-0.0064004 -0.0017315 -0.0002022  0.0012087  0.0078436 


Coefficients: (3 not defined because of singularities)
                                                                                                                                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)                                                                                                                       1.993e-03  1.137e-03   1.753    0.084 .  
poly(xts(s[s$group_id == 44375, ][, -2], order.by = as.Date(s[s$group_id == 44375, ][, 2]))$months_from_start, 2, raw = TRUE)1.0  7.769e-04  6.707e-05  11.583   <2e-16 ***
poly(xts(s[s$group_id == 44375, ][, -2], order.by = as.Date(s[s$group_id == 44375, ][, 2]))$months_from_start, 2, raw = TRUE)2.0 -9.258e-06  8.404e-07 -11.017   <2e-16 ***
poly(xts(s[s$group_id == 44375, ][, -2], order.by = as.Date(s[s$group_id == 44375, ][, 2]))$months_from_start, 2, raw = TRUE)0.1         NA         NA      NA       NA    
poly(xts(s[s$group_id == 44375, ][, -2], order.by = as.Date(s[s$group_id == 44375, ][, 2]))$months_from_start, 2, raw = TRUE)1.1         NA         NA      NA       NA    
poly(xts(s[s$group_id == 44375, ][, -2], order.by = as.Date(s[s$group_id == 44375, ][, 2]))$months_from_start, 2, raw = TRUE)0.2         NA         NA      NA       NA    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 


Residual standard error: 0.002866 on 69 degrees of freedom
Multiple R-squared: 0.6619,Adjusted R-squared: 0.6521 
F-statistic: 67.53 on 2 and 69 DF,  p-value: < 2.2e-16 

出于我的目的,我需要将这些信息转录到一个表格中,这种格式令人难以置信的繁琐且耗时:

group_id   intercept est  intercept stnd err    intercept t value   …
44375         1.993e-03         1/137e-03           1.753          ...
…

对我来说,使用传统符号而不是科学符号也很方便,但我想我可以没有它。

有没有办法让我这样做而不用手工切割和粘贴?

谢谢--sw

1 个答案:

答案 0 :(得分:2)

摘要函数只返回一个R列表。例如,

R> x = runif(10);y=runif(10)
R> m = lm(y ~ x)

您感兴趣的部分是第四个要素:

R> summary(m)[[4]]
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.44041     0.1768  2.4911  0.03746
x           -0.05899     0.3143 -0.1877  0.85579

这只是一个矩阵。


以上回答了你的问题,但你的代码让我想哭!特别是,阅读for循环和plyr包。例如,我怀疑最后两行几乎可以做你想要的一切:

##Load the package and create some data
library(plyr)
dd = data.frame(group_id = sample(1:3, 10, TRUE), x = runif(10), y=runif(10)) 

##Split up dd by group_id and do some regression
dd1 = ddply(dd, .(group_id), summarise, summary(lm(y ~ x))[[4]])

##Label the column names
colnames(dd1)[2:5] = c("Estimate"   "Std. Error" "t value"    "Pr(>|t|)")