My data is structured as follows:
group_id, months_from_start, perc_total_downloads, experience_ratio
1 1 1.2 4
1 2 1.7 6
…
235 1 6.7 3
235 2 18 8
…
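There are about 300 groups, each with roughly 70 consecutive data points.
I ran the following script to estimate a second-order (quadratic) polynomial for each group: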
s.1<-lm(xts(s[s$group_id == 1,][,-2], order.by=as.Date(s[s$group_id == 1,][,2]))$perc_total_downloads ~ poly(xts(s[s$group_id == 1,][,-2], order.by=as.Date(s[s$group_id == 1,][,2]))$months_from_start, 2, raw=TRUE))
s.235<-lm(xts(s[s$group_id == 235,][,-2], order.by=as.Date(s[s$group_id == 235,][,2]))$perc_total_downloads ~ poly(xts(s[s$group_id == 235,][,-2], order.by=as.Date(s[s$group_id == 235,][,2]))$months_from_start, 2, raw=TRUE))
s.599<-lm(xts(s[s$group_id == 599,][,-2], order.by=as.Date(s[s$group_id == 599,][,2]))$perc_total_downloads ~ poly(xts(s[s$group_id == 599,][,-2], order.by=as.Date(s[s$group_id == 599,][,2]))$months_from_start, 2, raw=TRUE))
s.1111<-lm(xts(s[s$group_id == 1111,][,-2], order.by=as.Date(s[s$group_id == 1111,][,2]))$perc_total_downloads ~ poly(xts(s[s$group_id == 1111,][,-2], order.by=as.Date(s[s$group_id == 1111,][,2]))$months_from_start, 2, raw=TRUE))
s.1537<-lm(xts(s[s$group_id == 1537,][,-2], order.by=as.Date(s[s$group_id == 1537,][,2]))$perc_total_downloads ~ poly(xts(s[s$group_id == 1537,][,-2], order.by=as.Date(s[s$group_id == 1537,][,2]))$months_from_start, 2, raw=TRUE))
For each of these new variables, I can run a summary statement that reveals interesting information:
> summary(s.44375)
Call:
lm(formula = xts(s[s$group_id == 44375, ][, -2], order.by = as.Date(s[s$group_id ==
44375, ][, 2]))$perc_total_downloads ~ poly(xts(s[s$group_id ==
44375, ][, -2], order.by = as.Date(s[s$group_id == 44375,
][, 2]))$months_from_start, 2, raw = TRUE))
Residuals:
Min 1Q Median 3Q Max
-0.0064004 -0.0017315 -0.0002022 0.0012087 0.0078436
Coefficients: (3 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.993e-03 1.137e-03 1.753 0.084 .
poly(xts(s[s$group_id == 44375, ][, -2], order.by = as.Date(s[s$group_id == 44375, ][, 2]))$months_from_start, 2, raw = TRUE)1.0 7.769e-04 6.707e-05 11.583 <2e-16 ***
poly(xts(s[s$group_id == 44375, ][, -2], order.by = as.Date(s[s$group_id == 44375, ][, 2]))$months_from_start, 2, raw = TRUE)2.0 -9.258e-06 8.404e-07 -11.017 <2e-16 ***
poly(xts(s[s$group_id == 44375, ][, -2], order.by = as.Date(s[s$group_id == 44375, ][, 2]))$months_from_start, 2, raw = TRUE)0.1 NA NA NA NA
poly(xts(s[s$group_id == 44375, ][, -2], order.by = as.Date(s[s$group_id == 44375, ][, 2]))$months_from_start, 2, raw = TRUE)1.1 NA NA NA NA
poly(xts(s[s$group_id == 44375, ][, -2], order.by = as.Date(s[s$group_id == 44375, ][, 2]))$months_from_start, 2, raw = TRUE)0.2 NA NA NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.002866 on 69 degrees of freedom
Multiple R-squared: 0.6619,Adjusted R-squared: 0.6521
F-statistic: 67.53 on 2 and 69 DF, p-value: < 2.2e-16
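For my purposes, I need to transcribe this information into a table of the following format, which is incredibly tedious and time-consuming:
group_id, intercept est, intercept std err, intercept t value, …
44375, 1.993e-03, 1.137e-03, 1.753, …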
…
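It would also be convenient to use conventional (fixed-point) notation instead of scientific notation, but I suppose I can live without it.
Is there a way to do this without cutting and pasting by hand?
Thanks, sw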
Answer 0 (score: 2)
Calling summary() on an lm fit just returns an R list (of class summary.lm). For example,
R> x = runif(10);y=runif(10)
R> m = lm(y ~ x)
The part you are interested in is the fourth element (coefficients):
R> summary(m)[[4]]
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.44041 0.1768 2.4911 0.03746
x -0.05899 0.3143 -0.1877 0.85579
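This is just a matrix.
The above answers your question, but your code made me want to cry! In particular, read up on for loops and the plyr package. For example, I suspect the last two lines do almost everything you want: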
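A minimal sketch along the same lines, regenerating the toy x, y and m so that it runs on its own (the data are random, so the numbers will differ from those above). It shows that the fourth component is named coefficients, so it can also be reached by name or through coef(), and that single cells can be picked out by row and column name:

```r
# Toy data, as in the example above (random values)
set.seed(1)
x <- runif(10)
y <- runif(10)
m <- lm(y ~ x)

s_m <- summary(m)
print(names(s_m))     # "coefficients" is the 4th component

# Three ways to reach the same coefficient matrix
cm1 <- s_m[[4]]
cm2 <- s_m$coefficients
cm3 <- coef(s_m)

# Pick out single cells by row (term) and column (statistic) name
print(cm1["(Intercept)", "Std. Error"])
print(cm1["x", "Pr(>|t|)"])
```

Using the name rather than the position [[4]] makes the intent clearer when the code is read later.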
##Load the package and create some data
library(plyr)
dd = data.frame(group_id = sample(1:3, 10, TRUE), x = runif(10), y=runif(10))
##Split up dd by group_id and do some regression
dd1 = ddply(dd, .(group_id), summarise, summary(lm(y ~ x))[[4]])
##Label the column names
colnames(dd1)[2:5] = c("Estimate", "Std. Error", "t value", "Pr(>|t|)")
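For comparison, a base-R sketch (no plyr) that fits the intended quadratic model, perc_total_downloads ~ poly(months_from_start, 2, raw = TRUE), once per group and flattens each coefficient matrix into one row. The result roughly matches the group_id / intercept est / intercept std err / … layout from the question. The real data frame s is not available, so the snippet builds a random stand-in with the same column names (experience_ratio is left out because the model does not use it). The helper fit_one_group and the term labels intercept, linear and quadratic are hypothetical names chosen for illustration. format(..., scientific = FALSE) handles the fixed-point display:

```r
# Random stand-in for the question's data frame `s` (hypothetical data):
# 3 groups x 70 months, with a noisy quadratic trend.
set.seed(42)
s <- expand.grid(months_from_start = 1:70, group_id = c(1, 235, 599))
s$perc_total_downloads <- 2e-3 + 8e-4 * s$months_from_start -
  9e-6 * s$months_from_start^2 + rnorm(nrow(s), sd = 3e-3)

# Hypothetical helper: fit the quadratic for one group and flatten its
# coefficient matrix (3 terms x 4 statistics) into a one-row data frame.
fit_one_group <- function(d) {
  fit <- lm(perc_total_downloads ~ poly(months_from_start, 2, raw = TRUE),
            data = d)
  cf <- coef(summary(fit))
  rownames(cf) <- c("intercept", "linear", "quadratic")
  # Row-major flattening: all statistics for the intercept, then linear, ...
  flat <- as.vector(t(cf))
  names(flat) <- paste(rep(rownames(cf), each = ncol(cf)),
                       rep(colnames(cf), times = nrow(cf)), sep = "_")
  data.frame(group_id = d$group_id[1], t(flat), check.names = FALSE)
}

# One fit per group: split() replaces the hand-written s.1, s.235, ... lines
res <- do.call(rbind, lapply(split(s, s$group_id), fit_one_group))
rownames(res) <- NULL

# Fixed-point rather than scientific notation, for display only
print(format(res, scientific = FALSE, digits = 4))
```

Very small p-values turn into long strings of zeros in fixed-point form, so you may prefer to format only the estimate and standard-error columns. The unformatted res can be saved directly, e.g. with write.csv(res, "coefficients.csv", row.names = FALSE), where the file name is a placeholder.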