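I am using tapply with summary and trying to make sense of the results. In the example below, the summary statistics for the factor level "Reg2" look wrong. Can anyone help me understand this behavior?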
> edf=data.frame(pri=c(8258, 14253, 11123, 11311),
reg=c("Reg1", "Reg2", "Reg2", "Reg1"))
> tapply(edf$pri, edf$reg, sum)
Reg1 Reg2
19569 25376
> tapply(edf$pri, edf$reg, length)
Reg1 Reg2
2 2
> tapply(edf$pri, edf$reg, mean)
Reg1 Reg2
9784.5 12688.0
> tapply(edf$pri, edf$reg, min)
Reg1 Reg2
8258 11123
> tapply(edf$pri, edf$reg, summary)
$Reg1
Min. 1st Qu. Median Mean 3rd Qu. Max.
8258 9021 9784 9784 10550 11310
$Reg2
Min. 1st Qu. Median Mean 3rd Qu. Max.
11120 11910 12690 12690 13470 14250
> by(edf$pri, edf$reg, summary)
edf$reg: Reg1
Min. 1st Qu. Median Mean 3rd Qu. Max.
8258 9021 9784 9784 10550 11310
edf$reg: Reg2
Min. 1st Qu. Median Mean 3rd Qu. Max.
11120 11910 12690 12690 13470 14250
> do.call("rbind",tapply(edf$pri, edf$reg, summary))
Min. 1st Qu. Median Mean 3rd Qu. Max.
Reg1 8258 9021 9784 9784 10550 11310
Reg2 11120 11910 12690 12690 13470 14250
> str(edf)
'data.frame': 4 obs. of 2 variables:
$ pri: num 8258 14253 11123 11311
$ reg: Factor w/ 2 levels "Reg1","Reg2": 1 2 2 1
Answer 0 (score: 1)
From ?summary:
digits: integer, used for number formatting with ‘signif()’ (for
‘summary.default’) or ‘format()’ (for ‘summary.data.frame’).
tapply(edf$pri, edf$reg, summary, digits = 42)
## $Reg1
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8258.00 9021.25 9784.50 9784.50 10547.75 11311.00
## $Reg2
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 11123.0 11905.5 12688.0 12688.0 13470.5 14253.0
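The figures in the question are not miscalculated, only rounded. According to the documentation quoted above, summary.default passes its result through signif(), and the default for digits appears to be 4 significant digits when getOption("digits") is left at its usual value of 7 (that default is an assumption about the R version used in the question; newer R releases may apply the rounding only when printing, so the displayed output can differ). A minimal sketch that reproduces the Reg2 row by hand, using only the data from the question:

```r
# Data taken from the question.
pri <- c(8258, 14253, 11123, 11311)
reg <- factor(c("Reg1", "Reg2", "Reg2", "Reg1"))

# Exact five-number summary and mean for Reg2, computed without summary().
reg2 <- pri[reg == "Reg2"]
exact <- c(quantile(reg2, names = FALSE), mean(reg2))
print(exact)

# Rounding to 4 significant digits gives the values shown in the question
# (the order here is Min, 1st Qu., Median, 3rd Qu., Max, Mean).
rounded <- signif(exact, 4)
print(rounded)
```

Passing a larger digits value, as in the tapply call above, simply avoids this rounding; sum, mean and min return unrounded values, which is why they looked correct.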