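I am calculating summary statistics for water quality parameters across several groups. I want to group the data before applying sapply.
Here is an example data.frame: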
site <- c("Comm HR", "Comm 1", "Trans HR", "Trans 1", "Comm HR", "Comm 1",
"Trans HR", "Trans 1")
flow <- c(2,21,3,5,2.1,22,.02,.2)
Pb <- c(200,3,42,3,4.2,55.3, 2,7)
TN <- c(5,22,1,2,4.5,3.4,2,3.2)
s <- data.frame(flow,Pb,TN)
The statistics I want to compute:
stats <- sapply(s, function(s) c("n"=length(s),
  "Mean"=mean(s,na.rm=TRUE),
  "Standard Deviation"=sd(s, na.rm=TRUE),
  "Coefficient of Variation"=sd(s, na.rm=TRUE)/mean(s,na.rm=TRUE),
  "Lower 95% Confidence Limit about Mean"=mean(s,na.rm=TRUE)-(qnorm(0.975)*sd(s, na.rm=TRUE)/sqrt(length(s))),
  "Upper 95% Confidence Limit about Mean"=mean(s,na.rm=TRUE)+(qnorm(0.975)*sd(s, na.rm=TRUE)/sqrt(length(s))),
  "Lower Quantile (25th percentile)"=quantile(s,0.25, na.rm=TRUE),
  "Median"=median(s, na.rm=TRUE),
  "Upper Quantile (75th percentile)"=quantile(s,0.75, na.rm=TRUE),
  "Inter Quartile Range"=(quantile(s,0.75, na.rm=TRUE)-quantile(s,0.25, na.rm=TRUE)),
  "Minimum Detected Value"=min(s, na.rm=TRUE),
  "Maximum Detected Value"=max(s, na.rm=TRUE))
)
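Instead of statistics across all sites, I want the statistics grouped by site. The desired output looks like the one below, but for each of the 4 sites (so these statistics are computed 4 times):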
flow Pb TN
n 8.0000000 8.000000 8.0000000
Mean 6.9150000 39.562500 5.3875000
Standard Deviation 9.1410581 68.022264 6.8436493
Coefficient of Variation 1.3219173 1.719362 1.2702829
Lower 95% Confidence Limit about Mean 0.5806863 -7.573658 0.6451801
Upper 95% Confidence Limit about Mean 13.2493137 86.698658 10.1298199
Answer 0 (score: 4)
Consider using by with the site column as the grouping factor. Also, pass every column after the first one to sapply:
s <- data.frame(site, flow, Pb, TN, stringsAsFactors = FALSE)
# Split s by site, then summarise every column except the first (site)
stats_list <- by(s, s$site, FUN=function(df) {
  sapply(df[2:ncol(df)], function(i)
    c("n"=length(i),
      "Mean"=mean(i,na.rm=TRUE),
      "Standard Deviation"=sd(i, na.rm=TRUE),
      "Coefficient of Variation"=sd(i, na.rm=TRUE)/mean(i,na.rm=TRUE),
      "Lower 95% Confidence Limit about Mean"=mean(i,na.rm=TRUE)-(qnorm(0.975)*sd(i, na.rm=TRUE)/sqrt(length(i))),
      "Upper 95% Confidence Limit about Mean"=mean(i,na.rm=TRUE)+(qnorm(0.975)*sd(i, na.rm=TRUE)/sqrt(length(i))),
      "Lower Quantile (25th percentile)"=quantile(i,0.25, na.rm=TRUE),
      "Median"=median(i, na.rm=TRUE),
      "Upper Quantile (75th percentile)"=quantile(i,0.75, na.rm=TRUE),
      "Inter Quartile Range"=(quantile(i,0.75, na.rm=TRUE)-quantile(i,0.25, na.rm=TRUE)),
      "Minimum Detected Value"=min(i, na.rm=TRUE),
      "Maximum Detected Value"=max(i, na.rm=TRUE))
  )
})
Output (a list with one named element per site)
stats_list
s$site: Comm 1
flow Pb TN
n 2.00000000 2.000000 2.000000
Mean 21.50000000 29.150000 12.700000
Standard Deviation 0.70710678 36.981685 13.152186
Coefficient of Variation 0.03288869 1.268668 1.035605
Lower 95% Confidence Limit about Mean 20.52001801 -22.103058 -5.527665
Upper 95% Confidence Limit about Mean 22.47998199 80.403058 30.927665
Lower Quantile (25th percentile).25% 21.25000000 16.075000 8.050000
Median 21.50000000 29.150000 12.700000
Upper Quantile (75th percentile).75% 21.75000000 42.225000 17.350000
Inter Quartile Range.75% 0.50000000 26.150000 9.300000
Minimum Detected Value 21.00000000 3.000000 3.400000
Maximum Detected Value 22.00000000 55.300000 22.000000
-----------------------------------------------------------------------------------------
s$site: Comm HR
flow Pb TN
n 2.00000000 2.000000 2.00000000
Mean 2.05000000 102.100000 4.75000000
Standard Deviation 0.07071068 138.451508 0.35355339
Coefficient of Variation 0.03449301 1.356038 0.07443229
Lower 95% Confidence Limit about Mean 1.95200180 -89.780474 4.26000900
Upper 95% Confidence Limit about Mean 2.14799820 293.980474 5.23999100
Lower Quantile (25th percentile).25% 2.02500000 53.150000 4.62500000
Median 2.05000000 102.100000 4.75000000
Upper Quantile (75th percentile).75% 2.07500000 151.050000 4.87500000
Inter Quartile Range.75% 0.05000000 97.900000 0.25000000
Minimum Detected Value 2.00000000 4.200000 4.50000000
Maximum Detected Value 2.10000000 200.000000 5.00000000
-----------------------------------------------------------------------------------------
s$site: Trans 1
flow Pb TN
n 2.000000 2.0000000 2.0000000
Mean 2.600000 5.0000000 2.6000000
Standard Deviation 3.394113 2.8284271 0.8485281
Coefficient of Variation 1.305428 0.5656854 0.3263570
Lower 95% Confidence Limit about Mean -2.103914 1.0800720 1.4240216
Upper 95% Confidence Limit about Mean 7.303914 8.9199280 3.7759784
Lower Quantile (25th percentile).25% 1.400000 4.0000000 2.3000000
Median 2.600000 5.0000000 2.6000000
Upper Quantile (75th percentile).75% 3.800000 6.0000000 2.9000000
Inter Quartile Range.75% 2.400000 2.0000000 0.6000000
Minimum Detected Value 0.200000 3.0000000 2.0000000
Maximum Detected Value 5.000000 7.0000000 3.2000000
-----------------------------------------------------------------------------------------
s$site: Trans HR
flow Pb TN
n 2.000000 2.000000 2.0000000
Mean 1.510000 22.000000 1.5000000
Standard Deviation 2.107178 28.284271 0.7071068
Coefficient of Variation 1.395482 1.285649 0.4714045
Lower 95% Confidence Limit about Mean -1.410346 -17.199280 0.5200180
Upper 95% Confidence Limit about Mean 4.430346 61.199280 2.4799820
Lower Quantile (25th percentile).25% 0.765000 12.000000 1.2500000
Median 1.510000 22.000000 1.5000000
Upper Quantile (75th percentile).75% 2.255000 32.000000 1.7500000
Inter Quartile Range.75% 1.490000 20.000000 0.5000000
Minimum Detected Value 0.020000 2.000000 1.0000000
Maximum Detected Value 3.000000 42.000000 2.0000000
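
The quantile rows in the output above carry a ".25%" or ".75%" suffix because quantile() returns a named vector and c() joins both names. A minimal sketch of one way to avoid that, and to pull a single site's results out of the list, might look like the following. The helper name site_stats and the shortened set of statistics are assumptions made for illustration, not part of the original answer.

```r
# Same sample data as in the question
site <- c("Comm HR", "Comm 1", "Trans HR", "Trans 1",
          "Comm HR", "Comm 1", "Trans HR", "Trans 1")
flow <- c(2, 21, 3, 5, 2.1, 22, .02, .2)
Pb   <- c(200, 3, 42, 3, 4.2, 55.3, 2, 7)
TN   <- c(5, 22, 1, 2, 4.5, 3.4, 2, 3.2)
s <- data.frame(site, flow, Pb, TN, stringsAsFactors = FALSE)

# Hypothetical helper: a shortened set of statistics.
# names = FALSE stops quantile() from attaching "25%" / "75%" names,
# so the row labels stay exactly as written here.
site_stats <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  c("n" = length(x),
    "Mean" = mean(x, na.rm = TRUE),
    "Lower Quantile (25th percentile)" = q[1],
    "Median" = median(x, na.rm = TRUE),
    "Upper Quantile (75th percentile)" = q[2],
    "Inter Quartile Range" = q[2] - q[1])
}

# df[-1] drops the site column before summarising
stats_list <- by(s, s$site, function(df) sapply(df[-1], site_stats))

# Each element is a numeric matrix that can be indexed by site name
comm1 <- stats_list[["Comm 1"]]
comm1["Mean", "Pb"]
rownames(comm1)
```

Because each element is an ordinary matrix, a single statistic for one site and one parameter can be read with row and column names, as in the last lines.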