Grouping data before sapply in R

Date: 2017-10-12 13:53:29

Tags: r grouping sapply

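I am calculating summary statistics for a set of water quality parameters measured at several sites. I would like to group the data by site before applying the sapply function.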

Here is a sample data.frame:

site <- c("Comm HR", "Comm 1", "Trans HR", "Trans 1", "Comm HR", "Comm 1", 
      "Trans HR", "Trans 1")
flow <- c(2,21,3,5,2.1,22,.02,.2)
Pb <- c(200,3,42,3,4.2,55.3, 2,7)
TN <- c(5,22,1,2,4.5,3.4,2,3.2)
s <- data.frame(flow,Pb,TN)

The statistics I want to calculate:

stats <- sapply(s, function(x) c("n"=length(x),  # length() counts NA values too
                         "Mean"=mean(x, na.rm=TRUE),
                         "Standard Deviation"=sd(x, na.rm=TRUE),
                         "Coefficient of Variation"=sd(x, na.rm=TRUE)/mean(x, na.rm=TRUE),
                         "Lower 95% Confidence Limit about Mean"=mean(x, na.rm=TRUE)-(qnorm(0.975)*sd(x, na.rm=TRUE)/sqrt(length(x))),
                         "Upper 95% Confidence Limit about Mean"=mean(x, na.rm=TRUE)+(qnorm(0.975)*sd(x, na.rm=TRUE)/sqrt(length(x))),
                         "Lower Quantile (25th percentile)"=quantile(x, 0.25, na.rm=TRUE),
                         "Median"=median(x, na.rm=TRUE),
                         "Upper Quantile (75th percentile)"=quantile(x, 0.75, na.rm=TRUE),
                         "Inter Quartile Range"=(quantile(x, 0.75, na.rm=TRUE)-quantile(x, 0.25, na.rm=TRUE)),
                         "Minimum Detected Value"=min(x, na.rm=TRUE),
                         "Maximum Detected Value"=max(x, na.rm=TRUE))
)

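Instead of statistics computed across all sites, I want the statistics grouped by site. The desired output looks like the table below, but produced separately for each of the 4 sites (so the statistics are computed 4 times):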

                                            flow        Pb         TN
n                                      8.0000000  8.000000  8.0000000
Mean                                   6.9150000 39.562500  5.3875000
Standard Deviation                     9.1410581 68.022264  6.8436493
Coefficient of Variation               1.3219173  1.719362  1.2702829
Lower 95% Confidence Limit about Mean  0.5806863 -7.573658  0.6451801
Upper 95% Confidence Limit about Mean 13.2493137 86.698658 10.1298199

1 Answer:

Answer 0 (score: 4)

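Consider by, using the site column as the grouping variable. Inside the grouped function, pass every column after the first (site) to sapply: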

s <- data.frame(site, flow, Pb, TN, stringsAsFactors = FALSE)

stats_list <- by(s, s$site, FUN=function(df) {

  # df holds the rows for one site; column 1 is site, so skip it
  sapply(df[2:ncol(df)], function(i)

    c("n"=length(i),  # length() counts NA values too
      "Mean"=mean(i, na.rm=TRUE),
      "Standard Deviation"=sd(i, na.rm=TRUE),
      "Coefficient of Variation"=sd(i, na.rm=TRUE)/mean(i, na.rm=TRUE),
      "Lower 95% Confidence Limit about Mean"=mean(i, na.rm=TRUE)-(qnorm(0.975)*sd(i, na.rm=TRUE)/sqrt(length(i))),
      "Upper 95% Confidence Limit about Mean"=mean(i, na.rm=TRUE)+(qnorm(0.975)*sd(i, na.rm=TRUE)/sqrt(length(i))),
      "Lower Quantile (25th percentile)"=quantile(i, 0.25, na.rm=TRUE),
      "Median"=median(i, na.rm=TRUE),
      "Upper Quantile (75th percentile)"=quantile(i, 0.75, na.rm=TRUE),
      "Inter Quartile Range"=(quantile(i, 0.75, na.rm=TRUE)-quantile(i, 0.25, na.rm=TRUE)),
      "Minimum Detected Value"=min(i, na.rm=TRUE),
      "Maximum Detected Value"=max(i, na.rm=TRUE))
  )

})
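For readers who want to see what by is doing, the same grouping can be sketched with split and lapply. This is an illustration rather than part of the original answer: the helper name summarise_col and the reduced set of statistics are assumptions made to keep it short. It also passes names = FALSE to quantile, which should keep quantile's own "25%" label from being appended to the row name.

```r
# Sample data from the question
site <- c("Comm HR", "Comm 1", "Trans HR", "Trans 1",
          "Comm HR", "Comm 1", "Trans HR", "Trans 1")
flow <- c(2, 21, 3, 5, 2.1, 22, .02, .2)
Pb   <- c(200, 3, 42, 3, 4.2, 55.3, 2, 7)
TN   <- c(5, 22, 1, 2, 4.5, 3.4, 2, 3.2)
s <- data.frame(site, flow, Pb, TN, stringsAsFactors = FALSE)

# Hypothetical helper (not in the original answer): a reduced set of statistics
summarise_col <- function(x) {
  c("n"      = length(x),
    "Mean"   = mean(x, na.rm = TRUE),
    # names = FALSE keeps quantile() from adding its own "25%" name
    "Lower Quantile (25th percentile)" =
      quantile(x, 0.25, na.rm = TRUE, names = FALSE),
    "Median" = median(x, na.rm = TRUE))
}

# split() cuts the numeric columns into one data frame per site;
# lapply() visits each piece and sapply() applies the helper to each column
pieces <- split(s[-1], s$site)
stats_split <- lapply(pieces, function(df) sapply(df, summarise_col))

# One matrix per site, statistics in rows and parameters in columns
stats_split[["Comm 1"]]
```

The result is a plain named list rather than a by object, which can be more convenient to pass on to further lapply or do.call steps.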

Output (a named list with one element per site)

stats_list

s$site: Comm 1
                                             flow         Pb        TN
n                                      2.00000000   2.000000  2.000000
Mean                                  21.50000000  29.150000 12.700000
Standard Deviation                     0.70710678  36.981685 13.152186
Coefficient of Variation               0.03288869   1.268668  1.035605
Lower 95% Confidence Limit about Mean 20.52001801 -22.103058 -5.527665
Upper 95% Confidence Limit about Mean 22.47998199  80.403058 30.927665
Lower Quantile (25th percentile).25%  21.25000000  16.075000  8.050000
Median                                21.50000000  29.150000 12.700000
Upper Quantile (75th percentile).75%  21.75000000  42.225000 17.350000
Inter Quartile Range.75%               0.50000000  26.150000  9.300000
Minimum Detected Value                21.00000000   3.000000  3.400000
Maximum Detected Value                22.00000000  55.300000 22.000000
----------------------------------------------------------------------------------------- 
s$site: Comm HR
                                            flow         Pb         TN
n                                     2.00000000   2.000000 2.00000000
Mean                                  2.05000000 102.100000 4.75000000
Standard Deviation                    0.07071068 138.451508 0.35355339
Coefficient of Variation              0.03449301   1.356038 0.07443229
Lower 95% Confidence Limit about Mean 1.95200180 -89.780474 4.26000900
Upper 95% Confidence Limit about Mean 2.14799820 293.980474 5.23999100
Lower Quantile (25th percentile).25%  2.02500000  53.150000 4.62500000
Median                                2.05000000 102.100000 4.75000000
Upper Quantile (75th percentile).75%  2.07500000 151.050000 4.87500000
Inter Quartile Range.75%              0.05000000  97.900000 0.25000000
Minimum Detected Value                2.00000000   4.200000 4.50000000
Maximum Detected Value                2.10000000 200.000000 5.00000000
----------------------------------------------------------------------------------------- 
s$site: Trans 1
                                           flow        Pb        TN
n                                      2.000000 2.0000000 2.0000000
Mean                                   2.600000 5.0000000 2.6000000
Standard Deviation                     3.394113 2.8284271 0.8485281
Coefficient of Variation               1.305428 0.5656854 0.3263570
Lower 95% Confidence Limit about Mean -2.103914 1.0800720 1.4240216
Upper 95% Confidence Limit about Mean  7.303914 8.9199280 3.7759784
Lower Quantile (25th percentile).25%   1.400000 4.0000000 2.3000000
Median                                 2.600000 5.0000000 2.6000000
Upper Quantile (75th percentile).75%   3.800000 6.0000000 2.9000000
Inter Quartile Range.75%               2.400000 2.0000000 0.6000000
Minimum Detected Value                 0.200000 3.0000000 2.0000000
Maximum Detected Value                 5.000000 7.0000000 3.2000000
----------------------------------------------------------------------------------------- 
s$site: Trans HR
                                           flow         Pb        TN
n                                      2.000000   2.000000 2.0000000
Mean                                   1.510000  22.000000 1.5000000
Standard Deviation                     2.107178  28.284271 0.7071068
Coefficient of Variation               1.395482   1.285649 0.4714045
Lower 95% Confidence Limit about Mean -1.410346 -17.199280 0.5200180
Upper 95% Confidence Limit about Mean  4.430346  61.199280 2.4799820
Lower Quantile (25th percentile).25%   0.765000  12.000000 1.2500000
Median                                 1.510000  22.000000 1.5000000
Upper Quantile (75th percentile).75%   2.255000  32.000000 1.7500000
Inter Quartile Range.75%               1.490000  20.000000 0.5000000
Minimum Detected Value                 0.020000   2.000000 1.0000000
Maximum Detected Value                 3.000000  42.000000 2.0000000
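Each element of the result is a matrix, so an individual site or value can be pulled out by name. The sketch below is an addition to the answer: it rebuilds stats_list with only two statistics for brevity, and the names comm1 and stats_df are illustrative. It shows one possible way to index the result and to stack the per-site matrices into a single data frame.

```r
# Sample data from the question
site <- c("Comm HR", "Comm 1", "Trans HR", "Trans 1",
          "Comm HR", "Comm 1", "Trans HR", "Trans 1")
flow <- c(2, 21, 3, 5, 2.1, 22, .02, .2)
Pb   <- c(200, 3, 42, 3, 4.2, 55.3, 2, 7)
TN   <- c(5, 22, 1, 2, 4.5, 3.4, 2, 3.2)
s <- data.frame(site, flow, Pb, TN, stringsAsFactors = FALSE)

# Reduced version of the answer's call (two statistics only, for brevity)
stats_list <- by(s, s$site, FUN = function(df) {
  sapply(df[2:ncol(df)], function(i)
    c("n" = length(i), "Mean" = mean(i, na.rm = TRUE)))
})

# Extract one site's matrix, then a single statistic from it
comm1 <- stats_list[["Comm 1"]]
comm1["Mean", "Pb"]

# Stack all sites into one data frame with site and statistic columns
stats_df <- do.call(rbind, lapply(names(stats_list), function(nm) {
  m <- stats_list[[nm]]
  data.frame(site = nm, statistic = rownames(m), m,
             row.names = NULL, stringsAsFactors = FALSE)
}))
stats_df
```

A single data frame like stats_df is usually easier to filter or export than a list of matrices.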