Question

I'd like to get the mean() and sd() of the different columns of the iris dataset, broken down by the values in the Species column:

> head(iris[order(runif(nrow(iris))), ])
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
50           5.0         3.3          1.4         0.2     setosa
111          6.5         3.2          5.1         2.0  virginica
69           6.2         2.2          4.5         1.5 versicolor
150          5.9         3.0          5.1         1.8  virginica

If I don't distinguish between the 3 species, apply does the job:

> stats = apply(iris[ ,1:4], MARGIN = 2, function(x) rbind(mean(x), SD = sd(x))); row.names(stats) = c("mean", "sd"); stats
     Sepal.Length Sepal.Width Petal.Length Petal.Width
mean    5.8433333   3.0573333     3.758000   1.1993333
sd      0.8280661   0.4358663     1.765298   0.7622377

But how can I get a list (?) of these results, split by species?

Answer 1

aggregate is the function you are looking for:

> aggregate(. ~ Species, data = iris, FUN = mean)
     Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1     setosa        5.006       3.428        1.462       0.246
2 versicolor        5.936       2.770        4.260       1.326
3  virginica        6.588       2.974        5.552       2.026
> aggregate(. ~ Species, data = iris, FUN = sd)
     Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1     setosa    0.3524897   0.3790644    0.1736640   0.1053856
2 versicolor    0.5161711   0.3137983    0.4699110   0.1977527
3  virginica    0.6358796   0.3224966    0.5518947   0.2746501

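aggregate splits the data into subsets defined by a factor (or a combination of factors) and computes a summary function for each subset.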
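If you want both statistics from one call, a possible variant (a sketch, not part of the original answer) is to let FUN return a named vector. aggregate then stores each measurement as a 2-column matrix column, and do.call(data.frame, ...) flattens those into ordinary columns. The names stats and flat are just illustrative.

```r
# FUN returns c(mean = ..., sd = ...), so each measurement column of the
# result becomes a 3 x 2 matrix with column names "mean" and "sd".
stats <- aggregate(. ~ Species, data = iris,
                   FUN = function(x) c(mean = mean(x), sd = sd(x)))

# data.frame() expands matrix columns into separate columns named
# "Sepal.Length.mean", "Sepal.Length.sd", "Sepal.Width.mean", ...
flat <- do.call(data.frame, stats)
flat
```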

Answer 2

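You can split the data by species with the split function to get a list of data frames, then apply the same function to each one with lapply: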

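# Split iris into a named list of data frames, one per species
iris2 <- split(iris, iris$Species)
# Mean and sd of each numeric column; the names give the row labels directly
fun <- function(df) {
  stats <- apply(df[, 1:4], MARGIN = 2, function(x) c(mean = mean(x), sd = sd(x)))
  stats
}
# Returns a list of 2 x 4 matrices named setosa, versicolor, virginica
lapply(iris2, fun)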
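Base R's by() does the split-then-apply step in a single call, which may be handy if you only need the per-species results. This is a sketch of an alternative, not the answerer's code; res is an illustrative name.

```r
# by() splits the rows of the data frame by the factor and calls the
# function on each piece; sapply() over the columns gives a 2 x 4 matrix.
res <- by(iris[, 1:4], iris$Species, function(df)
  sapply(df, function(x) c(mean = mean(x), sd = sd(x))))

# The result is a list-like object of class "by", indexed by species name
res[["setosa"]]
```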

Answer 3

This is not a complete answer (it doesn't return a list and doesn't keep the same table structure), but dplyr's summarise_all is very useful to know about:

library(dplyr)
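# funs() is deprecated in current dplyr; a named list gives the same _mean / _sd suffixes
df <- iris %>% group_by(Species) %>% summarise_all(list(mean = mean, sd = sd))
df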

# A tibble: 3 × 9
# Species Sepal.Length_mean Sepal.Width_mean Petal.Length_mean Petal.Width_mean Sepal.Length_sd Sepal.Width_sd
# <fctr>             <dbl>            <dbl>             <dbl>            <dbl>           <dbl>          <dbl>
# 1     setosa             5.006            3.428             1.462            0.246       0.3524897      0.3790644
# 2 versicolor             5.936            2.770             4.260            1.326       0.5161711      0.3137983
# 3  virginica             6.588            2.974             5.552            2.026       0.6358796      0.3224966
# ... with 2 more variables: Petal.Length_sd <dbl>, Petal.Width_sd <dbl>
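In dplyr 1.0.0 and later, summarise_all() is superseded by across(). A sketch of the equivalent call, assuming a recent dplyr:

```r
library(dplyr)

# everything() inside summarise() skips the grouping column (Species);
# the list names become suffixes: Sepal.Length_mean, Sepal.Length_sd, ...
df <- iris %>%
  group_by(Species) %>%
  summarise(across(everything(), list(mean = mean, sd = sd)))
df
```

One difference from summarise_all(): across() orders the columns per variable (mean then sd for each measurement) rather than all means followed by all sds.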

Answer 4

Another option is data.table:

library(data.table)
as.data.table(iris)[,unlist(lapply(.SD, function(x)
    list(Mean = mean(x), SD = sd(x))), recursive = FALSE), Species]
#     Species Sepal.Length.Mean Sepal.Length.SD Sepal.Width.Mean Sepal.Width.SD Petal.Length.Mean Petal.Length.SD Petal.Width.Mean
#1:     setosa             5.006       0.3524897            3.428      0.3790644             1.462       0.1736640            0.246
#2: versicolor             5.936       0.5161711            2.770      0.3137983             4.260       0.4699110            1.326
#3:  virginica             6.588       0.6358796            2.974      0.3224966             5.552       0.5518947            2.026
#   Petal.Width.SD
#1:      0.1053856
#2:      0.1977527
#3:      0.2746501
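The column names like Sepal.Length.Mean come from unlist(..., recursive = FALSE): lapply(.SD, ...) returns a list of lists, and unlist removes only one level of nesting while joining the names with a dot. A small base R sketch of that step on part of iris (no data.table needed; nested and flat are illustrative names):

```r
# One inner list(Mean, SD) per column -> a list of lists
nested <- lapply(iris[iris$Species == "setosa", 1:2], function(x)
  list(Mean = mean(x), SD = sd(x)))

# Flatten one level; names become "<column>.<statistic>"
flat <- unlist(nested, recursive = FALSE)
names(flat)
# "Sepal.Length.Mean" "Sepal.Length.SD" "Sepal.Width.Mean" "Sepal.Width.SD"
```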
