I would like to get the mean() and sd() of the different columns of the iris dataset, grouped by the values in the Species column:
> head(iris[order(runif(nrow(iris))), ])
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
50 5.0 3.3 1.4 0.2 setosa
111 6.5 3.2 5.1 2.0 virginica
69 6.2 2.2 4.5 1.5 versicolor
150 5.9 3.0 5.1 1.8 virginica
Without distinguishing between the three species, apply does the job:
> stats = apply(iris[ ,1:4], MARGIN = 2, function(x) rbind(mean(x), SD = sd(x))); row.names(stats) = c("mean", "sd"); stats
Sepal.Length Sepal.Width Petal.Length Petal.Width
mean 5.8433333 3.0573333 3.758000 1.1993333
sd 0.8280661 0.4358663 1.765298 0.7622377
But how can I get a list (?) of these results, split by species?
Answer 0 (score: 2)
aggregate is the function you are looking for:
> aggregate(. ~ Species, data = iris, FUN = mean)
Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1 setosa 5.006 3.428 1.462 0.246
2 versicolor 5.936 2.770 4.260 1.326
3 virginica 6.588 2.974 5.552 2.026
> aggregate(. ~ Species, data = iris, FUN = sd)
Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1 setosa 0.3524897 0.3790644 0.1736640 0.1053856
2 versicolor 0.5161711 0.3137983 0.4699110 0.1977527
3 virginica 0.6358796 0.3224966 0.5518947 0.2746501
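aggregate splits the data into subsets defined by a factor (or a combination of factors) and computes a summary function for each subset.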
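If you want both statistics from a single call, one option is to pass a FUN that returns a named vector. The sketch below relies on documented aggregate behaviour: when FUN returns more than one value, each result column becomes a matrix. The helper name mean_sd is hypothetical, introduced only for this illustration.

```r
# Hypothetical helper: returns both statistics as a named vector
mean_sd <- function(x) c(mean = mean(x), sd = sd(x))

# One row per species; each measurement column is a 3 x 2 matrix
# with columns "mean" and "sd"
res <- aggregate(. ~ Species, data = iris, FUN = mean_sd)
res
```

You can then pull out a single statistic with, for example, res$Sepal.Length[, "mean"].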
Answer 1 (score: 1)
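You can split the data by species with split() to get a list of data frames, and then apply your function to each one with lapply():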
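# split() returns a named list of data frames, one per level of Species
iris2 <- split(iris, iris$Species)

# Per-column mean and sd for one data frame (the approach from the question)
fun <- function(df){
  stats = apply(df[ ,1:4], MARGIN = 2, function(x) rbind(mean(x), SD = sd(x)))
  row.names(stats) = c("mean", "sd")
  return(stats)
}

# Apply fun to each species' data frame; the result is a named list of 2 x 4 matrices
lapply(iris2, fun)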
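A more compact variant of the same split-then-lapply idea might look like the sketch below. It assumes, as in the question, that the first four columns are the numeric measurements. Using sapply() with a named return vector gives the rows their "mean" and "sd" labels directly, so the row.names() step is no longer needed.

```r
# Keep only the four numeric columns, then split them by species
parts <- split(iris[, 1:4], iris$Species)

# For each species, sapply() runs the anonymous function over every column and
# binds the named results into a 2 x 4 matrix with rows "mean" and "sd"
res <- lapply(parts, function(df) sapply(df, function(x) c(mean = mean(x), sd = sd(x))))
res$setosa
```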
Answer 2 (score: 1)
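This is not a complete answer (it does not return a list and does not keep the same table structure), but dplyr's summarise_all is very useful to know about: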
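library(dplyr)
# A named list of functions yields output columns such as Sepal.Length_mean and Sepal.Length_sd
df <- iris %>% group_by(Species) %>% summarise_all(list(mean = mean, sd = sd))
df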
# A tibble: 3 × 9
# Species Sepal.Length_mean Sepal.Width_mean Petal.Length_mean Petal.Width_mean Sepal.Length_sd Sepal.Width_sd
# <fctr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 setosa 5.006 3.428 1.462 0.246 0.3524897 0.3790644
# 2 versicolor 5.936 2.770 4.260 1.326 0.5161711 0.3137983
# 3 virginica 6.588 2.974 5.552 2.026 0.6358796 0.3224966
# ... with 2 more variables: Petal.Length_sd <dbl>, Petal.Width_sd <dbl>
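In dplyr 1.0.0 and later, the scoped *_all verbs are superseded by across(). The sketch below is a roughly equivalent version under that assumption (dplyr >= 1.0.0 installed). Note that across() orders the output columns per variable (Sepal.Length_mean, Sepal.Length_sd, ...) rather than per function.

```r
library(dplyr)

# across() applies each function in the named list to every non-grouping column;
# by default the output columns are named "{.col}_{.fn}", e.g. Sepal.Length_mean
res <- iris %>%
  group_by(Species) %>%
  summarise(across(everything(), list(mean = mean, sd = sd)))
res
```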
Answer 3 (score: 0)
Another option is data.table:
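library(data.table)
# For each Species group, compute Mean and SD of every column in .SD;
# unlist(recursive = FALSE) flattens the nested list into one column per statistic
as.data.table(iris)[, unlist(lapply(.SD, function(x)
  list(Mean = mean(x), SD = sd(x))), recursive = FALSE), by = Species]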
# Species Sepal.Length.Mean Sepal.Length.SD Sepal.Width.Mean Sepal.Width.SD Petal.Length.Mean Petal.Length.SD Petal.Width.Mean
#1: setosa 5.006 0.3524897 3.428 0.3790644 1.462 0.1736640 0.246
#2: versicolor 5.936 0.5161711 2.770 0.3137983 4.260 0.4699110 1.326
#3: virginica 6.588 0.6358796 2.974 0.3224966 5.552 0.5518947 2.026
# Petal.Width.SD
#1: 0.1053856
#2: 0.1977527
#3: 0.2746501
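If you would rather keep a table shaped like the one in the question and also get a list per species, one possible data.table sketch is to reshape to long format first and then split. It assumes data.table is installed and uses its melt() method and split() with the `by` argument.

```r
library(data.table)

dt <- as.data.table(iris)

# Reshape to long format: one row per (Species, variable, value)
long <- melt(dt, id.vars = "Species")

# One row per species/variable combination, with mean and sd side by side
res <- long[, .(mean = mean(value), sd = sd(value)), by = .(Species, variable)]

# split(..., by = "Species") returns a named list of data.tables, one per species
res_list <- split(res, by = "Species")
res_list$setosa
```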