Question

I'm trying to learn by() in R (3.0.1). Here is what I'm doing:

Open R
attach(iris)
head(iris)
by(iris[,1:4] , Species , mean)

This is what I get:

> by(iris[,1:4] , Species , mean)

Species: setosa
[1] NA
------------------------------------------------------------ 
Species: versicolor
[1] NA
------------------------------------------------------------ 
Species: virginica
[1] NA
Warning messages:
1: In mean.default(data[x, , drop = FALSE], ...) :
  argument is not numeric or logical: returning NA

2: In mean.default(data[x, , drop = FALSE], ...) :
  argument is not numeric or logical: returning NA

3: In mean.default(data[x, , drop = FALSE], ...) :
  argument is not numeric or logical: returning NA

Answer 1

The problem here is that the function you are applying does not work on a data frame. In effect, you are calling something like this:

R> mean(iris[iris$Species == "setosa", 1:4])
[1] NA
Warning message:
In mean.default(iris[iris$Species == "setosa", 1:4]) :
  argument is not numeric or logical: returning NA

i.e. you are passing a 4-column data frame containing the rows of the original for which Species == "setosa".
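One quick way to confirm this is to build one such group by hand and check its type. This sketch uses only base R; the variable name `setosa` is just for illustration.

```r
# Build the same subset that by() hands to FUN for the "setosa" group
setosa <- iris[iris$Species == "setosa", 1:4]

class(setosa)                   # "data.frame", not a numeric vector
is.numeric(setosa)              # FALSE, which is why mean() gives up
is.numeric(setosa[[1]])         # TRUE: a single column is a plain numeric vector
suppressWarnings(mean(setosa))  # NA, the same result by() printed per group
```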

With by(), you need to do this variable by variable, as in

R> by(iris[,1] , iris$Species , mean)
iris$Species: setosa
[1] 5.006
------------------------------------------------------------ 
iris$Species: versicolor
[1] 5.936
------------------------------------------------------------ 
iris$Species: virginica
[1] 6.588
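Typing one call per column gets tedious with four variables. As a sketch, the loop over columns can be handed to sapply(); note that this swaps in tapply(), a base R function the answer does not use, because it takes a plain vector plus a grouping factor directly. The name `per_var` is hypothetical.

```r
# For each numeric column, compute the per-species mean with tapply();
# sapply() then binds the three-element results into a matrix
per_var <- sapply(iris[, 1:4], function(col) tapply(col, iris$Species, mean))
per_var   # 3 x 4 matrix: species in rows, variables in columns
```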

or use colMeans() instead of mean() as the FUN that gets applied:

R> by(iris[,1:4] , iris$Species , colMeans)
iris$Species: setosa
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
       5.006        3.428        1.462        0.246 
------------------------------------------------------------ 
iris$Species: versicolor
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
       5.936        2.770        4.260        1.326 
------------------------------------------------------------ 
iris$Species: virginica
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
       6.588        2.974        5.552        2.026
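The printed by() object is easy to read but awkward to compute with, since it is a list-like array holding one vector per species. One common base R idiom, shown here as a sketch (the names `res` and `means_mat` are illustrative), stacks those vectors into an ordinary matrix:

```r
# by() returns a list-like array with one colMeans() vector per species
res <- by(iris[, 1:4], iris$Species, colMeans)

# rbind() the per-group vectors into a plain matrix (species in rows)
means_mat <- do.call(rbind, res)
means_mat
```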

If there is no ready-made function like colMeans(), you can always write a wrapper around sapply(), e.g.

foo <- function(x, ...) sapply(x, mean, ...)
by(iris[, 1:4], iris$Species, foo)
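Because by() forwards any extra named arguments to FUN, the wrapper can be generalised so the per-column statistic is itself a parameter. This is a hypothetical variant of foo(), here called `col_stat`, with median() used purely as an example:

```r
# Hypothetical generalisation of foo(): 'stat' is the function applied to each column
col_stat <- function(x, stat, ...) sapply(x, stat, ...)

# 'stat = median' is passed through by()'s ... to col_stat()
medians <- by(iris[, 1:4], iris$Species, col_stat, stat = median)
medians[[1]]   # column medians for the first level, setosa
```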

R> by(iris[, 1:4], iris$Species, foo)
iris$Species: setosa
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
       5.006        3.428        1.462        0.246 
------------------------------------------------------------ 
iris$Species: versicolor
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
       5.936        2.770        4.260        1.326 
------------------------------------------------------------ 
iris$Species: virginica
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
       6.588        2.974        5.552        2.026

You may find aggregate() more appealing:

R> with(iris, aggregate(iris[,1:4], list(Species = Species), FUN = mean))
     Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1     setosa        5.006       3.428        1.462       0.246
2 versicolor        5.936       2.770        4.260       1.326
3  virginica        6.588       2.974        5.552       2.026
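aggregate() also has a formula interface, which avoids both with() and the explicit column selection; the `.` on the left-hand side stands for every column other than Species. A minimal sketch (the name `agg` is illustrative):

```r
# Summarise all remaining columns (.) by Species with the formula method
agg <- aggregate(. ~ Species, data = iris, FUN = mean)
agg
```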

Note how I use with() to access Species directly; if you don't want to index via iris$Species, this is far better than attach()ing iris.
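The same with() pattern works for any expression that refers to columns by name. This sketch swaps in tapply() for a single column; `sl_means` is an illustrative name:

```r
# with() evaluates the expression with iris's columns visible as variables;
# unlike attach(iris), nothing is added to the search path
sl_means <- with(iris, tapply(Sepal.Length, Species, mean))
sl_means
```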

Answer 2

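Here is another solution that combines split() and sapply(). The result is the same but transposed. This layout may be preferable when showing statistics for many variables, since they are then listed vertically.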

sapply(split(iris, iris[, 5]), function(x) colMeans(x[, 1:4]))

                setosa versicolor virginica
   Sepal.Length  5.006      5.936     6.588
   Sepal.Width   3.428      2.770     2.974
   Petal.Length  1.462      4.260     5.552
   Petal.Width   0.246      1.326     2.026
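To make the steps explicit, here is the same idea pulled apart (intermediate names are hypothetical): split() produces the groups, sapply() produces the transposed table, and t() turns it back into the species-in-rows layout that by() and aggregate() print.

```r
# split() returns a named list of data frames, one per species
groups <- split(iris[, 1:4], iris$Species)

# sapply() places each group's colMeans() in a column: variables x species
wide <- sapply(groups, colMeans)

# t() restores the species-in-rows layout
t(wide)
```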

by() gives an error when applying the mean function to a data frame. What is going on?
