Could someone please explain the differences between how apply() and sapply() operate on the columns of a data frame?
For example, when attempting to find the class of each column in a data frame, my first inclination is to use apply on the columns:
> apply(iris, 2, class)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
"character" "character" "character" "character" "character"
This is not correct, however, as some of the columns are numeric:
> class(iris$Petal.Length)
[1] "numeric"
A quick search on Google turned up this solution to the problem, which uses sapply instead of apply:
> sapply(iris, class)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
"numeric" "numeric" "numeric" "numeric" "factor"
In this case, sapply implicitly converts iris to a list and then applies the function to each entry in the list, e.g.:
> class(as.list(iris)$Petal.Length)
[1] "numeric"
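As a quick check of this, here is a minimal sketch, assuming the built-in iris dataset (the variable names are only illustrative). A data frame is already stored as a list of columns, so calling sapply on the data frame or on its explicit list form should give the same result:

```r
# A data frame is stored as a list of equal-length columns
is_list <- is.list(iris)

# sapply on the data frame vs. on the explicit list conversion
from_df   <- sapply(iris, class)
from_list <- sapply(as.list(iris), class)

print(is_list)
print(identical(from_df, from_list))
```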
What I'm still unclear about is why my original attempt using apply didn't work.
Answer 0 (score: 3)
As often seems to be the case, I figured out the answer to my question in the process of writing it up. I'm posting the answer here in case anyone else has the same question.
A closer look at ?apply shows that the documentation states:
If ‘X’ is not an array but an object of a class with a non-null ‘dim’ value (such as a data frame), ‘apply’ attempts to coerce it to an array via ‘as.matrix’ if it is two-dimensional (e.g., a data frame) or via ‘as.array’.
So just as sapply casts the data frame to a list before operating on it, apply casts the data frame to a matrix. Since a matrix cannot hold mixed types and there is at least one column with non-numeric data (Species), everything becomes character data:
> class(as.matrix(iris)[,'Petal.Length'])
[1] "character"
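To see that the Species column is what triggers the coercion, here is a minimal sketch, assuming the built-in iris dataset (the variable names are only illustrative). It compares the matrix built from the full data frame with one built from the four numeric columns only:

```r
# Full data frame: the factor column forces a character matrix
m_all <- as.matrix(iris)
type_all <- typeof(m_all)

# Numeric columns only: the matrix keeps its numeric storage type
m_num <- as.matrix(iris[, 1:4])
type_num <- typeof(m_num)

# With no non-numeric column, apply reports the expected class
apply_classes <- apply(iris[, 1:4], 2, class)

print(type_all)
print(type_num)
print(apply_classes)
```

For column-wise work on a data frame with mixed column types, sapply or lapply avoids the matrix coercion entirely, since each column is passed to the function unchanged.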