Question

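I have a question about aaply. I want to check which columns are numeric (is.numeric), but the values aaply returns are somewhat unexpected. Example code is below. Why do I get "data.frame" for every column (which explains why is.numeric is FALSE even for the column that holds a numeric vector)?

Thanks!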

library(plyr)  # provides aaply

data <- data.frame(str = rep("str", 3), num = c(1:3))

is.numeric(data[,1])
# FALSE
is.numeric(data[,2])
# TRUE

aaply(data,2,is.numeric)
# FALSE FALSE

aaply(data,2,class)
# "data.frame" "data.frame"

Edit: in other cases it produces a warning message:

aaply(data,2,mean)

# 1: mean(<data.frame>) is deprecated.
#    Use colMeans() or sapply(*, mean) instead.

Answer 1

This is how aaply works. You can even use identity to see what is passed to each function call: a data.frame representing each column of data:

aaply(data, 2, identity)
# $num
#   num
# 1   1
# 2   2
# 3   3
# 
# $str
#   str
# 1 str
# 2 str
# 3 str
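The same distinction can be reproduced in base R without plyr. The sketch below is only an illustration (the variable names one_col_df and num_vec are made up for this example): a one-column data.frame is not numeric, while the vector stored inside it is.

```r
# Same data as in the question
data <- data.frame(str = rep("str", 3), num = 1:3)

# Single-bracket subsetting keeps the data.frame wrapper,
# which is the kind of object each function call receives above
one_col_df <- data["num"]
class(one_col_df)        # "data.frame"
is.numeric(one_col_df)   # FALSE

# Double-bracket subsetting extracts the underlying vector
num_vec <- data[["num"]]
class(num_vec)           # "integer"
is.numeric(num_vec)      # TRUE
```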

So, to use aaply the way you want, you have to use a function that extracts the first column of each data.frame, for example:

aaply(data, 2, function(df)is.numeric(df[[1]]))
#   num   str 
#  TRUE FALSE

But it seems easier to do:

sapply(data, is.numeric)
#   str   num 
# FALSE  TRUE
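If the goal is to act on the numeric columns afterwards, a possible follow-up in base R is to use the logical vector as a column index. This is just a sketch; vapply is swapped in for sapply here because it guarantees a logical result, and the names is_num and numeric_only are made up for the example.

```r
# Same data as in the question
data <- data.frame(str = rep("str", 3), num = 1:3)

# One TRUE/FALSE per column; vapply enforces a single logical per column
is_num <- vapply(data, is.numeric, logical(1))

# Keep only the numeric columns (the result is still a data.frame)
numeric_only <- data[is_num]
names(numeric_only)      # "num"

# Column means of the numeric part only, which avoids the
# mean(<data.frame>) warning mentioned in the question
colMeans(numeric_only)
```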

Answer 2

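The basic reason is that you are supplying an argument of a class the function is not meant for. The first letter of a plyr function name indicates the type of the input, in this case "a" for array. If you supply an array, it does work fine: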

> xx <- plyr::aaply(matrix(1:10, 2), 2, class)
> xx
        1         2         3         4         5 
"integer" "integer" "integer" "integer" "integer"

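At least that was my understanding until I read the help page. It says that data frame input should be accepted and that the output should be an array. So you have found either an error in the documentation or a bug in the function. Either way, the right place to raise it is the 'manipulatr' Google group. @hadley will very likely sort it out, since he is also a valued contributor here.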
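For comparison, here is a sketch of how base apply treats the same two inputs. It does not use plyr and says nothing about aaply's internals; it only illustrates why array-style functions are an awkward fit for a data frame with mixed column types, since base apply converts the data frame to a matrix first.

```r
# Same data as in the question
data <- data.frame(str = rep("str", 3), num = 1:3)

# A matrix holds one type only, so the mixed columns become character
m <- as.matrix(data)
typeof(m)                     # "character"

# Base apply coerces the data frame to that character matrix,
# so no column is reported as numeric
apply(data, 2, class)         # "character" for both str and num
apply(data, 2, is.numeric)    # FALSE for both str and num

# A genuinely numeric array keeps its type
num_matrix <- matrix(1:10, nrow = 2)
apply(num_matrix, 2, class)   # "integer" five times
```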

aaply: why does aaply(data, 2, class) return "data.frame" for all columns?
