How do I filter a dataset down to numeric values only inside apply() in R?

Asked: 2014-01-11 17:24:14

Tags: r apply

I tried this:

apply(test,2,mean)

I got these warnings:

     CS.32   No..of.Takes         CS.130 No..of.Takes.1         CS.131 No..of.Takes.2         CS.133 No..of.Takes.3         CS.135 No..of.Takes.4 
        NA             NA             NA             NA             NA             NA             NA             NA             NA             NA 
Warning messages:
1: In mean.default(newX[, i], ...) :
  argument is not numeric or logical: returning NA
2: In mean.default(newX[, i], ...) :
  argument is not numeric or logical: returning NA
3: In mean.default(newX[, i], ...) :
  argument is not numeric or logical: returning NA
4: In mean.default(newX[, i], ...) :
  argument is not numeric or logical: returning NA
5: In mean.default(newX[, i], ...) :
  argument is not numeric or logical: returning NA
6: In mean.default(newX[, i], ...) :
  argument is not numeric or logical: returning NA
7: In mean.default(newX[, i], ...) :
  argument is not numeric or logical: returning NA
8: In mean.default(newX[, i], ...) :
  argument is not numeric or logical: returning NA
9: In mean.default(newX[, i], ...) :
  argument is not numeric or logical: returning NA
10: In mean.default(newX[, i], ...) :
  argument is not numeric or logical: returning NA

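I want to filter the dataset so that the means are computed without the non-numeric values such as NA, INC, and DRP.

3 Answers:

Answer 0: (score: 2)

Change your code to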

colMeans(test[,sapply(test, is.numeric)], na.rm=TRUE)

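I think it will work.

Note that colMeans(data.frame/matrix) gives the same result as apply(data.frame/matrix, 2, mean) on numeric data, but is much faster.

In my code, test[,sapply(test, is.numeric)] checks whether each column is numeric. If it is, its column mean is computed by colMeans; otherwise the column is skipped. So sapply(test, is.numeric) is the "filter" you are looking for: it returns a logical vector (TRUE/FALSE) indicating which columns are numeric, and you can use that vector to subset the data.frame.

See this example, which uses the iris dataset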

> data(iris)
> apply(iris, 2, mean)  # NA's produced as in your case
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
          NA           NA           NA           NA           NA 
Warning messages:
1: In mean.default(newX[, i], ...) :
  argument is not numeric or logical: returning NA
...

> apply(iris[, sapply(iris, is.numeric)], 2, mean)  # output is OK
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
    5.843333     3.057333     3.758000     1.199333 
> colMeans(iris[, sapply(iris, is.numeric)])        # same output
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
    5.843333     3.057333     3.758000     1.199333 
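
The same filter can be tried on a small data frame that resembles the one in the question. This is only a sketch: the data frame below and its column names (grade, takes) are made up for illustration, and drop = FALSE is an addition that keeps the subset a data frame when only one column is numeric.

```r
# Hypothetical data: one text column holding grade codes, one numeric column.
test <- data.frame(
  grade = c("1.5", "INC", "2.0"),
  takes = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# apply() first converts the data frame to a matrix. With mixed column types
# that matrix is character, which is why mean() returns NA for every column.
is_char_matrix <- is.character(as.matrix(test))

# Logical vector with one element per column.
num_cols <- sapply(test, is.numeric)

# drop = FALSE keeps a data frame even when a single column is selected.
col_means <- colMeans(test[, num_cols, drop = FALSE], na.rm = TRUE)
print(col_means)
```

Note that this skips the text column entirely. If the goal is to average the numeric entries inside such a column, the column has to be converted to numeric first.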

Answer 1: (score: 0)

Add the argument that ignores NAs, and make sure all columns are numeric. You can check the column types with str(test).

 apply(test,2,mean,na.rm=TRUE)
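
A minimal sketch of what na.rm = TRUE changes, assuming a data frame whose columns are already numeric. The data frame scores is hypothetical and stands in for test.

```r
# Hypothetical all-numeric data frame with one missing value.
scores <- data.frame(a = c(1, NA, 3), b = c(4, 5, 6))

# str() lists each column with its type, so non-numeric columns are easy to spot.
str(scores)

# Without na.rm, a single NA makes the whole column mean NA.
with_na <- apply(scores, 2, mean)

# With na.rm = TRUE, the NA is dropped before averaging.
without_na <- apply(scores, 2, mean, na.rm = TRUE)
print(without_na)
```

na.rm = TRUE only removes NA values. It does not help when a column is stored as text, so the type check comes first.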

Answer 2: (score: 0)

An alternative approach, step by step:

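  • b <- apply(test, 2, as.numeric)   # non-numeric entries become NA
  • good <- complete.cases(b)         # TRUE for rows without any NA
  • cleaned <- b[good, ]              # avoids naming a variable c, which masks base::c
  • apply(cleaned, 2, mean)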
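
The steps above can be sketched end to end on made-up data. The data frame and its values are hypothetical, and the colMeans call at the end is an alternative that is not part of this answer. It is included because complete.cases removes a whole row when any column has an NA, which also discards valid values in the other columns.

```r
# Hypothetical data with grade codes mixed into text columns.
test <- data.frame(
  CS.32  = c("1.5", "INC", "2.0", "3.0"),
  CS.130 = c("2.0", "1.0", "DRP", "3.0"),
  stringsAsFactors = FALSE
)

# Step 1: coerce every column to numeric; codes such as INC become NA.
# suppressWarnings() hides the expected "NAs introduced by coercion" warnings.
b <- suppressWarnings(apply(test, 2, as.numeric))

# Steps 2 to 4: keep only rows with no NA at all, then average each column.
good <- complete.cases(b)
row_filtered <- apply(b[good, , drop = FALSE], 2, mean)

# Alternative (not from the answer): drop NAs column by column instead.
col_filtered <- colMeans(b, na.rm = TRUE)

print(row_filtered)
print(col_filtered)
```

The two results differ here because the row-wise filter throws away rows 2 and 3 for both columns, while the column-wise version only ignores the NA in each column.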