Error message in R after by() or split-file processing?

Date: 2014-10-12 02:33:55

Tags: r

I am running the following code in R:

library("AER")
data(CPS1985,package="AER")
by(CPS1985[c("wage","age","experience")],CPS1985["gender"],mean,na.rm=TRUE)

But whenever I run it, I get NA for every group along with the following warning messages:

by(CPS1985[c("wage","age","experience")],CPS1985["gender"],mean,na.rm=TRUE)
gender: male
[1] NA
gender: female
[1] NA
Warning messages:
1: In mean.default(data[x, , drop = FALSE], ...) :
  argument is not numeric or logical: returning NA
2: In mean.default(data[x, , drop = FALSE], ...) :
  argument is not numeric or logical: returning NA

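Before running the code, I checked that wage, age and experience are all numeric and that gender is a factor. So I'm confused about why I am getting this message.

Thanks.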
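To see where the warning comes from, here is a minimal sketch on a small made-up data frame (the values are hypothetical, not CPS1985). In current versions of R, mean() applied to a whole data frame falls through to mean.default(), which only accepts numeric, logical or complex vectors, so it warns and returns NA; colMeans() instead averages each column.

```r
# Hypothetical toy data standing in for CPS1985 (not the real data set)
df <- data.frame(
  wage   = c(5.1, 4.95, NA, 10),
  age    = c(35, 57, 19, 28),
  gender = factor(c("male", "female", "male", "female"))
)

# Capture the warning mean() emits when given a data frame (a list of columns)
warn_msg <- tryCatch(mean(df[c("wage", "age")], na.rm = TRUE),
                     warning = function(w) conditionMessage(w))
print(warn_msg)

# The returned value itself is NA
res_df <- suppressWarnings(mean(df[c("wage", "age")], na.rm = TRUE))
print(res_df)

# colMeans() works column by column, and na.rm drops the missing wage
res_cols <- colMeans(df[c("wage", "age")], na.rm = TRUE)
print(res_cols)
```

The same thing happens inside by(): each group is passed to FUN as a data frame, so FUN = mean returns NA for every group.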

2 Answers:

Answer 0 (score: 1)

A data.table solution:

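library(AER)
data(CPS1985, package = "AER")
library(data.table)
setDT(CPS1985)  ## convert data frame to data.table (modifies CPS1985 in place)
## pass mean itself to lapply(); na.rm = TRUE is forwarded to mean() for each column
CPS1985[, lapply(.SD, mean, na.rm = TRUE), by = gender, .SDcols = c("wage", "age", "experience")]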
   gender     wage      age experience
1: female 7.878857 37.84082   18.83265
2:   male 9.994913 35.97924   16.96540
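The key detail in lapply(.SD, ...) is that lapply() must receive the function itself, with extra arguments such as na.rm = TRUE listed after it. A minimal base-R sketch on a hypothetical toy data frame (lapply() over a data frame iterates over its columns, which is what happens with .SD in data.table) shows the difference: writing mean(na.rm = TRUE) calls mean() immediately with no data and fails.

```r
# Hypothetical toy data, not CPS1985
df <- data.frame(wage = c(5.1, NA, 10), age = c(35, 19, 28))

# Correct: pass the function; na.rm = TRUE is forwarded to mean() per column
ok <- lapply(df, mean, na.rm = TRUE)
print(ok)

# Wrong: mean(na.rm = TRUE) is evaluated before lapply() can use it,
# and it has no x argument, so it raises an error
msg <- tryCatch(lapply(df, mean(na.rm = TRUE)),
                error = function(e) conditionMessage(e))
print(msg)
```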

Answer 1 (score: 0)

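mean() does not compute per-column means of a data frame (it warns and returns NA), so when there are multiple columns you need to use colMeans with by: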
by(CPS1985[, c("wage", "age", "experience")], CPS1985["gender"], 
                                            FUN=colMeans, na.rm=TRUE)
#gender: male
#      wage        age experience 
#  9.994913  35.979239  16.965398 
# ------------------------------------------------------------ 
#gender: female
#     wage        age experience 
#  7.878857  37.840816  18.832653 
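A small self-contained sketch of the same by() + colMeans pattern, using hypothetical toy values instead of CPS1985, so the per-group results can be checked by hand. Each group's subset arrives at colMeans() as a data frame, and na.rm = TRUE is passed through to it.

```r
# Hypothetical toy data, not CPS1985
df <- data.frame(
  wage   = c(5.1, 4.95, NA, 10),
  age    = c(35, 57, 19, 28),
  gender = factor(c("male", "female", "male", "female"))
)

# One numeric vector of column means per gender
res_by <- by(df[c("wage", "age")], df["gender"], FUN = colMeans, na.rm = TRUE)
print(res_by)

# Individual groups can be pulled out by name
male <- res_by[["male"]]
female <- res_by[["female"]]
```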

Or you can use summarise_each from dplyr:
library(dplyr)
CPS1985 %>% 
        group_by(gender) %>% 
        summarise_each(funs(mean=mean(., na.rm=TRUE)), wage, age, experience)
# Source: local data frame [2 x 4]

#   gender     wage      age experience
# 1   male 9.994913 35.97924   16.96540
# 2 female 7.878857 37.84082   18.83265
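summarise_each() and funs() were deprecated in later dplyr releases. Assuming dplyr 1.0.0 or newer, a roughly equivalent sketch uses summarise() with across(); the toy data frame below is hypothetical, not CPS1985. Note that group_by() on a factor orders groups by factor level, so here female comes first.

```r
library(dplyr)

# Hypothetical toy data, not CPS1985
df <- data.frame(
  wage   = c(5.1, 4.95, NA, 10),
  age    = c(35, 57, 19, 28),
  gender = factor(c("male", "female", "male", "female"))
)

# Apply mean(na.rm = TRUE) to each listed column within each gender
res <- df %>%
  group_by(gender) %>%
  summarise(across(c(wage, age), ~ mean(.x, na.rm = TRUE)))
print(res)
```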