I am running the following code in R:
library("AER")
data(CPS1985,package="AER")
by(CPS1985[c("wage","age","experience")],CPS1985["gender"],mean,na.rm=TRUE)
But every time I do, I get NA for each group along with the following warnings:
by(CPS1985[c("wage","age","experience")],CPS1985["gender"],mean,na.rm=TRUE)
gender: male
[1] NA
gender: female
[1] NA
Warning messages:
1: In mean.default(data[x, , drop = FALSE], ...) :
argument is not numeric or logical: returning NA
2: In mean.default(data[x, , drop = FALSE], ...) :
argument is not numeric or logical: returning NA
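Before running the code, I also checked that wage, age and experience are numeric and that gender is a factor variable. So I am confused about why I get this message.
Thanks.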
Answer 0: (score: 1)
A data.table solution.
library(data.table)
setDT(CPS1985) ## convert the data frame to a data.table by reference
CPS1985[, lapply(.SD, mean, na.rm=TRUE), by=gender, .SDcols=c("wage","age","experience")]
gender wage age experience
1: female 7.878857 37.84082 18.83265
2: male 9.994913 35.97924 16.96540
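A note on how lapply takes its arguments, since it is easy to get wrong: the second argument is the function itself, and any extra arguments (such as na.rm) come after it and are forwarded on each call. The sketch below uses a small made-up list named cols (hypothetical, not part of CPS1985) and base R only.

```r
# Hypothetical toy columns (made-up values), one containing an NA.
cols <- list(wage = c(10, NA, 8), age = c(30, 40, 20))

# Pass the function itself; extra arguments follow and are forwarded to it.
means_a <- lapply(cols, mean, na.rm = TRUE)

# Equivalent form with an explicit anonymous function.
means_b <- lapply(cols, function(x) mean(x, na.rm = TRUE))

# Calling mean(na.rm = TRUE) on its own fails, because no data is supplied.
direct_call <- tryCatch(mean(na.rm = TRUE), error = function(e) "error")
```

Both forms give the same per-column means, so inside a data.table call either one should work as the j expression over .SD.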
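Answer 1: (score: 0)
Use colMeans with by when there are multiple columns. by passes each group's rows to FUN as a data frame, and mean does not accept a data frame, which is why it returned NA with a warning.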
by(CPS1985[, c("wage", "age", "experience")], CPS1985["gender"],
FUN=colMeans, na.rm=TRUE)
#gender: male
# wage age experience
# 9.994913 35.979239 16.965398
# ------------------------------------------------------------
#gender: female
# wage age experience
# 7.878857 37.840816 18.832653
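To see the difference in isolation, here is a minimal sketch on a made-up data frame (the name df and its values are hypothetical and only stand in for CPS1985). It compares mean and colMeans on the same numeric columns, then repeats the by call.

```r
# Hypothetical toy data standing in for CPS1985 (values are made up).
df <- data.frame(
  wage   = c(10, 12, 8, 6),
  age    = c(30, 40, 25, 35),
  gender = factor(c("male", "male", "female", "female"))
)
num <- df[c("wage", "age")]

# mean() has no data-frame method, so it warns and returns NA.
res_mean <- suppressWarnings(mean(num))

# colMeans() accepts a data frame and returns one mean per column.
res_col <- colMeans(num, na.rm = TRUE)

# by() hands each gender's rows to FUN as a data frame, so colMeans works here.
res_by <- by(num, df["gender"], colMeans, na.rm = TRUE)
```

Checking the columns individually with is.numeric is therefore not enough: the object that reaches FUN is the whole data frame subset, not a single column.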
Or you can use summarise_each from dplyr:
library(dplyr)
CPS1985 %>%
group_by(gender) %>%
summarise_each(funs(mean=mean(., na.rm=TRUE)), wage, age, experience)
# Source: local data frame [2 x 4]
# gender wage age experience
# 1 male 9.994913 35.97924 16.96540
# 2 female 7.878857 37.84082 18.83265
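As far as I know, summarise_each and funs are deprecated in newer dplyr releases, where across (available from dplyr 1.0.0) is the usual replacement. The sketch below assumes a recent dplyr and uses a made-up data frame named toy (hypothetical values, not CPS1985) so it runs without the AER package.

```r
library(dplyr)

# Hypothetical toy data standing in for CPS1985 (values are made up).
toy <- data.frame(
  gender = factor(c("male", "male", "female", "female")),
  wage   = c(10, 12, 8, NA),
  age    = c(30, 40, 25, 35)
)

# One row per gender, with the mean of each selected column (NAs dropped).
group_means <- toy %>%
  group_by(gender) %>%
  summarise(across(c(wage, age), ~ mean(.x, na.rm = TRUE)))
```

On the real data the same pattern would presumably be across(c(wage, age, experience), ...) applied to CPS1985.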