Aggregate function in R (handling NA)

Time: 2015-12-04 20:48:39

Tags: r

Sorry if this question has already been asked, but I couldn't find what I need...

Here is my hypothetical database:

x1=c("A", "A", "B", "C", "C", "B")
x2=c("L1", "L1", "L1", "L1", "L2", "L1")
x3=c("a", "a", "NA", "b", "j","NA" )
x4=c(17, 17, 13.2, NA, 3, 13.2)
x5=c(1,24,5,7,6,8)
db=as.data.frame(cbind(x1, x2, x3, x4, x5))

I have tried many different things, but this is the basic idea:

dbF=aggregate(db$x5,by=list(db$x1, db$x2, db$x3,db$x4),FUN=sum)

The expected output is:

x1e=c("A", "B", "C", "C")
x2e=c("L1", "L1", "L1", "L2")
x3e=c("a", "NA", "b", "j")                 
x4e=c(17, 13.2, NA, 3)
x5e=c(25,13,7,6)
dbExpected=as.data.frame(cbind(x1e, x2e, x3e, x4e, x5e))

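I really need to keep the NAs in the final output... Any suggestions? Thanks in advance.

2 answers:

Answer 0: (score: 3)

An incidental point first: when you build your data.frame with cbind and then coerce it, you create an intermediate character matrix, so after coercing to a data.frame every column is a factor (or, in R 4.0 and later, a character vector). That is not what you want, for the obvious reason that x5 should be numeric. Also, make sure the x4 variable has an NA level (addNA is used here), so that when you aggregate by it you get the result you want.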

x1=c("A", "A", "B", "C", "C", "B")
x2=c("L1", "L1", "L1", "L1", "L2", "L1")
x3=c("a", "a", "NA", "b", "j","NA" )
x4=addNA(factor(c(17, 17, 13.2, NA, 3, 13.2)))
x5=c(1,24,5,7,6,8)
db=data.frame(x1, x2, x3, x4, x5)

dbF=aggregate(x5 ~ x1+x2+x3+x4, data=db, FUN=sum, na.action=na.pass)
dbF
#  x1 x2 x3   x4 x5
# 1  C L2  j    3  6
# 2  B L1 NA 13.2 13
# 3  A L1  a   17 25
# 4  C L1  b <NA>  7
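
The two points in this answer can be checked directly. The following is only a minimal sketch built from the question's vectors; the name m and the final conversion of x4 back to numeric are additions for illustration and are not part of the original answer.

```r
# Same vectors as in the question.
x1 <- c("A", "A", "B", "C", "C", "B")
x2 <- c("L1", "L1", "L1", "L1", "L2", "L1")
x3 <- c("a", "a", "NA", "b", "j", "NA")
x4 <- c(17, 17, 13.2, NA, 3, 13.2)
x5 <- c(1, 24, 5, 7, 6, 8)

# cbind() on mixed vectors builds a character matrix,
# so the numbers in x4 and x5 are turned into strings.
m <- cbind(x1, x2, x3, x4, x5)
is.character(m)

# data.frame() keeps each column's own type.
# x4 is wrapped in addNA() so that NA becomes a grouping level.
db <- data.frame(x1, x2, x3, x4 = addNA(factor(x4)), x5)
dbF <- aggregate(x5 ~ x1 + x2 + x3 + x4, data = db,
                 FUN = sum, na.action = na.pass)

# Assumption: a numeric x4 is wanted in the result.
# Converting via character turns the NA level back into a real NA.
dbF$x4 <- as.numeric(as.character(dbF$x4))
str(dbF)
```

Going through as.character() before as.numeric() matters here: calling as.numeric() on the factor directly would return the internal level codes rather than the original values.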

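Answer 1: (score: 1)

You can use dplyr. Also, some of the function calls in your code are redundant: data.frame() alone is enough, so there is no need for as.data.frame(cbind(...)).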

# install.packages('dplyr') # only run if not installed
library(dplyr)

x1=c("A", "A", "B", "C", "C", "B")
x2=c("L1", "L1", "L1", "L1", "L2", "L1")
x3=c("a", "a", "NA", "b", "j","NA" )
x4=c(17, 17, 13.2, NA, 3, 13.2)
x5=c(1,24,5,7,6,8)
db=data.frame(x1, x2, x3, x4, x5)

db %>%
  group_by(x1, x2, x3, x4) %>%
  dplyr::summarise(x5e = sum(x5))

Source: local data frame [4 x 5]
Groups: x1, x2, x3 [?]

      x1     x2     x3    x4   x5e
  (fctr) (fctr) (fctr) (dbl) (dbl)
1      A     L1      a  17.0    25
2      B     L1     NA  13.2    13
3      C     L1      b    NA     7
4      C     L2      j   3.0     6
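
For readers on a newer dplyr, the same grouping might be written as in the sketch below. It assumes dplyr 1.0.0 or later, where summarise() accepts a .groups argument; the name res is only illustrative. group_by() treats NA in x4 as a group of its own, so no addNA() step is needed and x4 stays numeric.

```r
library(dplyr)

# Same data as in the question, built directly with data.frame().
db <- data.frame(
  x1 = c("A", "A", "B", "C", "C", "B"),
  x2 = c("L1", "L1", "L1", "L1", "L2", "L1"),
  x3 = c("a", "a", "NA", "b", "j", "NA"),
  x4 = c(17, 17, 13.2, NA, 3, 13.2),
  x5 = c(1, 24, 5, 7, 6, 8)
)

# Sum x5 within each combination of x1..x4; the row with x4 = NA is kept.
# .groups = "drop" (assumes dplyr >= 1.0.0) returns an ungrouped result.
res <- db %>%
  group_by(x1, x2, x3, x4) %>%
  summarise(x5e = sum(x5), .groups = "drop")
res
```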