Sorry if this has already been asked, but I couldn't find what I need...
Here is my hypothetical data set:
x1=c("A", "A", "B", "C", "C", "B")
x2=c("L1", "L1", "L1", "L1", "L2", "L1")
x3=c("a", "a", "NA", "b", "j","NA" )
x4=c(17, 17, 13.2, NA, 3, 13.2)
x5=c(1,24,5,7,6,8)
db=as.data.frame(cbind(x1, x2, x3, x4, x5))
I have tried many different things, but this is the basic idea:
dbF=aggregate(db$x5,by=list(db$x1, db$x2, db$x3,db$x4),FUN=sum)
The expected output is:
x1e=c("A", "B", "C", "C")
x2e=c("L1", "L1", "L1", "L2")
x3e=c("a", "NA", "b", "j")
x4e=c(17, 13.2, NA, 3)
x5e=c(25,13,7,6)
dbExpected=as.data.frame(cbind(x1e, x2e, x3e, x4e, x5e))
I really need to keep the NA values in the final output... any suggestions? Thanks in advance.
Answer 0 (score: 3)
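A side note first: when you build your data frame with as.data.frame(cbind(...)), cbind creates an intermediate character matrix, so after coercion every column is a factor (or a character vector in R 4.0 and later, where stringsAsFactors defaults to FALSE). That is not what you want, since x5 should obviously be numeric. Build the data frame with data.frame() directly instead. Also, make sure the x4 variable has an NA level (addNA is used here), so that when you aggregate by it you get the result you want.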
x1=c("A", "A", "B", "C", "C", "B")
x2=c("L1", "L1", "L1", "L1", "L2", "L1")
x3=c("a", "a", "NA", "b", "j","NA" )
x4=addNA(factor(c(17, 17, 13.2, NA, 3, 13.2)))
x5=c(1,24,5,7,6,8)
db=data.frame(x1, x2, x3, x4, x5)
dbF=aggregate(x5 ~ x1+x2+x3+x4, data=db, FUN=sum, na.action=na.pass)
dbF
# x1 x2 x3 x4 x5
# 1 C L2 j 3 6
# 2 B L1 NA 13.2 13
# 3 A L1 a 17 25
# 4 C L1 b <NA> 7
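To see why both points matter, here is a minimal sketch that reuses the question's vectors. The names m, db_num, db_na, res_dropped and res_kept are made up for illustration. It checks that cbind yields a character matrix, and that aggregate omits the row whose grouping value is NA unless x4 is a factor with an explicit NA level.

```r
x1 <- c("A", "A", "B", "C", "C", "B")
x2 <- c("L1", "L1", "L1", "L1", "L2", "L1")
x3 <- c("a", "a", "NA", "b", "j", "NA")
x4 <- c(17, 17, 13.2, NA, 3, 13.2)
x5 <- c(1, 24, 5, 7, 6, 8)

# cbind() on mixed character/numeric vectors builds a character matrix,
# so x5 is no longer numeric after as.data.frame()
m <- cbind(x1, x2, x3, x4, x5)
is.character(m)
is.numeric(as.data.frame(m)$x5)

# data.frame() keeps each column's own type
db_num <- data.frame(x1, x2, x3, x4, x5)
is.numeric(db_num$x5)

# With a plain numeric x4, the group whose x4 is NA is omitted,
# even with na.action = na.pass
res_dropped <- aggregate(x5 ~ x1 + x2 + x3 + x4, data = db_num,
                         FUN = sum, na.action = na.pass)
nrow(res_dropped)

# Turning NA into a factor level keeps that group
db_na <- db_num
db_na$x4 <- addNA(factor(db_na$x4))
res_kept <- aggregate(x5 ~ x1 + x2 + x3 + x4, data = db_na,
                      FUN = sum, na.action = na.pass)
nrow(res_kept)
```

Comparing nrow(res_dropped) with nrow(res_kept) shows the difference. Note that x4 is a factor in the result, so convert it back with as.numeric(as.character(x4)) if you need numbers afterwards.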
Answer 1 (score: 1)
You can use dplyr; some of your function calls are redundant.
# install.packages('dplyr') # only run if not installed
library(dplyr)
x1=c("A", "A", "B", "C", "C", "B")
x2=c("L1", "L1", "L1", "L1", "L2", "L1")
x3=c("a", "a", "NA", "b", "j","NA" )
x4=c(17, 17, 13.2, NA, 3, 13.2)
x5=c(1,24,5,7,6,8)
db=data.frame(x1, x2, x3, x4, x5)
db %>%
group_by(x1, x2, x3, x4) %>%
dplyr::summarise(x5e = sum(x5))
Source: local data frame [4 x 5]
Groups: x1, x2, x3 [?]
x1 x2 x3 x4 x5e
(fctr) (fctr) (fctr) (dbl) (dbl)
1 A L1 a 17.0 25
2 B L1 NA 13.2 13
3 C L1 b NA 7
4 C L2 j 3.0 6
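For more recent dplyr releases, the same idea might look like the sketch below. It assumes dplyr 1.0.0 or later, where summarise accepts a .groups argument, and the name res is made up for illustration. group_by treats NA in x4 as its own group, so no factor conversion is needed and x4 stays numeric.

```r
library(dplyr)

db <- data.frame(
  x1 = c("A", "A", "B", "C", "C", "B"),
  x2 = c("L1", "L1", "L1", "L1", "L2", "L1"),
  x3 = c("a", "a", "NA", "b", "j", "NA"),
  x4 = c(17, 17, 13.2, NA, 3, 13.2),
  x5 = c(1, 24, 5, 7, 6, 8)
)

# NA in a grouping column forms its own group;
# .groups = "drop" returns an ungrouped result (assumes dplyr >= 1.0.0)
res <- db %>%
  group_by(x1, x2, x3, x4) %>%
  summarise(x5e = sum(x5), .groups = "drop")

res
```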