R: count and subtract events from a data frame

Time: 2016-12-15 11:03:52

Tags: r count

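I am trying to compute family sizes from a data frame that also records two types of events: family members who died and family members who left the household. I want to take both into account to compute the actual family size. Here is a reproducible example of my problem, limited to 3 families: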

family <- factor(rep(c("001","002","003"), c(10,8,15)), levels=c("001","002","003"), labels=c("001","002","003"), ordered=TRUE)
dead <- c(0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0)
left <- c(0,0,0,0,0,1,0,0,0,1,1,0,0,0,1,1,0,0,0,1,1,1,0,0,0,0,0,0,1,1,1,0,0)
DF <- data.frame(family, dead, left)  ; DF

I can compute N, the total number of members in each family, in a second data frame DF2 simply by using table():
DF2 <- with(DF, data.frame(table(family)))
colnames(DF2)[2] <- "N"   ; DF2
family  N
1    001 10
2    002  8
3    003 15

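But I cannot find the right way to get the actual number of people (for example, as a new variable N2 in DF2), computed by subtracting from N the number of members who died or left the household. I guess I have to link the two data frames DF and DF2 somehow. I looked through other related questions on this site but could not find a suitable answer. If anyone has a good idea, that would be great! Thanks in advance. 杰尼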

3 answers:

Answer 0 (score: 2)

Logic: first group_by(family), then compute two numbers per group: (i) the total number of observations, and (ii) that total minus (sum(dead) + sum(left)).

In the dplyr package, n() gives the total number of observations in each group.

In data.table, .N does the same job.

library(dplyr)
DF %>% group_by(family) %>% summarise( total = n(), current = n()-sum(dead,left, na.rm = TRUE))
#  family total current
#  (fctr) (int)   (dbl)
#1    001    10       6
#2    002     8       4
#3    003    15       7


library(data.table)
# setDT() converts a data.frame to a data.table by reference; if DF is already a data.table, use DF directly.
setDT(DF)[, .(total = .N, current = .N - sum(dead, left, na.rm = TRUE)), by = family]
#   family total current
#1:    001    10       6
#2:    002     8       4
#3:    003    15       7
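To store the result in the question's DF2 as N2, the same logic can be written in base R. This is only a minimal sketch: it assumes a member flagged as both dead and left should be counted once, and the `gone` helper is an illustrative name that does not appear in the original answer.

```r
# Data from the question
family <- factor(rep(c("001", "002", "003"), c(10, 8, 15)))
dead <- c(0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0)
left <- c(0,0,0,0,0,1,0,0,0,1,1,0,0,0,1,1,0,0,0,1,1,1,0,0,0,0,0,0,1,1,1,0,0)
DF <- data.frame(family, dead, left)

# Per-family totals, as in the question's DF2
DF2 <- as.data.frame(table(family = DF$family), responseName = "N")

# Assumption: a row counts as "gone" if either event is recorded,
# so a member with both flags set is not subtracted twice
gone <- DF$dead == 1 | DF$left == 1
gone_per_family <- tapply(gone, DF$family, sum)

# Match by family label, then subtract from the total
DF2$N2 <- DF2$N - as.vector(gone_per_family[as.character(DF2$family)])
DF2
#   family  N N2
# 1    001 10  6
# 2    002  8  4
# 3    003 15  7
```

In the example data no row has both flags set, so this gives the same numbers as sum(dead, left).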

Answer 1 (score: 2)

Here is a base R option:

do.call(data.frame, aggregate(dl~family, transform(DF, dl = dead + left), 
      FUN = function(x) c(total=length(x), current=length(x) - sum(x))))
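A note on why do.call(data.frame, ...) is wrapped around the call: when FUN returns a vector, aggregate() stores the result as a single matrix column, and data.frame() flattens it into ordinary columns. The sketch below, which assumes the DF from the question and uses `res` as an illustrative name, shows the column names that come out.

```r
# Data from the question
family <- factor(rep(c("001", "002", "003"), c(10, 8, 15)))
dead <- c(0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0)
left <- c(0,0,0,0,0,1,0,0,0,1,1,0,0,0,1,1,0,0,0,1,1,1,0,0,0,0,0,0,1,1,1,0,0)
DF <- data.frame(family, dead, left)

# Same call as in the answer; the matrix column "dl" is split by data.frame()
res <- do.call(data.frame, aggregate(dl ~ family, transform(DF, dl = dead + left),
      FUN = function(x) c(total = length(x), current = length(x) - sum(x))))

names(res)
# [1] "family"     "dl.total"   "dl.current"
```

The columns can be renamed afterwards, for example with names(res)[2:3] <- c("total", "current").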

Or a modified version:

transform(aggregate(. ~ family, transform(DF, total = 1,
  current = dead + left)[c(1,4:5)], FUN = sum), current = total - current)
#     family total current
#1    001    10       6
#2    002     8       4
#3    003    15       7

Answer 2 (score: 0)

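I finally found another approach that works (from another post) and computes everything in the original DF table. It uses the ddply function from the plyr package: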

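library(plyr)
# Add the per-family total to every row of DF
DF <- ddply(DF, .(family), transform, total = length(family))
# Add the actual size: total minus members who died or left
DF <- ddply(DF, .(family), transform, actual = length(family) - sum(dead == 1) - sum(left == 1))
DF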
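If plyr is not available, the same per-row columns can probably be obtained with base R's ave(), which applies a function within groups and returns a vector as long as the input. This is a sketch under the assumption that the DF from the question is used; it is not part of the original answer.

```r
# Data from the question
family <- factor(rep(c("001", "002", "003"), c(10, 8, 15)))
dead <- c(0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0)
left <- c(0,0,0,0,0,1,0,0,0,1,1,0,0,0,1,1,0,0,0,1,1,1,0,0,0,0,0,0,1,1,1,0,0)
DF <- data.frame(family, dead, left)

# Family size repeated on every row of that family
DF$total <- ave(DF$dead, DF$family, FUN = length)

# Actual size: total minus the per-family count of dead or left members
DF$actual <- DF$total - ave(DF$dead + DF$left, DF$family, FUN = sum)

# One line per family to inspect the result
unique(DF[, c("family", "total", "actual")])
```

Unlike the summarised tables in the other answers, this keeps all 33 rows of DF and repeats the family-level values on each row.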

Many thanks to everyone who helped! 杰尼