library(data.table)
family_id <- c(1, 2, 3)
age_mother <- c(30, 27, 29)
dob_child1 <- c("1998-11-12", "1999-12-12", "1996-04-12") # birth date of child 1
dob_child2 <- c(NA, "1997-09-09", NA) # NA if there is no such child
dob_child3 <- c(NA, "1999-09-01", "1996-09-09")
DT <- data.table(family_id, age_mother, dob_child1, dob_child2, dob_child3)
Now that I have DT, how can I use it to find out how many children each family has, using syntax like this:
DT[,apply..,keyby=family_id]##this code is wrong
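A minimal sketch of how that grouped `keyby` call could be filled in, assuming the data.table package is installed and the sample data above. For each `family_id`, it counts the birth-date columns that are not `NA`. The names `child_cols` and `n_children` are introduced here for illustration only.

```r
library(data.table)

# Same sample data as in the question: one row per family,
# one column per child's birth date (NA when there is no such child)
DT <- data.table(
  family_id  = c(1, 2, 3),
  age_mother = c(30, 27, 29),
  dob_child1 = c("1998-11-12", "1999-12-12", "1996-04-12"),
  dob_child2 = c(NA, "1997-09-09", NA),
  dob_child3 = c(NA, "1999-09-01", "1996-09-09")
)

# Hypothetical helper: the columns that hold children's birth dates
child_cols <- c("dob_child1", "dob_child2", "dob_child3")

# Within each family_id group, .SD is the subset of child_cols;
# !is.na(.SD) is a logical matrix and sum() counts its TRUE entries.
n_children <- DT[, .(total_child = sum(!is.na(.SD))),
                 keyby = family_id,
                 .SDcols = child_cols]
print(n_children)
```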
Answer 0 (score: 0)
You can use the sqldf package to run SQL queries from R.
I reproduced your DT:
library(data.table)
library(sqldf)
family_id <- c(1, 2, 3)
age_mother <- c(30, 27, 29)
dob_child1 <- c("1998-11-12", "1999-12-12", "1996-04-12") # birth date of child 1
dob_child2 <- c(NA, "1997-09-09", NA) # NA if there is no such child
dob_child3 <- c(NA, "1999-09-01", "1996-09-09")
DT <- data.table(family_id, age_mother, dob_child1, dob_child2, dob_child3)
# count(col) counts only non-NULL values, and R's NA reaches SQLite as NULL
sqldf('select count(dob_child1) + count(dob_child2) + count(dob_child3) as total_child,
              family_id
       from DT
       group by family_id')
The result is:
total_child family_id
1 1 1
2 3 2
3 2 3
Does that work for you?
Answer 1 (score: 0)
This may also work:
> DT$total_child <- as.vector(rowSums(!is.na(DT[, c("dob_child1",
+   "dob_child2", "dob_child3")])))
> DT
family_id age_mother dob_child1 dob_child2 dob_child3 total_child
1 1 30 1998-11-12 <NA> <NA> 1
2 2 27 1999-12-12 1997-09-09 1999-09-01 3
3 3 29 1996-04-12 <NA> 1996-09-09 2
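The same row-wise count can also be written with data.table's own update syntax. A minimal sketch, assuming the data.table package is installed; `child_cols` is a name introduced here for illustration. `:=` adds `total_child` to `DT` by reference instead of assigning through `DT$total_child <- ...`.

```r
library(data.table)

DT <- data.table(
  family_id  = c(1, 2, 3),
  age_mother = c(30, 27, 29),
  dob_child1 = c("1998-11-12", "1999-12-12", "1996-04-12"),
  dob_child2 = c(NA, "1997-09-09", NA),
  dob_child3 = c(NA, "1999-09-01", "1996-09-09")
)

# Hypothetical helper: the birth-date columns to inspect
child_cols <- c("dob_child1", "dob_child2", "dob_child3")

# .SD holds only child_cols; rowSums() over the logical matrix
# !is.na(.SD) gives the number of filled-in birth dates per row.
# := adds the new column to DT in place.
DT[, total_child := rowSums(!is.na(.SD)), .SDcols = child_cols]
print(DT)
```

Because each family occupies exactly one row, a per-row count is already a per-family count here; if a `family_id` could span several rows, the counts would have to be summed by group instead.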